A Scalable Approach to Solving Simulation-Based Network Security Games
AI Summary
MetaDOAR improves multi-agent reinforcement learning performance in large-scale network security games through hierarchical learning and cache optimization.
Key Contributions
- Proposes the MetaDOAR framework, building on the Double Oracle / PSRO paradigm.
- Introduces a learned, partition-aware filtering layer that shrinks the search space.
- Uses a Q-value cache to cut redundant computation while improving decision quality.
Methodology
Learns compact state representations from structural embeddings, combines beam search with a critic network, and improves efficiency via an LRU cache.
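The paper itself does not include code; a minimal sketch of the top-k device selection and critic-guided candidate search described above might look like the following (the function names, the linear scorer `w_proj`, and the toy critic are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def topk_partition(node_embeddings, w_proj, k):
    """Score every device with a learned linear projection and keep
    the k highest-scoring ones (the "top-k partition")."""
    scores = node_embeddings @ w_proj          # one score per node
    return np.argsort(scores)[-k:][::-1]       # device indices, best first

def beam_search_actions(candidates, critic, beam_width):
    """Evaluate all candidate actions in one batched critic forward
    pass and keep the beam_width best."""
    q = critic(np.asarray(candidates))         # batched Q-value estimates
    keep = np.argsort(q)[-beam_width:][::-1]   # best first
    return [candidates[i] for i in keep]
```

In the actual system, both the projection and the critic would be neural networks trained inside the Double Oracle loop; here they are stand-ins to show the data flow.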
Original Abstract
We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning on very large cyber-network environments. MetaDOAR learns a compact state projection from per-node structural embeddings to rapidly score and select a small subset of devices (a top-k partition) on which a conventional low-level actor performs focused beam search guided by a critic agent. Selected candidate actions are evaluated with batched critic forwards and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than state-of-the-art baselines on large network topologies, without significant scaling issues in memory usage or training time. This contribution provides a practical, theoretically motivated path to efficient hierarchical policy learning for large-scale networked decision problems.
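The caching scheme in the abstract can be sketched concretely. Below is a minimal version of an LRU Q-value cache keyed by a quantized state projection plus a local action identifier, with conservative k-hop invalidation; the class name, the bin width, and the assumption that the action identifier names its target node are all illustrative, not taken from the paper:

```python
from collections import OrderedDict
import numpy as np

class QValueCache:
    """Hypothetical LRU cache for critic Q-values, keyed by a quantized
    state projection plus a local action identifier."""

    def __init__(self, capacity=10_000, bin_width=0.05):
        self.capacity = capacity
        self.bin_width = bin_width
        self._store = OrderedDict()

    def _key(self, state_proj, action_id):
        # Quantize the continuous projection so nearby states share a key.
        bins = tuple(np.round(np.asarray(state_proj) / self.bin_width).astype(int))
        return (bins, action_id)

    def get(self, state_proj, action_id):
        key = self._key(state_proj, action_id)
        if key in self._store:
            self._store.move_to_end(key)      # mark as recently used
            return self._store[key]
        return None

    def put(self, state_proj, action_id, q_value):
        key = self._key(state_proj, action_id)
        self._store[key] = q_value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict least recently used

    def invalidate_khop(self, touched_nodes, neighbors, k=1):
        # Conservative invalidation: drop every entry whose action targets
        # a node within k hops of a node changed by the last action.
        frontier = set(touched_nodes)
        for _ in range(k):
            frontier |= {n for u in frontier for n in neighbors.get(u, ())}
        for key in [key for key in self._store if key[1] in frontier]:
            del self._store[key]
```

The quantization means two nearby state projections hit the same cache entry, trading a small amount of value accuracy for a much higher hit rate; the k-hop sweep over-invalidates rather than risk serving a stale Q-value after the network changes.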