GRASP: Gradient Realignment via Active Shared Perception for Multi-Agent Collaborative Optimization
AI Summary
GRASP achieves gradient alignment through active shared perception, optimizing multi-agent collaboration and accelerating convergence.
Main Contributions
- Proposes the GRASP framework, which achieves gradient alignment through active shared perception
- Defines a generalized Bellman equilibrium as a stable objective
- Theoretically proves the existence and attainability of the consensus direction
Methodology
Derives a consensus gradient from the agents' independent gradients, enabling agents to actively perceive one another's policy updates and optimize team collaboration.
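A minimal sketch of this idea, assuming one plausible definition of the consensus gradient (the paper's exact construction is not reproduced here): each agent contributes its unit-normalized independent gradient, and the consensus direction is taken as their normalized average, so that a shared update step tends not to oppose any single agent's objective. The function name `consensus_direction` and the normalization scheme are illustrative assumptions.

```python
import numpy as np

def consensus_direction(grads):
    """Hypothetical consensus gradient: the normalized average of
    unit-normalized per-agent gradients. This is an illustrative
    stand-in for GRASP's consensus direction, not the paper's exact rule."""
    units = [g / (np.linalg.norm(g) + 1e-12) for g in grads]
    u = np.mean(units, axis=0)
    norm = np.linalg.norm(u)
    return u / norm if norm > 0 else u

# Two agents whose independent gradients partially agree
g1 = np.array([1.0, 0.0])
g2 = np.array([1.0, 1.0])
u = consensus_direction([g1, g2])

# A step along u has positive inner product with both agents'
# gradients, so it does not oppose either agent's objective.
assert np.dot(u, g1) > 0 and np.dot(u, g2) > 0
```

Under this reading, "active perception" means each agent receives the aggregated direction directly rather than inferring teammates' policy changes from sampled interaction data.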
Original Abstract
Non-stationarity arises from concurrent policy updates and leads to persistent environmental fluctuations. Existing approaches such as Centralized Training with Decentralized Execution (CTDE) and sequential update schemes mitigate this issue. However, because an agent's perception of other agents' policies still depends on sampled environment-interaction data, the agent essentially operates in a passive perception state. This inevitably triggers equilibrium oscillations and significantly slows the system's convergence. To address this issue, we propose Gradient Realignment via Active Shared Perception (GRASP), a novel framework that defines a generalized Bellman equilibrium as a stable objective for policy evolution. The core mechanism of GRASP uses the agents' independent gradients to derive a defined consensus gradient, enabling agents to actively perceive policy updates and optimize team collaboration. Theoretically, we leverage the Kakutani Fixed-Point Theorem to prove that the consensus direction $u^*$ guarantees the existence and attainability of this equilibrium. Extensive experiments on the StarCraft II Multi-Agent Challenge (SMAC) and Google Research Football (GRF) demonstrate the scalability and promising performance of the framework.