Multi-agent cooperation through in-context co-player inference
AI Summary
The paper proposes exploiting the in-context learning capabilities of sequence models: by training agents against a diverse distribution of co-players, cooperation between agents emerges without hardcoded assumptions about co-player learning rules.
Key Contributions
- Proposes using the in-context learning of sequence models to achieve cooperation among agents
- Demonstrates that, under in-context learning, agents' vulnerability to extortion drives cooperation
- Shows that standard decentralized reinforcement learning combined with co-player diversity is a scalable approach to learning cooperative behavior
Methodology
Sequence-model agents are trained against a diverse set of co-players, which induces fast learning within an episode: the agent adapts to its co-player through in-context learning, and cooperation emerges from this mutual adaptation.
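The in-context adaptation described above can be illustrated with a toy iterated prisoner's dilemma. Note this is a minimal hand-coded sketch, not the paper's method: in the paper the in-context best response is *learned* by reinforcement learning on a sequence model, whereas here the inference rule, the payoff values, and the co-player strategies (`tit_for_tat`, `always_defect`) are all illustrative assumptions.

```python
# Row-player payoffs for the prisoner's dilemma (C = cooperate, D = defect)
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Two co-player types standing in for the "diverse distribution" (illustrative)
def tit_for_tat(opp_moves):
    return opp_moves[-1] if opp_moves else "C"   # copy the agent's last move

def always_defect(opp_moves):
    return "D"

def in_context_agent(history):
    """Hand-coded stand-in for an in-context best response.

    `history` is the in-episode context: a list of (my_move, their_move).
    """
    if not history:
        return "C"                               # probe with cooperation
    # Infer the co-player's type from its first reply observed in context
    return "C" if history[0][1] == "C" else "D"

def play_episode(co_player, rounds=10):
    history, total = [], 0
    for _ in range(rounds):
        a = in_context_agent(history)
        b = co_player([x for x, _ in history])   # co-player sees the agent's past moves
        total += PAYOFF[(a, b)]
        history.append((a, b))
    return total
```

Against a reciprocator the agent locks into mutual cooperation; against an unconditional defector it switches to defection after a single probing round, showing how one in-episode observation can drive a best response.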
Original Abstract
Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their co-players. However, existing approaches typically rely on hardcoded, often inconsistent, assumptions about co-player learning rules or enforce a strict separation between "naive learners" updating on fast timescales and "meta-learners" observing these updates. Here, we demonstrate that the in-context learning capabilities of sequence models allow for co-player learning awareness without requiring hardcoded assumptions or explicit timescale separation. We show that training sequence model agents against a diverse distribution of co-players naturally induces in-context best-response strategies, effectively functioning as learning algorithms on the fast intra-episode timescale. We find that the cooperative mechanism identified in prior work, where vulnerability to extortion drives mutual shaping, emerges naturally in this setting: in-context adaptation renders agents vulnerable to extortion, and the resulting mutual pressure to shape the opponent's in-context learning dynamics resolves into the learning of cooperative behavior. Our results suggest that standard decentralized reinforcement learning on sequence models combined with co-player diversity provides a scalable path to learning cooperative behaviors.