AI Agents relevance: 9/10

Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning

Antoine Bergerault, Volkan Cevher, Negar Mehr
arXiv: 2602.21020v1 Published: 2026-02-24 Updated: 2026-02-24

AI Summary

Studies the Nash equilibrium gap of policies learned via multi-agent imitation learning, and proposes methods that reduce this gap under specific conditions.

Key Contributions

  • Proves the hardness of learning low-exploitability policies in general Markov games
  • Proposes strategic dominance assumptions on the expert equilibrium to overcome this challenge
  • Derives an upper bound on the Nash imitation gap in terms of the behavioral cloning error and the discount factor

Methodology

Analyzes the Nash equilibrium gap by providing counterexamples and hardness results, and derives theoretical upper bounds on the gap under specific assumptions.

Original Abstract

Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations of interactions in multi-agent interactive domains. Despite existing guarantees on the performance of the resulting learned policies, characterizations of how far the learned policies are from a Nash equilibrium are missing for offline MA-IL. In this paper, we demonstrate impossibility and hardness results of learning low-exploitable policies in general $n$-player Markov Games. We do so by providing examples where even exact measure matching fails, and demonstrating a new hardness result on characterizing the Nash gap given a fixed measure matching error. We then show how these challenges can be overcome using strategic dominance assumptions on the expert equilibrium. Specifically, for the case of dominant strategy expert equilibria, assuming Behavioral Cloning error $ε_{\text{BC}}$, this provides a Nash imitation gap of $\mathcal{O}\left(nε_{\text{BC}}/(1-γ)^2\right)$ for a discount factor $γ$. We generalize this result with a new notion of best-response continuity, and argue that this is implicitly encouraged by standard regularization techniques.
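The $\mathcal{O}\left(nε_{\text{BC}}/(1-γ)^2\right)$ bound from the abstract can be illustrated numerically; a minimal sketch, where the helper name and the sample parameter values are assumptions for illustration and the big-O constant is omitted:

```python
def nash_imitation_gap_bound(n, eps_bc, gamma):
    """Upper bound n * eps_BC / (1 - gamma)^2 on the Nash imitation gap
    for dominant-strategy expert equilibria (big-O constant omitted)."""
    assert 0.0 <= gamma < 1.0, "discount factor must lie in [0, 1)"
    return n * eps_bc / (1.0 - gamma) ** 2

# Example: 3 players, BC error 0.01, discount 0.9.
# The (1 - gamma)^-2 factor quadratically amplifies the per-player BC error.
print(nash_imitation_gap_bound(3, 0.01, 0.9))
```

Note how the effective horizon $1/(1-γ)$ enters quadratically: halving the BC error helps far less than a small decrease in $γ$ when $γ$ is close to 1.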

Tags

Multi-Agent Imitation Learning · Nash Equilibrium · Game Theory

arXiv Categories

cs.LG cs.GT cs.MA