AI Agents Relevance: 9/10

CRAFT: Grounded Multi-Agent Coordination Under Partial Information

Abhijnan Nath, Hannah VanderHoeven, Nikhil Krishnaswamy
arXiv: 2603.25268v1 Published: 2026-03-26 Updated: 2026-03-26

AI Summary

CRAFT is a multi-agent benchmark that evaluates how well LLMs coordinate and communicate pragmatically under partial information.

Key Contributions

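  • Introduces the CRAFT benchmark for evaluating LLM multi-agent coordination under partial information
  • Proposes a diagnostic framework that decomposes failures into spatial grounding, belief modeling, and pragmatic communication errors
  • Finds that stronger reasoning ability does not reliably translate into better coordination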

Methodology

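The authors build a multi-agent 3D structure-construction task in which agents communicate through natural language, then analyze how different models perform on the task and diagnose their errors.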
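The partial-information setup can be illustrated with a minimal Python sketch. It is not taken from the CRAFT codebase: the grid representation, the function names `split_views` and `merge_views`, and the round-robin, non-overlapping split are all assumptions made for illustration. The paper itself only states that views are complementary but incomplete.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int, int]  # hypothetical (x, y, z) grid cell


def split_views(structure: Dict[Block, str], n_agents: int) -> List[Dict[Block, str]]:
    """Assign each block of the target to exactly one agent, round-robin.

    Assumption: views are disjoint here; the benchmark's actual
    partitioning of the structure may differ.
    """
    views: List[Dict[Block, str]] = [dict() for _ in range(n_agents)]
    for i, cell in enumerate(sorted(structure)):
        views[i % n_agents][cell] = structure[cell]
    return views


def merge_views(views: List[Dict[Block, str]]) -> Dict[Block, str]:
    """Union of all private views, i.e. what perfect communication would recover."""
    merged: Dict[Block, str] = {}
    for view in views:
        merged.update(view)
    return merged


# Toy target structure: four colored blocks on a 2 x 1 x 2 grid.
target = {
    (0, 0, 0): "red",
    (1, 0, 0): "blue",
    (0, 0, 1): "green",
    (1, 0, 1): "yellow",
}
views = split_views(target, n_agents=3)
```

In this toy setting no single view covers the whole target, yet the union of all views does, which is why the agents have to exchange information to build the structure.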

Original Abstract

We introduce CRAFT, a multi-agent benchmark for evaluating pragmatic communication in large language models under strict partial information. In this setting, multiple agents with complementary but incomplete views must coordinate through natural language to construct a shared 3D structure that no single agent can fully observe. We formalize this problem as a multi-sender pragmatic reasoning task and provide a diagnostic framework that decomposes failures into spatial grounding, belief modeling and pragmatic communication errors, including a taxonomy of behavioral failure profiles in both frontier and open-weight models. Across a diverse set of models, including 8 open-weight and 7 frontier including reasoning models, we find that stronger reasoning ability does not reliably translate to better coordination: smaller open-weight models often match or outperform frontier systems, and improved individual communication does not guarantee successful collaboration. These results suggest that multi-agent coordination remains a fundamentally unsolved challenge for current language models. Our code can be found at https://github.com/csu-signal/CRAFT

Tags

multi-agent, coordination, language models, benchmark, pragmatic communication

arXiv Categories

cs.CL cs.AI