Multimodal Learning relevance: 8/10

Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry

Yifan Zhu, Mariah Bradford, Kenneth Lai, Timothy Obiso, Videep Venkatesha, James Pustejovsky, Nikhil Krishnaswamy
arXiv: 2603.05450v1 Published: 2026-03-05 Updated: 2026-03-05

AI Summary

The paper studies the difficulty AI systems face in constructing common ground in multimodal collaborative settings, and introduces the DPIP task and dataset for evaluation.

Key Contributions

  • Proposed the DPIP collaborative task and an accompanying multimodal dataset
  • Evaluated LLMs and DEL on modeling common ground
  • Revealed shortcomings of LLMs in tracking task progression and belief states

Methodology

The DPIP task is designed to collect multimodal interaction data; common ground is then modeled separately with LLMs and with a DEL-based method, and the two approaches are comparatively evaluated.

Original Abstract

Establishing common ground, a shared set of beliefs and mutually recognized facts, is fundamental to collaboration, yet remains a challenge for current AI systems, especially in multimodal, multiparty settings, where the collaborators bring different information to the table. We introduce the Distributed Partial Information Puzzle (DPIP), a collaborative construction task that elicits rich multimodal communication under epistemic asymmetry. We present a multimodal dataset of these interactions, annotated and temporally aligned across speech, gesture, and action modalities to support reasoning over propositional content and belief dynamics. We then evaluate two paradigms for modeling common ground (CG): (1) state-of-the-art large language models (LLMs), prompted to infer shared beliefs from multimodal updates, and (2) an axiomatic pipeline grounded in Dynamic Epistemic Logic (DEL) that incrementally performs the same task. Results on the annotated DPIP data indicate that it poses a challenge to modern LLMs' abilities to track both task progression and belief state.
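The abstract's DEL-based pipeline builds on standard Dynamic Epistemic Logic machinery. As a minimal sketch of the underlying idea (not the paper's actual pipeline), the following assumes a two-agent Kripke model where each agent holds partial information, and shows how a public announcement can turn a private fact into common ground:

```python
# Sketch of a DEL public-announcement update (illustrative only; the
# paper's axiomatic pipeline is richer). Worlds are dicts of facts;
# each agent's accessibility relation links worlds it cannot tell apart.

from itertools import product

def announce(worlds, relations, prop):
    """Public announcement of `prop`: keep only worlds where it holds,
    and restrict each agent's relation to the surviving worlds."""
    kept = {w for w, facts in worlds.items() if facts[prop]}
    new_worlds = {w: worlds[w] for w in kept}
    new_rel = {a: {(u, v) for (u, v) in r if u in kept and v in kept}
               for a, r in relations.items()}
    return new_worlds, new_rel

def common_ground(worlds, relations, prop, real):
    """`prop` is common ground if it holds in every world reachable from
    the actual world via any chain of agents' accessibility links."""
    frontier, seen = {real}, {real}
    edges = set().union(*relations.values())
    while frontier:
        frontier = {v for (u, v) in edges if u in frontier} - seen
        seen |= frontier
    return all(worlds[w][prop] for w in seen)

# Two facts p and q: agent "a" knows q but not p, so it cannot
# distinguish worlds that agree on q; agent "b" is the mirror case.
worlds = {i: {"p": bits[0], "q": bits[1]}
          for i, bits in enumerate(product([True, False], repeat=2))}
rel = {
    "a": {(u, v) for u in worlds for v in worlds
          if worlds[u]["q"] == worlds[v]["q"]},
    "b": {(u, v) for u in worlds for v in worlds
          if worlds[u]["p"] == worlds[v]["p"]},
}
real = 0  # actual world: p and q both true

print(common_ground(worlds, rel, "p", real))  # False: not yet common ground
worlds, rel = announce(worlds, rel, "p")      # publicly announce p
print(common_ground(worlds, rel, "p", real))  # True: now common ground
```

The contrast with the LLM paradigm is that here the belief update is an explicit model-restriction operation, so task progression and belief state are tracked by construction rather than inferred from a prompt.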

Tags

Common Ground · Multimodal · Cognitive Reasoning · Language Models

arXiv Categories

cs.AI cs.CL