Multimodal Learning relevance: 8/10

Gesturing Toward Abstraction: Multimodal Convention Formation in Collaborative Physical Tasks

Kiyosu Maeda, William P. McCarthy, Ching-Yi Tsai, Jeffrey Mu, Haoliang Wang, Robert D. Hawkins, Judith E. Fan, Parastoo Abtahi
arXiv: 2602.08914v1 Published: 2026-02-09 Updated: 2026-02-09

AI Summary

Investigates how speech and gestures evolve into efficient shared abstractions during collaboration, and builds a multimodal model of convention formation.

Key Contributions

  • Reveals how linguistic and gestural abstractions form during physical collaboration
  • Proposes a probabilistic model of convention formation for multimodal settings
  • Provides a foundation for designing convention-aware intelligent agents

Methodology

Through an online language study and an in-person augmented-reality collaboration study, the authors observed how participants establish shared abstractions via speech and gestures.

Original Abstract

A quintessential feature of human intelligence is the ability to create ad hoc conventions over time to achieve shared goals efficiently. We investigate how communication strategies evolve through repeated collaboration as people coordinate on shared procedural abstractions. To this end, we conducted an online unimodal study (n = 98) using natural language to probe abstraction hierarchies. In a follow-up lab study (n = 40), we examined how multimodal communication (speech and gestures) changed during physical collaboration. Pairs used augmented reality to isolate their partner's hand and voice; one participant viewed a 3D virtual tower and sent instructions to the other, who built the physical tower. Participants became faster and more accurate by establishing linguistic and gestural abstractions and using cross-modal redundancy to emphasize key changes from previous interactions. Based on these findings, we extend probabilistic models of convention formation to multimodal settings, capturing shifts in modality preferences. Our findings and model provide building blocks for designing convention-aware intelligent agents situated in the physical world.
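The abstract's extension of probabilistic convention-formation models to multimodal settings can be illustrated with a minimal sketch. The class below is not the paper's model: it assumes a per-modality Beta belief over whether a convention is understood, plus hypothetical production costs, so that repeated gestural successes shift modality preference from speech toward gesture. All priors, costs, and names are illustrative assumptions.

```python
class MultimodalConventionModel:
    """Toy speaker model: choose the modality (speech vs. gesture) that
    maximizes expected utility = P(partner understands) - production cost.
    Beliefs are Beta-distributed and updated from observed trial outcomes."""

    def __init__(self):
        # Illustrative priors/costs (not from the paper): speech starts more
        # reliable (Beta(4,1)) but slightly costlier; a gesture starts
        # uncertain (Beta(1,1)) but is cheap once an abstraction is shared.
        self.prior = {"speech": (4, 1), "gesture": (1, 1)}
        self.cost = {"speech": 0.3, "gesture": 0.2}
        self.counts = {}  # (convention, modality) -> (successes, failures)

    def p_understood(self, convention, modality):
        a0, b0 = self.prior[modality]
        a, b = self.counts.get((convention, modality), (0, 0))
        return (a0 + a) / (a0 + a + b0 + b)  # posterior mean of Beta belief

    def utility(self, convention, modality):
        return self.p_understood(convention, modality) - self.cost[modality]

    def choose_modality(self, convention):
        return max(self.cost, key=lambda m: self.utility(convention, m))

    def observe(self, convention, modality, success):
        # Update success/failure counts after a communication trial.
        a, b = self.counts.get((convention, modality), (0, 0))
        self.counts[(convention, modality)] = (a + int(success), b + int(not success))


model = MultimodalConventionModel()
print(model.choose_modality("L-shaped block"))  # speech preferred at first
for _ in range(5):
    model.observe("L-shaped block", "gesture", True)
print(model.choose_modality("L-shaped block"))  # preference shifts to gesture
```

The qualitative dynamic this sketches, increasing reliance on cheap established gestures as conventions stabilize, mirrors the modality-preference shifts the paper reports; the actual model in the paper may differ in structure and parameterization.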

Tags

Human-Computer Collaboration  Multimodal Communication  Convention Formation  Augmented Reality

arXiv Categories

cs.HC cs.AI