AI Agents Relevance: 9/10

SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue

Jonggeun Lee, Junseong Pyo, Jeongmin Park, Yohan Jo
arXiv: 2603.16783v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

The paper introduces the SpokenTOD dataset and the SpokenUS spoken user simulator, both aimed at making spoken dialogue systems more robust.

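Key Contributions

  • Builds SpokenTOD, a large-scale spoken task-oriented dialogue dataset (52,390 dialogues, 1,034 hours of speech)
  • Proposes SpokenUS, a spoken user simulator with a dedicated barge-in architecture
  • Shows that SpokenUS's spoken behaviors pose meaningful challenges to downstream agents, making it usable for both training and evaluation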

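Methodology

SpokenTOD is built by augmenting existing task-oriented dialogue data with spoken user behaviors. SpokenUS is then trained on it, using an architecture dedicated to barge-in, and is assessed with both human and automatic evaluation.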

Original Abstract

Robust task-oriented spoken dialogue agents require exposure to the full diversity of how people interact through speech. Building spoken user simulators that address this requires large-scale spoken task-oriented dialogue (TOD) data encompassing spoken user behaviors, yet existing datasets are limited in scale and domain coverage, with no systematic pipeline for augmenting them. To address this, we introduce SpokenTOD, a spoken TOD dataset of 52,390 dialogues and 1,034 hours of speech augmented with four spoken user behaviors (cross-turn slots, barge-in, disfluency, and emotional prosody) across diverse speakers and domains. Building on SpokenTOD, we present SpokenUS, a spoken user simulator grounded in TOD with a dedicated architecture for barge-in. SpokenUS achieves comparable goal coverage to significantly larger models while substantially outperforming all baselines in Human MOS, disclosing slot values gradually across the dialogue as humans do rather than front-loading them. Further analysis confirms that SpokenUS's spoken behaviors pose meaningful challenges to downstream agents, making it a practical tool for training and evaluating more robust spoken dialogue systems.

Tags

Spoken dialogue systems, User simulator, Dataset, Barge-in

arXiv Categories

cs.CL