AI Agents relevance: 8/10

WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models

Yangzhuo Li, Shengpeng Ji, Yifu Chen, Tianle Liang, Haorong Ying, Yule Wang, Junbo Li, Jun Fang, Zhou Zhao
arXiv: 2602.12135v1 Published: 2026-02-12 Updated: 2026-02-12

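AI Summary

WavBench is a comprehensive benchmark for evaluating the reasoning, colloquialism, and paralinguistic capabilities of end-to-end spoken dialogue models.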

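Main Contributions

  • Introduces the WavBench benchmark, comprising three subsets: Pro, Basic, and Acoustic
  • Defines a new standard for spoken colloquialism that prioritizes "listenability" over rigid written accuracy
  • Evaluates five state-of-the-art models on reasoning, colloquialism, and paralinguistic capabilities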

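Methodology

WavBench builds dialogue data covering three aspects (reasoning, colloquialism, and paralinguistics) and uses it to evaluate state-of-the-art spoken dialogue models.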
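
As a rough illustration of how results over a three-part benchmark like this might be organized, the sketch below groups judged responses by subset and reports a mean score for each. Everything in it is an assumption made for illustration: the `SUBSETS` mapping, the `ItemScore` record, the `summarize` helper, and the idea that each response receives a single score in [0, 1] are all hypothetical and are not taken from the WavBench toolkit.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical subset keys, mapped to the ability each subset targets
# according to the abstract. These are not the toolkit's identifiers.
SUBSETS = {
    "pro": "reasoning",
    "basic": "colloquialism",
    "acoustic": "paralinguistics",
}


@dataclass(frozen=True)
class ItemScore:
    """One judged response; `score` is assumed to lie in [0, 1]."""
    subset: str
    score: float


def summarize(items):
    """Return the mean score per subset, keyed by subset name."""
    grouped = defaultdict(list)
    for item in items:
        if item.subset not in SUBSETS:
            raise ValueError(f"unknown subset: {item.subset}")
        grouped[item.subset].append(item.score)
    # Only subsets that actually received scores appear in the result.
    return {name: mean(scores) for name, scores in grouped.items()}
```

Reporting the three subsets separately, rather than collapsing them into one number, keeps a model's reasoning strength from masking weak colloquial or paralinguistic behavior.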

Original Abstract

With the rapid integration of advanced reasoning capabilities into spoken dialogue models, the field urgently demands benchmarks that transcend simple interactions to address real-world complexity. However, current evaluations predominantly adhere to text-generation standards, overlooking the unique audio-centric characteristics of paralinguistics and colloquialisms, alongside the cognitive depth required by modern agents. To bridge this gap, we introduce WavBench, a comprehensive benchmark designed to evaluate realistic conversational abilities where prior works fall short. Uniquely, WavBench establishes a tripartite framework: 1) Pro subset, designed to rigorously challenge reasoning-enhanced models with significantly increased difficulty; 2) Basic subset, defining a novel standard for spoken colloquialism that prioritizes "listenability" through natural vocabulary, linguistic fluency, and interactive rapport, rather than rigid written accuracy; and 3) Acoustic subset, covering explicit understanding, generation, and implicit dialogue to rigorously evaluate comprehensive paralinguistic capabilities within authentic real-world scenarios. Through evaluating five state-of-the-art models, WavBench offers critical insights into the intersection of complex problem-solving, colloquial delivery, and paralinguistic fidelity, guiding the evolution of robust spoken dialogue models. The benchmark dataset and evaluation toolkit are available at https://naruto-2024.github.io/wavbench.github.io/.

Tags

spoken dialogue benchmark reasoning colloquialism paralinguistics

arXiv Category

cs.CL