Multimodal Learning (relevance: 8/10)

From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs

Xiaoyong Guo, Nanjie Li, Zijie Zeng, Kai Wang, Hao Huang, Haihua Xu, Wei Shi
arXiv: 2603.24034v1 Published: 2026-03-25 Updated: 2026-03-25

AI Summary

This paper proposes a training framework that mitigates contextual exposure bias in Speech-LLMs, improving model robustness under realistic inference conditions.

Main Contributions

  • Identifies and names the contextual exposure bias problem
  • Proposes three techniques: Teacher Error Knowledge, Context Dropout, and Direct Preference Optimization (DPO)
  • Shows experimentally that the framework improves performance under realistic (predicted) history

Methodology

Noisy Whisper hypotheses replace oracle transcripts as the training-time conversation history, Context Dropout regularizes over-reliance on that history, and DPO further optimizes the model on curated failure cases.
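The Context Dropout step can be illustrated with a minimal sketch: with some probability, an example is trained without any conversation history, so the model cannot learn to lean on the context channel. The drop rate `p_drop` and the function name are hypothetical; the paper's exact setting is not stated here.

```python
import random

def apply_context_dropout(history_utterances, p_drop=0.3, seed=None):
    """Randomly drop the conversation history for a training example.

    With probability `p_drop`, return an empty history so the model is
    regularized against over-relying on context. `p_drop` is a
    hypothetical hyperparameter for illustration only.
    """
    rng = random.Random(seed)
    if rng.random() < p_drop:
        return []  # train this example with no history at all
    return list(history_utterances)
```

In practice this would be applied per-example inside the SFT data pipeline, alongside the substitution of Whisper hypotheses for oracle transcripts.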

Original Abstract

Contextual automatic speech recognition (ASR) with Speech-LLMs is typically trained with oracle conversation history, but relies on error-prone history at inference, causing a train-test mismatch in the context channel that we term contextual exposure bias. We propose a unified training framework to improve robustness under realistic histories: (i) Teacher Error Knowledge by using Whisper large-v3 hypotheses as training-time history, (ii) Context Dropout to regularize over-reliance on history, and (iii) Direct Preference Optimization (DPO) on curated failure cases. Experiments on TED-LIUM 3 (in-domain) and zero-shot LibriSpeech (out-of-domain) show consistent gains under predicted-history decoding. With a two-utterance history as context, SFT with Whisper hypotheses reduces WER from 5.59% (oracle-history training) to 5.47%, and DPO further improves it to 5.17%. Under irrelevant-context attacks, DPO yields the smallest degradation (5.17% -> 5.63%), indicating improved robustness to misleading context. Our code and models are published at https://github.com/XYGuo1996/Contextual_Speech_LLMs.
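The DPO step in the abstract trains on preference pairs built from curated failure cases (correct transcript preferred over a faulty hypothesis). A minimal sketch of the standard DPO objective for one such pair, assuming per-sequence log-probabilities are already available; `beta` and the function name are illustrative, not taken from the paper:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    `logp_*` are policy log-probs of the chosen (correct) and rejected
    (failure-case) transcripts; `ref_logp_*` are the frozen reference
    model's log-probs. `beta` is the usual DPO temperature, chosen here
    for illustration.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # loss = -log(sigmoid(margin)); small when the policy prefers the
    # chosen transcript more strongly than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss pushes the Speech-LLM to assign relatively higher likelihood to the correct transcription than to the previously observed failure, without moving far from the reference policy.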

Tags

Speech-LLM, ASR, Contextual Bias, Robustness, Direct Preference Optimization

arXiv Categories

cs.CL cs.AI