Multimodal Learning (relevance: 8/10)

From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs

Xiaoyong Guo, Nanjie Li, Zijie Zeng, Kai Wang, Hao Huang, Haihua Xu, Wei Shi
arXiv: 2603.24034v1 Published: 2026-03-25 Updated: 2026-03-25

AI Summary

This paper proposes a training framework that mitigates contextual exposure bias in Speech-LLMs, improving model robustness under realistic inference conditions.

Main Contributions

  • Identifies and names the contextual exposure bias problem
  • Proposes three techniques: Teacher Error Knowledge, Context Dropout, and Direct Preference Optimization (DPO)
  • Shows experimentally that the framework improves performance under realistic (predicted) history

Methodology

Noisy Whisper hypotheses replace oracle transcripts as the training-time conversation history, Context Dropout regularizes over-reliance on that history, and DPO further optimizes the model on curated failure cases.
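The Context Dropout step can be illustrated with a minimal sketch: with some probability, an example is trained without any conversation history, so the model cannot learn to lean on the context channel. The drop rate `p_drop` and the function name are hypothetical; the paper's exact setting is not stated here.

```python
import random

def apply_context_dropout(history_utterances, p_drop=0.3, seed=None):
    """Randomly drop the conversation history for a training example.

    With probability `p_drop`, return an empty history so the model is
    regularized against over-relying on context. `p_drop` is a
    hypothetical hyperparameter for illustration only.
    """
    rng = random.Random(seed)
    if rng.random() < p_drop:
        return []  # train this example with no history at all
    return list(history_utterances)
```

In practice this would be applied per-example inside the SFT data pipeline, alongside the substitution of Whisper hypotheses for oracle transcripts.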

Original Abstract

Contextual automatic speech recognition (ASR) with Speech-LLMs is typically trained with oracle conversation history, but relies on error-prone history at inference, causing a train-test mismatch in the context channel that we term contextual exposure bias. We propose a unified training framework to improve robustness under realistic histories: (i) Teacher Error Knowledge by using Whisper large-v3 hypotheses as training-time history, (ii) Context Dropout to regularize over-reliance on history, and (iii) Direct Preference Optimization (DPO) on curated failure cases. Experiments on TED-LIUM 3 (in-domain) and zero-shot LibriSpeech (out-of-domain) show consistent gains under predicted-history decoding. With a two-utterance history as context, SFT with Whisper hypotheses reduces WER from 5.59% (oracle-history training) to 5.47%, and DPO further improves it to 5.17%. Under irrelevant-context attacks, DPO yields the smallest degradation (5.17% -> 5.63%), indicating improved robustness to misleading context. Our code and models are published at https://github.com/XYGuo1996/Contextual_Speech_LLMs.
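The DPO step in the abstract trains on preference pairs built from curated failure cases (correct transcript preferred over a faulty hypothesis). A minimal sketch of the standard DPO objective for one such pair, assuming per-sequence log-probabilities are already available; `beta` and the function name are illustrative, not taken from the paper:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    `logp_*` are policy log-probs of the chosen (correct) and rejected
    (failure-case) transcripts; `ref_logp_*` are the frozen reference
    model's log-probs. `beta` is the usual DPO temperature, chosen here
    for illustration.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # loss = -log(sigmoid(margin)); small when the policy prefers the
    # chosen transcript more strongly than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss pushes the Speech-LLM to assign relatively higher likelihood to the correct transcription than to the previously observed failure, without moving far from the reference policy.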

Tags

Speech-LLM, ASR, Contextual Bias, Robustness, Direct Preference Optimization

arXiv Categories

cs.CL cs.AI