Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
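AI Summary
Proposes an ambiguous emotion recognition method for large audio-language models (LALMs) that uses distributional reasoning and chain-of-thought supervision to improve the model's understanding of complex emotions.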
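Main Contributions
- Proposes an ambiguity-aware objective that aligns predictions with human perceptual distributions
- Proposes structured, ambiguity-aware chain-of-thought supervision that guides reasoning over emotional cues
- Validates the method's effectiveness on the IEMOCAP and CREMA-D datasets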
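Methodology
Reformulates ambiguous emotion recognition as a distributional reasoning problem, and improves the reasoning ability of LALMs through an ambiguity-aware objective and chain-of-thought supervision.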
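To make the idea of aligning predictions with a human perceptual distribution concrete, here is a minimal sketch. It is not the paper's objective, whose exact form is not given in this summary. It assumes the target distribution is the normalized histogram of annotator votes and that the mismatch is measured with KL divergence. The label set `EMOTIONS` and the helpers `votes_to_distribution` and `kl_divergence` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical label set; the real datasets define their own categories.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def votes_to_distribution(votes, labels=EMOTIONS):
    """Turn a list of annotator votes into a soft label over `labels`."""
    counts = Counter(votes)
    total = sum(counts[label] for label in labels)
    if total == 0:
        raise ValueError("no votes fall inside the label set")
    return [counts[label] / total for label in labels]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); assumed here as the alignment measure.

    Terms with zero target mass contribute nothing, and predicted
    probabilities are floored at `eps` to avoid division by zero.
    """
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# Three annotators disagree: two hear sadness, one hears a neutral tone.
target = votes_to_distribution(["sad", "sad", "neutral"])

soft_prediction = [0.05, 0.05, 0.30, 0.60]  # spreads mass like the annotators
hard_prediction = [0.0, 0.0, 0.0, 1.0]      # single-label style output

soft_loss = kl_divergence(target, soft_prediction)
hard_loss = kl_divergence(target, hard_prediction)
print(f"soft: {soft_loss:.4f}  hard: {hard_loss:.4f}")
```

Under these assumptions, the single-label style prediction is penalized heavily because it assigns no probability to an emotion that one annotator perceived, while the soft prediction stays close to the vote histogram.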
Original Abstract
Speech emotion recognition plays an important role in various applications. However, most existing approaches predict a single emotion label, oversimplifying the inherently ambiguous nature of human emotional expression. Recent large audio-language models show promise in generating richer outputs, but their reasoning ability for ambiguous emotional understanding remains limited. In this work, we reformulate ambiguous emotion recognition as a distributional reasoning problem and present the first systematic study of ambiguity-aware reasoning in LALMs. Our framework comprises two complementary components: an ambiguity-aware objective that aligns predictions with human perceptual distributions, and a structured ambiguity-aware chain-of-thought supervision that guides reasoning over emotional cues. Experiments on IEMOCAP and CREMA-D demonstrate consistent improvements across SFT, DPO, and GRPO training strategies.