Multimodal Learning Relevance: 8/10

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction

Xiaofeng Yu, Jiaheng Dong, Jean Honorio, Abhirup Ghosh, Hong Jia, Ting Dang
arXiv: 2603.08230v1 Published: 2026-03-09 Updated: 2026-03-09

AI Summary

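Proposes an ambiguous emotion recognition method for large audio-language models (LALMs) that improves their understanding of complex emotions through distributional reasoning and chain-of-thought reasoning.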

Key Contributions

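  • Proposes an ambiguity-aware objective that aligns predictions with human perceptual distributions
  • Proposes structured, ambiguity-aware chain-of-thought supervision that guides reasoning over emotional cues
  • Demonstrates consistent improvements on the IEMOCAP and CREMA-D datasets across SFT, DPO, and GRPO training strategies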

Methodology

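Reformulates ambiguous emotion recognition as a distributional reasoning problem and strengthens LALM reasoning through an ambiguity-aware objective and chain-of-thought supervision.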
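The summary does not give the exact form of the ambiguity-aware objective. One common way to "align predictions with human perceptual distributions" is to turn annotator votes into a soft label and penalize the KL divergence between that label and the model's predicted distribution. The sketch below assumes that formulation; the label set and the helpers `soft_label` and `kl_divergence` are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter

# Hypothetical emotion label set (assumption, not taken from the paper).
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def soft_label(votes, labels=EMOTIONS):
    """Turn a list of annotator votes into an empirical emotion distribution."""
    counts = Counter(votes)
    total = len(votes)
    return [counts[e] / total for e in labels]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): large when q puts little mass where human annotators put mass.

    Terms with p_i == 0 contribute nothing, so they are skipped; eps guards
    against log(0) when the model assigns zero probability to a voted class.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


if __name__ == "__main__":
    # Two of three annotators heard "happy", one heard "neutral".
    target = soft_label(["happy", "happy", "neutral"])
    pred = [0.05, 0.6, 0.3, 0.05]   # a hedged, ambiguity-aware prediction
    onehot = [0.0, 1.0, 0.0, 0.0]   # a single-label "majority vote" prediction
    print(kl_divergence(target, pred), kl_divergence(target, onehot))
```

Under this assumed objective, the one-hot majority-label prediction is penalized far more heavily than the hedged prediction, because it assigns almost no probability to the minority "neutral" perception. This is the behaviour a distributional reformulation is meant to reward.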

Original Abstract

Speech emotion recognition plays an important role in various applications. However, most existing approaches predict a single emotion label, oversimplifying the inherently ambiguous nature of human emotional expression. Recent large audio-language models show promise in generating richer outputs, but their reasoning ability for ambiguous emotional understanding remains limited. In this work, we reformulate ambiguous emotion recognition as a distributional reasoning problem and present the first systematic study of ambiguity-aware reasoning in LALMs. Our framework comprises two complementary components: an ambiguity-aware objective that aligns predictions with human perceptual distributions, and a structured ambiguity-aware chain-of-thought supervision that guides reasoning over emotional cues. Experiments on IEMOCAP and CREMA-D demonstrate consistent improvements across SFT, DPO, and GRPO training strategies.
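The abstract reports results under SFT, DPO, and GRPO but does not specify a reward. As an illustration only, assume each sampled response can be parsed into an emotion distribution and scored by its closeness to the human distribution. GRPO-style group-relative advantages could then be computed as below. The reward `tv_reward` (1 minus total-variation distance) is a hypothetical choice, the parsing step is omitted, and the population standard deviation is used for normalization (implementations differ on this detail).

```python
import statistics


def tv_reward(pred, target):
    """Hypothetical reward: 1 minus total-variation distance, in [0, 1]."""
    return 1.0 - 0.5 * sum(abs(p - t) for p, t in zip(pred, target))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    target = [0.0, 2 / 3, 1 / 3, 0.0]  # human distribution over 4 emotions
    group = [
        [0.0, 1.0, 0.0, 0.0],  # sample collapsed onto the majority label
        target,                # sample matching the human distribution
        [0.0, 0.0, 1.0, 0.0],  # sample collapsed onto the minority label
    ]
    rewards = [tv_reward(s, target) for s in group]
    print(rewards, group_relative_advantages(rewards))
```

In this toy group, the distribution-matching sample gets a positive advantage and the minority-only sample a negative one, while the majority-only sample sits at the group mean. A distributional reward of this kind therefore pushes the policy toward calibrated, ambiguity-aware outputs rather than single labels.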

Tags

Emotion Recognition, Audio-Language Models, Ambiguous Emotion, Chain-of-Thought, Distributional Reasoning

arXiv Categories

cs.SD cs.AI eess.AS