Multimodal Learning Relevance: 9/10

XEmoGPT: An Explainable Multimodal Emotion Recognition Framework with Cue-Level Perception and Reasoning

Hanwen Zhang, Yao Liu, Peiyuan Jiang, Lang Junjie, Xie Jun, Yihui He, Yajiao Deng, Siyu Du, Qiao Liu
arXiv: 2602.05496v1 Published: 2026-02-05 Updated: 2026-02-05

AI Summary

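XEmoGPT is an explainable multimodal emotion recognition framework that improves both the perception of emotional cues and reasoning over them.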

Main Contributions

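  • Proposes the XEmoGPT framework, which strengthens emotional cue perception and reasoning
  • Builds EmoCue, a large-scale emotional cue dataset that supports cue-level reasoning
  • Introduces the EmoCue-360 metric and the EmoCue-Eval benchmark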

Methodology

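The VECB and AECB modules enhance the video and audio encoders, and the model is trained on the EmoCue dataset to improve its understanding of, and reasoning over, emotional cues.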

Original Abstract

Explainable Multimodal Emotion Recognition (EMER) plays a crucial role in applications such as human-computer interaction and social media analytics. However, current approaches struggle with cue-level perception and reasoning due to two main challenges: 1) general-purpose modality encoders are pretrained to capture global structures and general semantics rather than fine-grained emotional cues, resulting in limited sensitivity to emotional signals; and 2) available datasets usually involve a trade-off between annotation quality and scale, which leads to insufficient supervision for emotional cues and ultimately limits cue-level reasoning. Moreover, existing evaluation metrics are inadequate for assessing cue-level reasoning performance. To address these challenges, we propose eXplainable Emotion GPT (XEmoGPT), a novel EMER framework capable of both perceiving and reasoning over emotional cues. It incorporates two specialized modules: the Video Emotional Cue Bridge (VECB) and the Audio Emotional Cue Bridge (AECB), which enhance the video and audio encoders through carefully designed tasks for fine-grained emotional cue perception. To further support cue-level reasoning, we construct a large-scale dataset, EmoCue, designed to teach XEmoGPT how to reason over multimodal emotional cues. In addition, we introduce EmoCue-360, an automated metric that extracts and matches emotional cues using semantic similarity, and release EmoCue-Eval, a benchmark of 400 expert-annotated samples covering diverse emotional scenarios. Experimental results show that XEmoGPT achieves strong performance in both emotional cue perception and reasoning.
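The abstract describes EmoCue-360 only as a metric that extracts emotional cues and matches them using semantic similarity. The sketch below illustrates the general idea of cue-level matching and scoring under explicit assumptions; it is not the authors' implementation. The helper names (`embed`, `cosine`, `match_cues`, `cue_scores`), the bag-of-words stand-in for a semantic sentence encoder, the 0.5 similarity threshold, greedy one-to-one matching, and precision/recall/F1 as the final scores are all hypothetical choices made for illustration. Cue extraction itself (turning a free-text explanation into a list of cues) is assumed to have already happened.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a semantic sentence encoder: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_cues(predicted, reference, threshold=0.5):
    """Greedy one-to-one matching of predicted cues to reference cues.

    All pairs whose similarity reaches the (assumed) threshold are ranked by
    similarity; each cue on either side is used at most once.
    """
    pairs = []
    for i, p in enumerate(predicted):
        for j, r in enumerate(reference):
            s = cosine(embed(p), embed(r))
            if s >= threshold:
                pairs.append((s, i, j))
    pairs.sort(reverse=True)  # highest similarity first
    used_p, used_r, matches = set(), set(), []
    for s, i, j in pairs:
        if i not in used_p and j not in used_r:
            used_p.add(i)
            used_r.add(j)
            matches.append((i, j, s))
    return matches


def cue_scores(predicted, reference, threshold=0.5):
    """Cue-level precision, recall and F1 computed from the matched pairs."""
    matches = match_cues(predicted, reference, threshold)
    precision = len(matches) / len(predicted) if predicted else 0.0
    recall = len(matches) / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    predicted = ["furrowed brow", "raised voice", "smiling mouth"]
    reference = ["furrowed brow and narrowed eyes", "raised voice", "clenched fists"]
    scores = cue_scores(predicted, reference)
    print({k: round(v, 3) for k, v in scores.items()})
    # → {'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

Here two of the three predicted cues find a sufficiently similar reference cue, so precision and recall are both 2/3. Greedy matching keeps the sketch short; an optimal assignment (for example the Hungarian algorithm) and a learned sentence-embedding model would be natural substitutes in a more faithful evaluation.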

Tags

Emotion Recognition, Multimodal Learning, Explainability, Emotional Cues

arXiv Categories

cs.MM cs.AI cs.CV