Multimodal Learning (relevance: 9/10)

Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition

Yu Liu, Lei Zhang, Haoxun Li, Hanlei Shi, Yuxuan Ding, Leyuan Qu, Taihao Li
arXiv: 2603.16463v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

HyDRA addresses the ambiguity problem in Open-Vocabulary Multimodal Emotion Recognition through hybrid-evidential deductive reasoning, while providing interpretable evidence.

Key Contributions

  • Proposes HyDRA, a Hybrid-evidential Deductive Reasoning architecture
  • Employs reinforcement learning with hierarchical reward shaping to optimize reasoning trajectories
  • Outperforms existing baselines in ambiguous or conflicting scenarios
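The hierarchical reward shaping above can be sketched as a tiered reward function, where lower tiers (trace well-formedness) gate higher tiers (evidence grounding, final task correctness). This is a minimal illustrative sketch: the tier weights, the gating scheme, and the function name `hierarchical_reward` are assumptions, not values from the paper.

```python
def hierarchical_reward(format_ok: bool, evidence_score: float,
                        task_correct: bool) -> float:
    """Hedged sketch of hierarchical reward shaping.

    Tiers (weights are illustrative assumptions):
      1. format_ok      -- the reasoning trace is well-formed
      2. evidence_score -- how well cited evidence grounds the rationale, in [0, 1]
      3. task_correct   -- the final emotion label matches the target
    """
    if not format_ok:           # tier 1 gates everything below it
        return -1.0
    reward = 0.2                # base reward for a valid trace
    reward += 0.3 * evidence_score   # tier 2: evidence grounding
    if task_correct:            # tier 3: final task performance
        reward += 0.5
    return reward


if __name__ == "__main__":
    # A well-formed, well-grounded, correct trajectory scores highest.
    print(hierarchical_reward(True, 0.8, True))
```

The gating ensures a trajectory cannot earn task reward without first producing a parseable, evidence-grounded trace, which is the sense in which the shaping is hierarchical.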

Methodology

Formalizes inference as a Propose-Verify-Decide protocol and optimizes the reasoning process with reinforcement learning, aligning reasoning trajectories with final task performance.
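The Propose-Verify-Decide protocol can be sketched as a three-stage loop: propose candidate rationales from multimodal cues, verify each against all observed cues, then decide on the rationale that best reconciles the evidence. Everything below is an illustrative sketch; the data structures, scoring rule, and cue format are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Rationale:
    label: str                      # candidate emotion label
    evidence: list = field(default_factory=list)  # supporting cues
    score: float = 0.0              # verifier confidence


def propose(cues):
    """Propose one cue-grounded rationale per observed (cue, label) pair."""
    return [Rationale(label=label, evidence=[cue]) for cue, label in cues]


def verify(rationale, cues):
    """Score a rationale by the fraction of observed cues it reconciles."""
    support = sum(1 for _, label in cues if label == rationale.label)
    return support / len(cues)


def decide(rationales):
    """Select the rationale that best reconciles the multimodal evidence."""
    return max(rationales, key=lambda r: r.score)


# Toy multimodal cues as (observation, hypothesized emotion) pairs.
cues = [("trembling voice", "anxious"),
        ("forced smile", "anxious"),
        ("upbeat words", "happy")]

candidates = propose(cues)
for r in candidates:
    r.score = verify(r, cues)
decision = decide(candidates)
print(decision.label)  # → anxious
```

Here the "anxious" rationale wins because it reconciles two of the three cues, illustrating how the decide step resolves conflicting modalities rather than committing prematurely to a single dominant cue.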

Original Abstract

Open-Vocabulary Multimodal Emotion Recognition (OV-MER) is inherently challenging due to the ambiguity of equivocal multimodal cues, which often stem from distinct unobserved situational dynamics. While Multimodal Large Language Models (MLLMs) offer extensive semantic coverage, their performance is often bottlenecked by premature commitment to dominant data priors, resulting in suboptimal heuristics that overlook crucial, complementary affective cues across modalities. We argue that effective affective reasoning requires more than surface-level association; it necessitates reconstructing nuanced emotional states by synthesizing multiple evidence-grounded rationales that reconcile these observations from diverse latent perspectives. We introduce HyDRA, a Hybrid-evidential Deductive Reasoning Architecture that formalizes inference as a Propose-Verify-Decide protocol. To internalize this abductive process, we employ reinforcement learning with hierarchical reward shaping, aligning the reasoning trajectories with final task performance to ensure they best reconcile the observed multimodal cues. Systematic evaluations validate our design choices, with HyDRA consistently outperforming strong baselines--especially in ambiguous or conflicting scenarios--while providing interpretable, diagnostic evidence traces.

Tags

Multimodal Emotion Recognition · Deductive Reasoning · Reinforcement Learning · Hybrid-evidential Reasoning

arXiv Categories

cs.AI cs.HC