Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis
AI Summary
Proposes the Tri-Subspace Disentanglement framework, which improves multimodal sentiment analysis performance by disentangling features into separate subspaces.
Main Contributions
- Proposes the Tri-Subspace Disentanglement (TSD) framework
- Designs a Subspace-Aware Cross-Attention (SACA) fusion module
- Achieves state-of-the-art results on the CMU-MOSI and CMU-MOSEI datasets
Methodology
Features are factorized into common, submodally-shared, and private subspaces, and a decoupling supervisor together with structured regularization losses keeps the subspaces independent; SACA then fuses the three subspaces. A rough sketch of the factorization follows below.
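The paper's exact projection heads and loss forms are not given in this note, so the following is only a minimal PyTorch sketch of the idea: per-modality linear heads produce the three subspace components, an alignment penalty pulls together what should agree, and a soft orthogonality penalty decorrelates private from common features. `TriSubspaceProjector`, `decoupling_losses`, and all loss choices are hypothetical illustrations, not the published design.

```python
import itertools

import torch
import torch.nn as nn
import torch.nn.functional as F

MODALITIES = ("language", "visual", "acoustic")


def soft_orthogonality(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Mean squared cross-correlation between the feature dimensions of a and b."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return (a.transpose(-2, -1) @ b).pow(2).mean()


class TriSubspaceProjector(nn.Module):
    """Factorizes each modality feature into common, pairwise-shared, and private parts."""

    def __init__(self, dim: int):
        super().__init__()
        self.common = nn.ModuleDict({m: nn.Linear(dim, dim) for m in MODALITIES})
        self.private = nn.ModuleDict({m: nn.Linear(dim, dim) for m in MODALITIES})
        # One head per ordered (source modality -> pair partner) combination.
        self.shared = nn.ModuleDict(
            {f"{m}_to_{n}": nn.Linear(dim, dim)
             for m, n in itertools.permutations(MODALITIES, 2)}
        )

    def forward(self, feats):
        # feats: dict mapping modality name -> (batch, dim) feature tensor.
        common = {m: self.common[m](feats[m]) for m in MODALITIES}
        private = {m: self.private[m](feats[m]) for m in MODALITIES}
        shared = {key: proj(feats[key.split("_to_")[0]])
                  for key, proj in self.shared.items()}
        return common, shared, private


def decoupling_losses(common, shared, private):
    """Illustrative regularizers: align what should agree, decorrelate the rest."""
    align, ortho = 0.0, 0.0
    for m, n in itertools.combinations(MODALITIES, 2):
        # Common components of all modalities should agree globally.
        align = align + F.mse_loss(common[m], common[n])
        # The two views of each pairwise-shared subspace should agree with each other.
        align = align + F.mse_loss(shared[f"{m}_to_{n}"], shared[f"{n}_to_{m}"])
    for m in MODALITIES:
        # Private components should carry cues absent from the common subspace.
        ortho = ortho + soft_orthogonality(private[m], common[m])
    return align + ortho
```

In training, these regularizers would presumably be added to the main sentiment objective with tuned weights.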
Original Abstract
Multimodal Sentiment Analysis (MSA) integrates language, visual, and acoustic modalities to infer human sentiment. Most existing methods focus either on globally shared representations or on modality-specific features, overlooking signals that are shared only by certain modality pairs. This limits the expressiveness and discriminative power of multimodal representations. To address this limitation, we propose a Tri-Subspace Disentanglement (TSD) framework that explicitly factorizes features into three complementary subspaces: a common subspace capturing global consistency, submodally-shared subspaces modeling pairwise cross-modal synergies, and private subspaces preserving modality-specific cues. To keep these subspaces pure and independent, we introduce a decoupling supervisor together with structured regularization losses. We further design a Subspace-Aware Cross-Attention (SACA) fusion module that adaptively models and integrates information from the three subspaces to obtain richer and more robust representations. Experiments on CMU-MOSI and CMU-MOSEI demonstrate that TSD achieves state-of-the-art performance across all key metrics, reaching 0.691 MAE on CMU-MOSI and 54.9% ACC-7 on CMU-MOSEI, and also transfers well to multimodal intent recognition tasks. Ablation studies confirm that tri-subspace disentanglement and SACA jointly enhance the modeling of multi-granular cross-modal sentiment cues.
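To make the fusion step concrete, here is a hedged, minimal sketch of subspace-aware cross-attention: each subspace stream cross-attends over the other two, and a learned softmax gate adaptively weights the refined streams. The gating scheme, token pooling, and head count are assumptions for illustration only; `SubspaceAwareFusion` is a hypothetical name, not the published SACA module.

```python
import torch
import torch.nn as nn


class SubspaceAwareFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # One cross-attention block per subspace stream (common / shared / private).
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(3)
        )
        self.gate = nn.Linear(3 * dim, 3)  # adaptive per-example stream weights

    def forward(self, streams):
        # streams: three (batch, tokens, dim) tensors, one per subspace.
        refined = []
        for i, attn in enumerate(self.attn):
            # Query each stream against the other two subspaces concatenated.
            context = torch.cat([s for j, s in enumerate(streams) if j != i], dim=1)
            out, _ = attn(streams[i], context, context)
            refined.append(out.mean(dim=1))  # pool tokens -> (batch, dim)
        weights = torch.softmax(self.gate(torch.cat(refined, dim=-1)), dim=-1)
        # Adaptively weighted sum of the three refined subspace summaries.
        return sum(w.unsqueeze(-1) * r for w, r in zip(weights.unbind(-1), refined))
```

The fused (batch, dim) representation would then feed a regression or classification head for the sentiment prediction.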