Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis
AI Summary
Proposes the Tri-Subspace Disentanglement framework, which improves multimodal sentiment analysis performance by disentangling features into separate subspaces.
Main Contributions
- Proposes the Tri-Subspace Disentanglement (TSD) framework
- Designs a Subspace-Aware Cross-Attention (SACA) fusion module
- Achieves state-of-the-art results on the CMU-MOSI and CMU-MOSEI datasets
Methodology
Features are factorized into common, submodally-shared, and private subspaces, and a decoupling supervisor together with structured regularization losses keeps the subspaces independent; SACA then fuses the three subspaces. A rough sketch of the factorization follows below.
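The paper's exact projection heads and loss forms are not given in this note, so the following is only a minimal PyTorch sketch of the idea: per-modality linear heads produce the three subspace components, an alignment penalty pulls together what should agree, and a soft orthogonality penalty decorrelates private from common features. `TriSubspaceProjector`, `decoupling_losses`, and all loss choices are hypothetical illustrations, not the published design.

```python
import itertools

import torch
import torch.nn as nn
import torch.nn.functional as F

MODALITIES = ("language", "visual", "acoustic")


def soft_orthogonality(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Mean squared cross-correlation between the feature dimensions of a and b."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return (a.transpose(-2, -1) @ b).pow(2).mean()


class TriSubspaceProjector(nn.Module):
    """Factorizes each modality feature into common, pairwise-shared, and private parts."""

    def __init__(self, dim: int):
        super().__init__()
        self.common = nn.ModuleDict({m: nn.Linear(dim, dim) for m in MODALITIES})
        self.private = nn.ModuleDict({m: nn.Linear(dim, dim) for m in MODALITIES})
        # One head per ordered (source modality -> pair partner) combination.
        self.shared = nn.ModuleDict(
            {f"{m}_to_{n}": nn.Linear(dim, dim)
             for m, n in itertools.permutations(MODALITIES, 2)}
        )

    def forward(self, feats):
        # feats: dict mapping modality name -> (batch, dim) feature tensor.
        common = {m: self.common[m](feats[m]) for m in MODALITIES}
        private = {m: self.private[m](feats[m]) for m in MODALITIES}
        shared = {key: proj(feats[key.split("_to_")[0]])
                  for key, proj in self.shared.items()}
        return common, shared, private


def decoupling_losses(common, shared, private):
    """Illustrative regularizers: align what should agree, decorrelate the rest."""
    align, ortho = 0.0, 0.0
    for m, n in itertools.combinations(MODALITIES, 2):
        # Common components of all modalities should agree globally.
        align = align + F.mse_loss(common[m], common[n])
        # The two views of each pairwise-shared subspace should agree with each other.
        align = align + F.mse_loss(shared[f"{m}_to_{n}"], shared[f"{n}_to_{m}"])
    for m in MODALITIES:
        # Private components should carry cues absent from the common subspace.
        ortho = ortho + soft_orthogonality(private[m], common[m])
    return align + ortho
```

In training, these regularizers would presumably be added to the main sentiment objective with tuned weights.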
Original Abstract
Multimodal Sentiment Analysis (MSA) integrates language, visual, and acoustic modalities to infer human sentiment. Most existing methods focus either on globally shared representations or on modality-specific features, overlooking signals that are shared only by certain modality pairs. This limits the expressiveness and discriminative power of multimodal representations. To address this limitation, we propose a Tri-Subspace Disentanglement (TSD) framework that explicitly factorizes features into three complementary subspaces: a common subspace capturing global consistency, submodally-shared subspaces modeling pairwise cross-modal synergies, and private subspaces preserving modality-specific cues. To keep these subspaces pure and independent, we introduce a decoupling supervisor together with structured regularization losses. We further design a Subspace-Aware Cross-Attention (SACA) fusion module that adaptively models and integrates information from the three subspaces to obtain richer and more robust representations. Experiments on CMU-MOSI and CMU-MOSEI demonstrate that TSD achieves state-of-the-art performance across all key metrics, reaching 0.691 MAE on CMU-MOSI and 54.9% ACC-7 on CMU-MOSEI, and also transfers well to multimodal intent recognition tasks. Ablation studies confirm that tri-subspace disentanglement and SACA jointly enhance the modeling of multi-granular cross-modal sentiment cues.
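To make the fusion step concrete, here is a hedged, minimal sketch of subspace-aware cross-attention: each subspace stream cross-attends over the other two, and a learned softmax gate adaptively weights the refined streams. The gating scheme, token pooling, and head count are assumptions for illustration only; `SubspaceAwareFusion` is a hypothetical name, not the published SACA module.

```python
import torch
import torch.nn as nn


class SubspaceAwareFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # One cross-attention block per subspace stream (common / shared / private).
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(3)
        )
        self.gate = nn.Linear(3 * dim, 3)  # adaptive per-example stream weights

    def forward(self, streams):
        # streams: three (batch, tokens, dim) tensors, one per subspace.
        refined = []
        for i, attn in enumerate(self.attn):
            # Query each stream against the other two subspaces concatenated.
            context = torch.cat([s for j, s in enumerate(streams) if j != i], dim=1)
            out, _ = attn(streams[i], context, context)
            refined.append(out.mean(dim=1))  # pool tokens -> (batch, dim)
        weights = torch.softmax(self.gate(torch.cat(refined, dim=-1)), dim=-1)
        # Adaptively weighted sum of the three refined subspace summaries.
        return sum(w.unsqueeze(-1) * r for w, r in zip(weights.unbind(-1), refined))
```

The fused (batch, dim) representation would then feed a regression or classification head for the sentiment prediction.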