Multimodal Learning (relevance: 9/10)

CLCR: Cross-Level Semantic Collaborative Representation for Multimodal Learning

Chunlei Meng, Guanhong Huang, Rong Fu, Runmin Jian, Zhongxue Gan, Chun Ouyang
arXiv: 2602.19605v1 · Published: 2026-02-23 · Updated: 2026-02-23

AI Summary

CLCR tackles the semantic misalignment and error propagation that plague multimodal learning by means of cross-level semantic co-representation, improving overall representation quality.

Key Contributions

  • Proposes the Cross-Level Co-Representation (CLCR) framework
  • Designs an Intra-Level Co-Exchange Domain (IntraCED) and an Inter-Level Co-Aggregation Domain (InterCAD)
  • Introduces regularization terms that strengthen shared/private feature separation and reduce cross-level interference
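The shared/private separation in the last bullet is commonly enforced with an orthogonality-style penalty on the two feature sets. The paper does not spell out its exact loss, so the sketch below is only a minimal illustration of the idea: penalize the squared Frobenius norm of the cross-correlation between shared and private feature matrices (`ortho_penalty` and its arguments are hypothetical names).

```python
import numpy as np

def ortho_penalty(shared, private):
    """Hypothetical separation loss (not the paper's actual term):
    squared Frobenius norm of the cross-correlation S^T P between
    shared and private feature matrices (rows = tokens, cols = dims)."""
    # Normalize columns so the penalty measures directional overlap only.
    s = shared / (np.linalg.norm(shared, axis=0, keepdims=True) + 1e-8)
    p = private / (np.linalg.norm(private, axis=0, keepdims=True) + 1e-8)
    return float(np.sum((s.T @ p) ** 2))

# Orthogonal subspaces give a (near-)zero penalty; identical ones do not.
shared = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
private = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
print(ortho_penalty(shared, private))   # fully separated -> 0.0
print(ortho_penalty(shared, shared))    # heavy overlap -> positive
```

Minimizing such a term pushes the shared and private subspaces apart, which is what prevents private cues from leaking into the cross-modal exchange.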

Methodology

Organizes each modality's features into a three-level semantic hierarchy, then builds a compact task representation via intra-level shared/private subspace factorization and inter-level selective fusion.
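The intra-level exchange step can be sketched as follows. This is a simplified two-modality, single-level illustration under assumed shapes: each modality is projected into shared and private subspaces, and cross-modal attention reads only a budgeted slice of the other modality's shared tokens (the paper's learnable token budget is modeled here as a fixed integer; all function and variable names are hypothetical).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_ced(feat_a, feat_b, w_shared, w_private, budget):
    """Simplified IntraCED-style step for two modalities at one level.
    1) Factorize each modality into shared and private subspaces.
    2) Restrict cross-modal attention to the first `budget` shared
       tokens of the other modality, so only shared semantics flow."""
    sa, pa = feat_a @ w_shared, feat_a @ w_private
    sb, pb = feat_b @ w_shared, feat_b @ w_private
    # Modality A queries modality B's budgeted shared tokens.
    q, kv = sa, sb[:budget]
    attn = softmax(q @ kv.T / np.sqrt(q.shape[-1]))
    exchanged = attn @ kv
    # Private features bypass the exchange untouched (no leakage).
    return exchanged + pa, pb

rng = np.random.default_rng(1)
fa, fb = rng.normal(size=(6, 8)), rng.normal(size=(5, 8))   # 6 / 5 tokens, dim 8
ws, wp = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))   # projections to dim 4
out_a, out_b = intra_ced(fa, fb, ws, wp, budget=3)
print(out_a.shape)  # (6, 4)
```

Keeping the private path out of the attention is the mechanism that, per the abstract, "prevents leakage from private channels".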

Original Abstract

Multimodal learning aims to capture both shared and private information from multiple modalities. However, existing methods that project all modalities into a single latent space for fusion often overlook the asynchronous, multi-level semantic structure of multimodal data. This oversight induces semantic misalignment and error propagation, thereby degrading representation quality. To address this issue, we propose Cross-Level Co-Representation (CLCR), which explicitly organizes each modality's features into a three-level semantic hierarchy and specifies level-wise constraints for cross-modal interactions. First, a semantic hierarchy encoder aligns shallow, mid, and deep features across modalities, establishing a common basis for interaction. Then, at each level, an Intra-Level Co-Exchange Domain (IntraCED) factorizes features into shared and private subspaces and restricts cross-modal attention to the shared subspace via a learnable token budget. This design ensures that only shared semantics are exchanged and prevents leakage from private channels. To integrate information across levels, the Inter-Level Co-Aggregation Domain (InterCAD) synchronizes semantic scales using learned anchors, selectively fuses the shared representations, and gates private cues to form a compact task representation. We further introduce regularization terms to enforce separation of shared and private features and to minimize cross-level interference. Experiments on six benchmarks spanning emotion recognition, event localization, sentiment analysis, and action recognition show that CLCR achieves strong performance and generalizes well across tasks.
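The inter-level aggregation described in the abstract can likewise be sketched. Assumptions: each level's shared representation has its own width and is mapped to a common anchor dimension before fusion; "selective fusion" is reduced here to a plain mean, and the private gate is a sigmoid. All names (`inter_cad`, `anchors`, `gate_w`) are hypothetical and the paper's actual mechanisms are learned, not fixed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inter_cad(shared_levels, private_levels, anchors, gate_w):
    """Simplified InterCAD-style aggregation across semantic levels.
    1) Align each level's shared representation to a common anchor scale.
    2) Fuse the aligned shared representations (mean stands in for the
       paper's selective fusion).
    3) Add gated private cues to form the compact task representation."""
    aligned = [s @ a for s, a in zip(shared_levels, anchors)]
    fused = np.mean(aligned, axis=0)
    for p in private_levels:
        g = sigmoid(p @ gate_w)     # per-dimension gate in (0, 1)
        fused = fused + g * p       # private cue enters only via the gate
    return fused

rng = np.random.default_rng(2)
T, d = 4, 5                         # tokens, common anchor dimension
shared_levels = [rng.normal(size=(T, k)) for k in (4, 6, 8)]   # 3 levels
anchors = [rng.normal(size=(k, d)) for k in (4, 6, 8)]
private_levels = [rng.normal(size=(T, d)) for _ in range(3)]
gate_w = rng.normal(size=(d, d))
out = inter_cad(shared_levels, private_levels, anchors, gate_w)
print(out.shape)  # (4, 5)
```

The gate lets private information contribute where it helps the task while the anchors keep the three levels on a comparable semantic scale.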

Tags

Multimodal Learning · Cross-Modal Alignment · Semantic Co-Representation · Shared/Private Features

arXiv Categories

cs.CV cs.AI cs.MM