Multimodal Learning (Relevance: 9/10)

Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration

Iain Swift, JingHua Ye, Ruairi O'Reilly
arXiv: 2603.29977v1 Published: 2026-03-31 Updated: 2026-03-31

AI Summary

Quantifying signal interactions in multimodal fusion via InterSHAP reveals that performance gains stem from complementary signal aggregation rather than synergistic cross-modal effects.

Key Contributions

  • Demonstrated that performance gains from multimodal fusion do not necessarily stem from cross-modal synergy
  • Proposed an InterSHAP-based method for quantifying multimodal interactions
  • Revealed the additive contributions of WSI and RNA-seq signals in glioma survival prediction

Methodology

InterSHAP is used to quantify cross-modal interactions in the fusion of WSI and RNA-seq features, and variance decomposition is applied to attribute contributions to each modality and their interaction.
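
A minimal sketch of how such a two-modality interaction score can be computed, assuming a masked modality is replaced by a baseline embedding (e.g. the training-set mean) and that the fusion model outputs a scalar Cox log-hazard; the function and variable names are illustrative, not the authors' implementation:

```python
def cross_modal_interaction(model, x_wsi, x_rna, base_wsi, base_rna):
    """Two-player Shapley interaction between WSI and RNA-seq inputs.

    `model` is assumed to map (wsi_features, rna_features) to a scalar
    risk score (e.g. the Cox log-hazard). Masking a modality is
    approximated by substituting a baseline embedding. All names here
    are hypothetical.
    """
    f_both = model(x_wsi, x_rna)          # both modalities present
    f_wsi = model(x_wsi, base_rna)        # RNA-seq masked out
    f_rna = model(base_wsi, x_rna)        # WSI masked out
    f_none = model(base_wsi, base_rna)    # both masked (reference)

    # With exactly two players, the Shapley interaction index reduces
    # to a second-order finite difference over the four coalitions.
    interaction = f_both - f_wsi - f_rna + f_none
    # Additive main effects of each modality relative to the reference.
    phi_wsi = f_wsi - f_none
    phi_rna = f_rna - f_none
    return phi_wsi, phi_rna, interaction
```

With only two modalities, the full Shapley interaction index collapses to this four-evaluation finite difference, which is what makes the audit cheap enough to run per patient.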

Original Abstract

Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards models and applies it to quantify cross-modal interactions in glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), we evaluate four fusion architectures combining whole-slide image (WSI) and RNA-seq features. Our central finding is an inverse relationship between predictive performance and measured interaction: architectures achieving superior discrimination (C-index 0.64 → 0.82) exhibit equivalent or lower cross-modal interaction (4.8% → 3.0%). Variance decomposition reveals stable additive contributions across all architectures (WSI ≈ 40%, RNA ≈ 55%, Interaction ≈ 4%), indicating that performance gains arise from complementary signal aggregation rather than learned synergy. These findings provide a practical model auditing tool for comparing fusion strategies, reframe the role of architectural complexity in multimodal fusion, and have implications for privacy-preserving federated deployment.
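
To connect per-patient scores to a cohort-level decomposition like the one reported above (WSI ≈ 40%, RNA ≈ 55%, Interaction ≈ 4%), one plausible convention is to normalize the absolute main effects and interaction per patient and average across the cohort. A hedged, self-contained sketch using the same baseline-masking scheme as the previous example; the exact normalization is an assumption, not necessarily the paper's formula:

```python
import numpy as np

def cohort_shares(model, wsi_batch, rna_batch, base_wsi, base_rna):
    """Average per-patient attribution shares into cohort percentages.

    Each patient yields four coalition evaluations; the absolute main
    effects and interaction are normalized to sum to 1, then averaged.
    This mirrors the spirit of the reported decomposition and is an
    assumed aggregation, not the authors' code.
    """
    shares = []
    for x_wsi, x_rna in zip(wsi_batch, rna_batch):
        f_both = model(x_wsi, x_rna)
        f_wsi = model(x_wsi, base_rna)
        f_rna = model(base_wsi, x_rna)
        f_none = model(base_wsi, base_rna)
        phi_wsi = abs(f_wsi - f_none)                    # WSI main effect
        phi_rna = abs(f_rna - f_none)                    # RNA main effect
        inter = abs(f_both - f_wsi - f_rna + f_none)     # pairwise interaction
        total = phi_wsi + phi_rna + inter + 1e-12        # guard against zero
        shares.append((phi_wsi / total, phi_rna / total, inter / total))
    wsi_pct, rna_pct, int_pct = 100.0 * np.mean(shares, axis=0)
    return wsi_pct, rna_pct, int_pct
```

Under this convention, a low interaction share alongside a rising C-index would reproduce the paper's central observation that stronger fusion models are not necessarily more synergistic.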

Tags

Multimodal Learning  Survival Prediction  Interpretability  InterSHAP

arXiv Categories

cs.LG cs.AI q-bio.QM