Multimodal Learning Relevance: 9/10

KG-CMI: Knowledge graph enhanced cross-Mamba interaction for medical visual question answering

Xianyao Zheng, Hong Yu, Hui Cui, Changming Sun, Xiangyu Li, Ran Su, Leyi Wei, Jia Zhou, Junbo Wang, Qiangguo Jin
arXiv: 2604.00601v1 Published: 2026-04-01 Updated: 2026-04-01

AI Summary

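Proposes the KG-CMI framework, which combines a knowledge graph with cross-Mamba interaction to improve Med-VQA performance and to support free-form answer generation.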

Main Contributions

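  • Proposes KG-CMI, a knowledge graph enhanced cross-Mamba interaction framework
  • Designs a fine-grained cross-modal feature alignment (FCFA) module
  • Introduces free-form answer enhanced multi-task learning (FAMT)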

Methodology

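KG-CMI is built from four modules: FCFA aligns fine-grained image and text features, KGE embeds the medical knowledge graph, CMIR performs cross-modal interaction, and FAMT generates free-form answers through multi-task learning.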
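FAMT is described as multi-task learning that draws on free-form answers, but the summary and abstract do not state the training objective. As a rough illustration only, the sketch below assumes one common arrangement: a classification loss over a predefined answer set plus a weighted per-token generation loss over the free-form answer. The function names, the `gen_weight` parameter, and its default value are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target_index])


def sequence_nll(token_logits, target_tokens):
    """Mean per-token cross-entropy for a free-form answer."""
    losses = [cross_entropy(l, t) for l, t in zip(token_logits, target_tokens)]
    return sum(losses) / len(losses)


def multitask_loss(cls_logits, cls_target, token_logits, target_tokens,
                   gen_weight=0.5):
    """Assumed objective: closed-set classification loss plus a weighted
    generation loss. gen_weight is a hypothetical hyperparameter."""
    cls_loss = cross_entropy(cls_logits, cls_target)
    gen_loss = sequence_nll(token_logits, target_tokens)
    return cls_loss + gen_weight * gen_loss


# Toy example: 2 candidate answers, a 2-token answer over a 2-word vocabulary.
loss = multitask_loss(
    cls_logits=[0.0, 0.0], cls_target=0,
    token_logits=[[0.0, 0.0], [0.0, 0.0]], target_tokens=[0, 1],
)
```

With uniform logits every term reduces to log 2, so the toy call gives 1.5 × log 2. In a real model the two sets of logits would come from a classification head and an answer decoder sharing the fused image-text representation.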

Original Abstract

Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent methods fail to fully leverage domain-specific medical knowledge, making it difficult to accurately associate lesion features in medical images with key diagnostic criteria. Additionally, classification-based approaches typically rely on predefined answer sets. Treating Med-VQA as a simple classification problem limits its ability to adapt to the diversity of free-form answers and may overlook detailed semantic information in those answers. To address these challenges, we propose a knowledge graph enhanced cross-Mamba interaction (KG-CMI) framework, which consists of a fine-grained cross-modal feature alignment (FCFA) module, a knowledge graph embedding (KGE) module, a cross-modal interaction representation (CMIR) module, and a free-form answer enhanced multi-task learning (FAMT) module. The KG-CMI learns cross-modal feature representations for images and texts by effectively integrating professional medical knowledge through a graph, establishing associations between lesion features and disease knowledge. Moreover, FAMT leverages auxiliary knowledge from open-ended questions, improving the model's capability for open-ended Med-VQA. Experimental results demonstrate that KG-CMI outperforms existing state-of-the-art methods on three Med-VQA datasets, i.e., VQA-RAD, SLAKE, and OVQA. Additionally, we conduct interpretability experiments to further validate the framework's effectiveness.

Tags

Medical VQA, Knowledge Graph, Mamba, Multimodal Learning

arXiv Categories

cs.CV