Multimodal Learning Relevance: 10/10

MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models

Jianxin Lin, Chunzheng Zhu, Peter J. Kneuertz, Yunfei Bai, Yuan Xue
arXiv: 2603.23085v1 Published: 2026-03-24 Updated: 2026-03-24

AI Summary

MedCausalX uses self-reflection and causal reasoning to improve the trustworthiness and reliability of medical vision-language models.

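Key Contributions

  • Introduces the CRMed dataset, which provides fine-grained anatomical annotations and causal reasoning chains.
  • Designs a two-stage adaptive reflection architecture that performs causal analysis and verification.
  • Introduces a trajectory-level causal correction objective that distinguishes causal dependencies from shortcut associations.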

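Methodology

Builds the CRMed dataset, applies a self-reflection architecture, and refines reasoning chains through error-attributed reinforcement learning, explicitly modeling causal reasoning in medical VLMs.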
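
The abstract describes the reflection stages as being marked by ⟨causal⟩ and ⟨verify⟩ tokens. The following is a minimal sketch of how a decoded reasoning trace could be split into stages for inspection. It assumes the tokens appear as the literal strings `<causal>` and `<verify>`, which this entry does not confirm. `split_trace` and the `reason` label are hypothetical names for illustration, not part of MedCausalX.

```python
import re
from typing import List, Tuple

# Assumed literal spellings of the special tokens; the real trace format
# is not specified in this entry.
TOKEN_PATTERN = re.compile(r"<(causal|verify)>")


def split_trace(trace: str) -> List[Tuple[str, str]]:
    """Split a reasoning trace into (stage, text) segments.

    Text before the first special token is labeled 'reason' (a hypothetical
    label); text after a token is labeled with that token's name.
    """
    segments: List[Tuple[str, str]] = []
    stage = "reason"
    pos = 0
    for match in TOKEN_PATTERN.finditer(trace):
        text = trace[pos:match.start()].strip()
        if text:
            segments.append((stage, text))
        stage = match.group(1)  # 'causal' or 'verify'
        pos = match.end()
    tail = trace[pos:].strip()
    if tail:
        segments.append((stage, tail))
    return segments
```

A trace with no special tokens comes back as a single `reason` segment, which makes it easy to check whether a given output triggered causal analysis or verification at all.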

Original Abstract

Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious correlations and limiting their clinical reliability. We pinpoint three core challenges in medical CoT reasoning: how to adaptively trigger causal correction, construct high-quality causal-spurious contrastive samples, and maintain causal consistency across reasoning trajectories. To address these challenges, we propose MedCausalX, an end-to-end framework that explicitly models causal reasoning chains in medical VLMs. We first introduce the CRMed dataset providing fine-grained anatomical annotations, structured causal reasoning chains, and counterfactual variants that guide the learning of causal relationships beyond superficial correlations. Building upon CRMed, MedCausalX employs a two-stage adaptive reflection architecture equipped with $\langle$causal$\rangle$ and $\langle$verify$\rangle$ tokens, enabling the model to autonomously determine when and how to perform causal analysis and verification. Finally, a trajectory-level causal correction objective optimized through error-attributed reinforcement learning refines the reasoning chain, allowing the model to distinguish genuine causal dependencies from shortcut associations. Extensive experiments on multiple benchmarks show that MedCausalX consistently outperforms state-of-the-art methods, improving diagnostic consistency by +5.4 points, reducing hallucination by over 10 points, and attaining top spatial grounding IoU, thereby setting a new standard for causally grounded medical reasoning.
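
The abstract reports spatial grounding IoU without defining it. As a reference point, here is a minimal sketch of the standard intersection-over-union for two axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the name `box_iou` are assumptions for illustration, and the paper's actual evaluation protocol may differ.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, gt: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(pred[0], gt[0])
    iy1 = max(pred[1], gt[1])
    ix2 = min(pred[2], gt[2])
    iy2 = min(pred[3], gt[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter

    # Two degenerate (zero-area) boxes give union == 0; return 0.0 by convention.
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0 and disjoint boxes score 0.0, so a higher value means the predicted region aligns more closely with the annotated one.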

Tags

Medical VLM Causal Reasoning Self-Reflection Vision-Language

arXiv Categories

cs.AI