Multimodal Learning Relevance: 9/10

Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D

Mariano Barone, Francesco Di Serio, Giuseppe Riccio, Antonio Romano, Marco Postiglione, Antonino Ferraro, Vincenzo Moscato
arXiv: 2602.22098v1 Published: 2026-02-25 Updated: 2026-02-25

AI Summary

Brain3D uses a 3D vision Transformer and a staged alignment procedure to generate radiology reports automatically from brain tumor MRI.

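Key Contributions

  • Proposes the Brain3D framework for generating radiology reports from 3D brain tumor MRI
  • Inflates a pretrained 2D medical encoder into a 3D architecture
  • Aligns the vision encoder with the language model in stages to improve report generation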
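
The 2D-to-3D inflation named in the contributions can be illustrated with a minimal sketch. The summary does not say which inflation scheme Brain3D uses, so the code below assumes the common approach of replicating the 2D patch-embedding kernel along a new depth axis and dividing by the depth. The function names, shapes, and random weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def inflate_patch_kernel(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Inflate a 2D patch-embedding kernel (out, in, kh, kw)
    into a 3D kernel (out, in, depth, kh, kw).

    Assumption: replicate along depth and rescale by 1/depth, so a volume
    that is constant along depth yields the same embedding as its 2D slice.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    w3d = np.repeat(w2d[:, :, None, :, :], depth, axis=2)
    return w3d / depth


def embed_patch_2d(w2d: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Project one 2D patch of shape (in, kh, kw) to an embedding of size out."""
    return np.einsum("oihw,ihw->o", w2d, patch)


def embed_patch_3d(w3d: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Project one 3D patch of shape (in, depth, kh, kw) to an embedding of size out."""
    return np.einsum("oidhw,idhw->o", w3d, patch)


rng = np.random.default_rng(0)
w2d = rng.normal(size=(8, 1, 4, 4))      # stand-in for pretrained 2D weights
w3d = inflate_patch_kernel(w2d, depth=4)

slice_2d = rng.normal(size=(1, 4, 4))    # one single-channel 2D patch
volume = np.repeat(slice_2d[:, None, :, :], 4, axis=1)  # same slice stacked 4 times

emb_2d = embed_patch_2d(w2d, slice_2d)
emb_3d = embed_patch_3d(w3d, volume)
```

Under this assumption the inflated encoder starts out reproducing the 2D encoder's response on depth-constant inputs, which is one reason the scheme is a popular initialization before 3D fine-tuning.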

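Methodology

An inflated vision Transformer serves as the 3D visual encoder and is aligned with the language model in multiple stages: contrastive learning, supervised warmup, and LoRA.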
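
As an illustration of the contrastive stage, here is a minimal sketch of a symmetric InfoNCE objective of the kind popularized by CLIP. The summary does not give the exact loss used in Brain3D, so the formulation, the temperature value, and the random embeddings are all assumptions.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(img: np.ndarray, txt: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of paired embeddings.

    img, txt: arrays of shape (batch, dim); row i of each is a matched pair.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # (batch, batch) similarities
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float(0.5 * (loss_i2t + loss_t2i))


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))                   # hypothetical volume embeddings
aligned_loss = symmetric_info_nce(emb, emb)      # every pair matches
shuffled_loss = symmetric_info_nce(emb, emb[::-1])  # every pair mismatched
```

The loss is low when each volume embedding is closest to its own report embedding and high when pairs are scrambled, which is the signal a grounding stage would train on.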

Original Abstract

Current medical vision-language models (VLMs) process volumetric brain MRI using 2D slice-based approximations, fragmenting the spatial context required for accurate neuroradiological interpretation. We developed Brain3D, a staged vision-language framework for automated radiology report generation from 3D brain tumor MRI. Our approach inflates a pretrained 2D medical encoder into a native 3D architecture and progressively aligns it with a causal language model through three stages: contrastive grounding, supervised projector warmup, and LoRA-based linguistic specialization. Unlike generalist 3D medical VLMs, Brain3D is tailored to neuroradiology, where hemispheric laterality, tumor infiltration patterns, and anatomical localization are critical. Evaluated on 468 subjects (BraTS pathological cases plus healthy controls), our model achieves a Clinical Pathology F1 of 0.951 versus 0.413 for a strong 2D baseline while maintaining perfect specificity on healthy scans. The staged alignment proves essential: contrastive grounding establishes visual-textual correspondence, projector warmup stabilizes conditioning, and LoRA adaptation shifts output from verbose captions to structured clinical reports. (Our code is publicly available for transparency and reproducibility.)
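
For readers unfamiliar with LoRA, the adaptation idea can be sketched as a frozen weight matrix plus a trainable low-rank update. This is a generic illustration of the technique, not the authors' implementation: the class name, rank, scaling factor, and layer sizes are hypothetical.

```python
import numpy as np


class LoRALinear:
    """Linear layer y = x W^T with a low-rank update scale * x A^T B^T.

    W stays frozen; only A and B would be trained. B starts at zero,
    so the adapted layer initially matches the frozen base layer.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4,
                 alpha: float = 8.0, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                      # frozen base weights
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))      # trainable
        self.B = np.zeros((out_dim, rank))                        # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def num_trainable(self) -> int:
        """Number of parameters in the low-rank factors A and B."""
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
base_weight = rng.normal(size=(64, 32))    # hypothetical frozen projection
layer = LoRALinear(base_weight, rank=4)
x = rng.normal(size=(2, 32))
y_init = layer(x)                          # equals the frozen layer's output at init
```

With rank 4 the adapter trains 4 × (64 + 32) = 384 parameters instead of the 2048 in the full matrix, which is why this kind of adaptation is attractive for specializing a language model's output style.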

Tags

3D MRI, Vision-Language Model, Radiology Report Generation, Brain Tumor, Transformer

arXiv Categories

cs.CV