Multimodal Learning · Relevance: 9/10

Curia-2: Scaling Self-Supervised Learning for Radiology Foundation Models

Antoine Saporta, Baptiste Callard, Corentin Dancette, Julien Khlaut, Charles Corbière, Leo Butsanets, Amaury Prat, Pierre Manceron
arXiv: 2604.01987v1 · Published: 2026-04-02 · Updated: 2026-04-02

AI Summary

Curia-2 substantially improves the performance of radiology foundation models by refining the pre-training strategy and scaling up the model size.

Key Contributions

  • Improved the pre-training strategy for radiological imaging
  • Built a larger-scale multi-modal CT/MRI foundation model
  • Restructured CuriaBench into dedicated 2D and 3D evaluation tracks for radiological imaging
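The split between the two benchmark tracks can be illustrated with a toy sketch. This is a hypothetical illustration, not code from the paper: a volumetric CT/MRI scan feeds the 2D track as individual axial slices and the 3D track as a full volume. The function names, shapes, and slice axis are all assumptions.

```python
import numpy as np

def to_2d_track(volume: np.ndarray) -> list[np.ndarray]:
    """Split a (depth, height, width) volume into per-slice 2D inputs."""
    return [volume[z] for z in range(volume.shape[0])]

def to_3d_track(volume: np.ndarray) -> np.ndarray:
    """Keep the full scan as a single volumetric input."""
    return volume

scan = np.random.rand(64, 256, 256).astype(np.float32)  # toy CT volume
slices = to_2d_track(scan)
print(len(slices), slices[0].shape)  # one 256x256 slice per depth index
```

A slice-based vision model would consume `slices` one at a time, while a volumetric model takes the output of `to_3d_track` directly.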

Methodology

Building on the Curia framework, the work refines the pre-training strategy, scales up the Vision Transformer architecture, and constructs a new evaluation benchmark.
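The self-supervised objective behind this kind of pre-training can be sketched in miniature. This is a hedged toy example, not the paper's implementation: many SSL methods (e.g. DINO-style training) encourage two augmented views of the same image to map to similar embeddings. A linear projection stands in here for the billion-parameter Vision Transformer, and the augmentation is simple intensity noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x: np.ndarray) -> np.ndarray:
    """Toy augmentation: small additive intensity noise."""
    return x + 0.05 * rng.standard_normal(x.shape)

def encode(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Toy encoder: linear projection followed by L2 normalization."""
    z = x @ W
    return z / np.linalg.norm(z)

x = rng.standard_normal(128)        # a flattened toy image slice
W = rng.standard_normal((128, 16))  # stand-in encoder weights
z1 = encode(augment(x), W)
z2 = encode(augment(x), W)
consistency = float(z1 @ z2)        # cosine similarity of the two views
```

Training would then adjust the encoder weights to push `consistency` toward 1 across the pre-training corpus, so that representations become invariant to the augmentations.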

Original Abstract

The rapid growth of medical imaging has fueled the development of Foundation Models (FMs) to reduce the growing, unsustainable workload on radiologists. While recent FMs have shown the power of large-scale pre-training for CT and MRI analysis, there remains significant room to optimize how these models learn from complex radiological volumes. Building upon the Curia framework, this work introduces Curia-2, which significantly improves the original pre-training strategy and representation quality to better capture the specificities of radiological data. The proposed methodology enables scaling the architecture up to billion-parameter Vision Transformers, marking a first for multi-modal CT and MRI FMs. Furthermore, we formalize the evaluation of these models by extending and restructuring CuriaBench into two distinct tracks: a 2D track tailored for slice-based vision models and a 3D track for volumetric benchmarking. Our results demonstrate that Curia-2 outperforms all FMs on vision-focused tasks and fares competitively with vision-language models on clinically complex tasks such as finding detection. Weights will be made publicly available to foster further research.

Tags

Radiology · Foundation Model · Self-Supervised Learning · Multimodal · Medical Imaging

arXiv Categories

cs.CV cs.LG