Multimodal Learning Relevance: 7/10

A data- and compute-efficient chest X-ray foundation model beyond aggressive scaling

Chong Wang, Yabin Zhang, Yunhe Gao, Maya Varma, Clemence Mottez, Faidra Patsatzi, Jiaming Liu, Jin Long, Jean-Benoit Delbrouck, Sergios Gatidis, Akshay S. Chaudhari, Curtis P. Langlotz
arXiv: 2602.22843v1 Published: 2026-02-26 Updated: 2026-02-26

AI Summary

Proposes CheXficient, a chest X-ray foundation model that achieves efficient pretraining through active data curation.

Key Contributions

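  • Proposes a data- and compute-efficient pretraining approach for medical imaging foundation models
  • CheXficient matches or exceeds its full-data counterpart while using 22.7% of the paired data and under 27.3% of the compute budget
  • Active data curation prioritizes informative training samples, improving generalization on long-tailed or rare conditions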

Methodology

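Applies an active data curation strategy that selectively prioritizes informative training samples during pretraining, reducing the impact of data redundancy and class imbalance.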
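
The summary does not state which selection criterion CheXficient uses, so the following is only a minimal sketch of one generic way to prioritize non-redundant samples: greedy farthest-point (k-center) selection over sample embeddings. The function name `select_diverse_subset`, the 2-D toy embeddings, and the cluster sizes are all hypothetical and chosen purely for illustration; this is not the paper's algorithm.

```python
import numpy as np


def select_diverse_subset(embeddings: np.ndarray, budget: int, seed: int = 0) -> np.ndarray:
    """Greedy farthest-point (k-center) selection.

    Illustrative stand-in for "active data curation"; NOT the paper's method.
    Returns the indices of `budget` samples chosen so that each new pick is
    the sample farthest from everything selected so far.
    """
    n = embeddings.shape[0]
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of samples")

    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # arbitrary starting sample

    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)

    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))  # least-covered sample so far
        selected.append(nxt)
        dist_to_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)

    return np.array(selected)


# Toy data (hypothetical): 950 "common" samples and 50 "rare" samples.
rng = np.random.default_rng(1)
common = rng.normal(0.0, 1.0, size=(950, 2))
rare = rng.normal(0.0, 1.0, size=(50, 2)) + np.array([20.0, 0.0])
toy_embeddings = np.vstack([common, rare])

chosen = select_diverse_subset(toy_embeddings, budget=100)
rare_share = float((chosen >= 950).mean())  # fraction of picks from the rare group
print(f"rare share in full data: {50 / 1000:.2%}, in selected subset: {rare_share:.2%}")
```

Because near-duplicates of an already selected sample have a small distance to the selected set, they are picked late or not at all, which tends to raise the share of sparsely populated regions in the chosen subset. A real pipeline would need learned image-text embeddings and a criterion validated against downstream performance.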

Original Abstract

Foundation models for medical imaging are typically pretrained on increasingly large datasets, following a "scale-at-all-costs" paradigm. However, this strategy faces two critical challenges: large-scale medical datasets often contain substantial redundancy and severe class imbalance that bias representation learning toward over-represented patterns, and indiscriminate training regardless of heterogeneity in data quality incurs considerable computational inefficiency. Here we demonstrate that active, principled data curation during pretraining can serve as a viable, cost-effective alternative to brute-force dataset enlargement. We introduce CheXficient, a chest X-ray (CXR) foundation model that selectively prioritizes informative training samples. CheXficient is pretrained on only 22.7% of 1,235,004 paired CXR images and reports while consuming under 27.3% of the total compute budget, yet achieving comparable or superior performance to its full-data counterpart and other large-scale pretrained models. We assess CheXficient across 20 individual benchmarks spanning 5 task types, including non-adapted off-the-shelf evaluations (zero-shot findings classification and crossmodal retrieval) and adapted downstream tasks (disease prediction, semantic segmentation, and radiology report generation). Further analyses show that CheXficient systematically prioritizes under-represented training samples, improving generalizability on long-tailed or rare conditions. Overall, our work offers practical insights into the data and computation demands for efficient pretraining and downstream adaptation of medical vision-language foundation models.
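
To put the reported fractions in absolute terms, the short calculation below converts the abstract's percentages into approximate sample counts. The percentages are rounded in the abstract, so the resulting counts are estimates rather than figures reported by the authors; the variable names are illustrative.

```python
TOTAL_PAIRS = 1_235_004         # paired CXR images and reports (from the abstract)
DATA_FRACTION = 0.227           # 22.7% of the pairs used for pretraining
COMPUTE_FRACTION_UPPER = 0.273  # compute stays under 27.3% of the full budget

# Approximate counts implied by the rounded percentage.
approx_pairs_used = round(TOTAL_PAIRS * DATA_FRACTION)
approx_pairs_skipped = TOTAL_PAIRS - approx_pairs_used

print(f"pairs used (approx.):    {approx_pairs_used:,}")
print(f"pairs skipped (approx.): {approx_pairs_skipped:,}")
print(f"compute saved (at least): {1 - COMPUTE_FRACTION_UPPER:.1%}")
```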

Tags

Medical Imaging, Chest X-ray, Foundation Model, Active Learning

arXiv Categories

cs.CV