Multimodal Learning (Relevance: 8/10)

Any-to-All MRI Synthesis: A Unified Foundation Model for Nasopharyngeal Carcinoma and Its Downstream Applications

Yao Pu, Yiming Shi, Zhenxi Zhang, Peixin Yu, Yitao Zhuang, Xiang Wang, Hongzhao Chen, Jing Cai, Ge Ren
arXiv: 2602.08822v1 Published: 2026-02-09 Updated: 2026-02-09

AI Summary

Develops a unified foundation model for MRI image synthesis in nasopharyngeal carcinoma (NPC), improving radiotherapy (RT) planning accuracy.

Main Contributions

  • Proposes a unified foundation model built on contrastive visual representation learning and vision-language alignment (VLA).
  • Enables any-to-all MRI synthesis, i.e., synthesizing all target modalities from any available input modality.
  • Enhances downstream RT-relevant tasks (e.g., segmentation) through the unified representation.

Methodology

A contrastive encoder learns modality-invariant representations, and a CLIP-style text-informed decoder performs semantically consistent synthesis.
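To make the design concrete, the sketch below pairs a shared image encoder trained with an InfoNCE-style contrastive loss across modalities with a decoder conditioned on a target-modality embedding. This is a minimal illustration under assumed module names, dimensions, and modality set; a learned embedding table stands in for the CLIP text encoder described in the paper, and none of it is the authors' released implementation.

```python
# Minimal sketch: contrastive encoder + text-conditioned decoder for
# any-to-all MRI synthesis. Names, sizes, and modality list are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

MODALITIES = ["T1", "T2", "T1c", "FLAIR"]  # assumed modality set


class ContrastiveEncoder(nn.Module):
    """Shared CNN encoder intended to yield modality-invariant embeddings."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x):
        h = self.backbone(x).flatten(1)
        return F.normalize(self.proj(h), dim=-1)  # unit-norm embedding


class TextInformedDecoder(nn.Module):
    """Decoder conditioned on an embedding of the target modality.

    A learned embedding table is a placeholder for the CLIP text encoder
    used in the paper; real CLIP text features would be injected the same
    way (concatenated with the image latent before decoding).
    """

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.text_embed = nn.Embedding(len(MODALITIES), embed_dim)
        self.fc = nn.Linear(2 * embed_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, z, target_idx):
        t = self.text_embed(target_idx)                       # target-modality "prompt"
        h = self.fc(torch.cat([z, t], dim=-1)).view(-1, 128, 8, 8)
        return self.up(h)                                     # (B, 1, 64, 64)


def modality_contrastive_loss(z_a, z_b, temperature: float = 0.07):
    """InfoNCE-style loss: embeddings of the same patient in two modalities
    are pulled together, other patients in the batch act as negatives."""
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    enc, dec = ContrastiveEncoder(), TextInformedDecoder()
    t1 = torch.randn(4, 1, 64, 64)   # toy batch of T1 slices
    t2 = torch.randn(4, 1, 64, 64)   # the same patients' T2 slices
    z1, z2 = enc(t1), enc(t2)
    loss_con = modality_contrastive_loss(z1, z2)
    target = torch.full((4,), MODALITIES.index("FLAIR"), dtype=torch.long)
    fake_flair = dec(z1, target)     # synthesize FLAIR from T1 features
    print(loss_con.item(), fake_flair.shape)
```

In this reading, the contrastive term enforces that the latent carries anatomy rather than modality-specific contrast, while the conditioning vector tells the decoder which contrast to render, which is what allows a single model to cover all source-to-target combinations.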

Original Abstract

Magnetic resonance imaging (MRI) is essential for nasopharyngeal carcinoma (NPC) radiotherapy (RT), but practical constraints, such as patient discomfort, long scan times, and high costs, often lead to incomplete modalities in clinical practice, compromising RT planning accuracy. Traditional MRI synthesis methods are modality-specific, limited in anatomical adaptability, and lack clinical interpretability, failing to meet NPC's RT needs. Here, we developed a unified foundation model integrating contrastive visual representation learning and vision-language alignment (VLA) to enable any-to-all MRI synthesis. The model uses a contrastive encoder for modality-invariant representations and a CLIP-based text-informed decoder for semantically consistent synthesis, supporting any-to-all MRI synthesis via one unified foundation model. Trained on 40,825 images from 13 institutions, it achieves consistently high performance (average SSIM 0.90, PSNR 27) across 26 internal/external validation sites (15,748 images), with superior synthesis fidelity and robustness to noise and domain shifts. Meanwhile, its unified representation enhances downstream RT-relevant tasks (e.g., segmentation). This work advances digital medicine solutions for NPC care by leveraging foundation models to bridge technical synthesis and clinical utility.
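The reported averages (SSIM 0.90, PSNR 27) are standard full-reference image-quality metrics. The sketch below shows how such per-slice numbers are typically computed with scikit-image; the intensity normalization and data range are assumptions, not necessarily the paper's exact evaluation protocol.

```python
# Sketch of per-slice SSIM / PSNR evaluation for synthesized MRI slices.
# Normalization to [0, 1] and data_range=1.0 are illustrative assumptions.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio


def normalize(img: np.ndarray) -> np.ndarray:
    """Scale a slice to [0, 1] so metrics are comparable across scans."""
    img = img.astype(np.float64)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)


def evaluate_slice(real: np.ndarray, synthetic: np.ndarray):
    real, synthetic = normalize(real), normalize(synthetic)
    ssim = structural_similarity(real, synthetic, data_range=1.0)
    psnr = peak_signal_noise_ratio(real, synthetic, data_range=1.0)
    return ssim, psnr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.random((256, 256))
    synthetic = real + 0.05 * rng.standard_normal((256, 256))  # toy "synthesis"
    ssim, psnr = evaluate_slice(real, synthetic)
    print(f"SSIM={ssim:.3f}  PSNR={psnr:.2f} dB")
```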

Tags

MRI Synthesis, Nasopharyngeal Carcinoma, Foundation Model, Contrastive Learning, Vision-Language Alignment

arXiv Categories

cs.CV