Multimodal Learning Relevance: 9/10

Vision-Language Model Based Multi-Expert Fusion for CT Image Classification

Jianfa Bai, Kejin Lu, Runtian Yuan, Qingqiu Li, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng
arXiv: 2603.15154v1 Published: 2026-03-16 Updated: 2026-03-16

AI Summary

Proposes a multi-expert fusion framework to address COVID-19 detection from multi-source CT images.

Main Contributions

  • Proposes a lung-aware 3D expert model
  • Develops slice-wise and cross-slice expert models based on MedSigLIP
  • Uses a source classifier for source-aware model fusion

Methodology

Multiple expert models are built; a source classifier predicts each test scan's source, and the predicted source information guides expert fusion and voting.

Original Abstract

Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we propose a three-stage source-aware multi-expert framework for multi-source COVID-19 CT classification. First, we build a lung-aware 3D expert by combining original CT volumes and lung-extracted CT volumes for volumetric classification. Second, we develop two MedSigLIP-based experts: a slice-wise representation and probability learning module, and a Transformer-based inter-slice context modeling module for capturing cross-slice dependency. Third, we train a source classifier to predict the latent source identity of each test scan. By leveraging the predicted source information, we perform model fusion and voting based on different experts. On the validation set covering all four sources, the Stage 1 model achieves the best macro-F1 of 0.9711, ACC of 0.9712, and AUC of 0.9791. Stage 2a and Stage 2b achieve the best AUC scores of 0.9864 and 0.9854, respectively. The Stage 3 source classifier reaches 0.9107 ACC and 0.9114 F1. These results demonstrate that source-aware expert modeling and hierarchical voting provide an effective solution for robust COVID-19 CT classification under heterogeneous multi-source conditions.
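The source-aware fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the per-source weight table, and the use of a weighted soft vote are all assumptions; the paper only states that the predicted source guides fusion and voting across experts.

```python
def fuse_predictions(expert_probs, source_probs, source_weights):
    """Source-aware soft-vote fusion (illustrative sketch, not the paper's code).

    expert_probs   : list of positive-class probabilities, one per expert
                     (e.g. 3D lung-aware, slice-wise, inter-slice experts)
    source_probs   : list of probabilities from the source classifier,
                     one per candidate source institution
    source_weights : per-source expert weights, shape [n_sources][n_experts]
    """
    # Stage 3: pick the most likely latent source for this scan
    src = max(range(len(source_probs)), key=lambda i: source_probs[i])

    # Normalize that source's expert weights to sum to 1
    w = source_weights[src]
    total = sum(w)
    w = [x / total for x in w]

    # Weighted soft vote over the experts' probabilities
    fused = sum(wi * pi for wi, pi in zip(w, expert_probs))
    return fused, src

# Example: 3 experts, 4 sources, uniform weights (reduces to plain averaging)
prob, src = fuse_predictions(
    expert_probs=[0.9, 0.7, 0.8],
    source_probs=[0.1, 0.6, 0.2, 0.1],
    source_weights=[[1, 1, 1]] * 4,
)
```

With uniform weights this degenerates to simple probability averaging; a source-aware variant would learn or tune `source_weights` per institution, e.g. by upweighting whichever expert validates best on each source.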

Tags

COVID-19, CT images, multi-expert fusion, vision-language models, MedSigLIP

arXiv Category

cs.CV