Multimodal Learning Relevance: 9/10

Singpath-VL Technical Report

Zhen Qiu, Kaiwen Xiao, Zhengwei Lu, Xiangyu Liu, Lei Zhao, Hao Zhang
arXiv: 2602.09523v1 Published: 2026-02-10 Updated: 2026-02-10

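AI Summary

Singpath-VL is a multimodal large language model for AI-assisted diagnosis in cervical cytology, built by fine-tuning on a synthetic image-description dataset.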

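Key Contributions

  • Builds a large-scale (million-scale) synthetic image-description dataset for cervical cytology
  • Proposes Singpath-VL, a cervical-cytology-specific MLLM based on Qwen3-VL-4B
  • Plans to open-source a portion of the synthetic dataset and the benchmark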

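Methodology

Multiple general-purpose MLLMs serve as weak annotators. A three-stage pipeline refines their outputs through consensus fusion and expert knowledge injection to synthesize the image-description dataset, on which Qwen3-VL-4B is then fine-tuned with a multi-stage strategy.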
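
The consensus-fusion idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the report summary does not say how fusion or knowledge injection is actually performed. The sketch assumes that each weak annotator has already produced a list of morphology terms for one image, that "consensus" means a simple vote threshold, and that "expert knowledge injection" can be approximated by a small glossary that rewrites loose phrases into preferred terms. The names `EXPERT_GLOSSARY`, `normalize`, `fuse_by_consensus`, and `synthesize_description` are all hypothetical.

```python
from collections import Counter

# Hypothetical glossary standing in for expert knowledge (illustrative only,
# not taken from the paper): loose phrase -> preferred cytology term.
EXPERT_GLOSSARY = {
    "big nucleus": "enlarged nucleus",
    "dark nucleus": "hyperchromatic nucleus",
}


def normalize(term: str) -> str:
    """Lowercase and trim a term, then rewrite it via the glossary if listed."""
    cleaned = term.strip().lower()
    return EXPERT_GLOSSARY.get(cleaned, cleaned)


def fuse_by_consensus(annotations: list[list[str]], min_votes: int = 2) -> list[str]:
    """Keep the terms that at least `min_votes` annotators agree on.

    Terms are normalized before voting so that synonyms count as one vote.
    That ordering is a choice made for this sketch.
    """
    votes: Counter[str] = Counter()
    for terms in annotations:
        # A set ensures each annotator contributes at most one vote per term.
        votes.update({normalize(t) for t in terms})
    return sorted(term for term, count in votes.items() if count >= min_votes)


def synthesize_description(annotations: list[list[str]], min_votes: int = 2) -> str:
    """Join the agreed terms into a single description string."""
    return "; ".join(fuse_by_consensus(annotations, min_votes))


# Mocked outputs of three weak annotators for one cell image.
weak_annotations = [
    ["Big nucleus", "irregular nuclear membrane"],
    ["enlarged nucleus", "dense cytoplasm"],
    ["irregular nuclear membrane", "dark nucleus"],
]
description = synthesize_description(weak_annotations)
print(description)
```

In this toy setup, terms mentioned by only one annotator are dropped, which is one simple way to filter out hallucinated details from individual weak annotators. A real pipeline would work on free-text descriptions rather than term lists.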

Original Abstract

We present Singpath-VL, a vision-language large model, to fill the vacancy of AI assistant in cervical cytology. Recent advances in multi-modal large language models (MLLMs) have significantly propelled the field of computational pathology. However, their application in cytopathology, particularly cervical cytology, remains underexplored, primarily due to the scarcity of large-scale, high-quality annotated datasets. To bridge this gap, we first develop a novel three-stage pipeline to synthesize a million-scale image-description dataset. The pipeline leverages multiple general-purpose MLLMs as weak annotators, refines their outputs through consensus fusion and expert knowledge injection, and produces high-fidelity descriptions of cell morphology. Using this dataset, we then fine-tune the Qwen3-VL-4B model via a multi-stage strategy to create a specialized cytopathology MLLM. The resulting model, named Singpath-VL, demonstrates superior performance in fine-grained morphological perception and cell-level diagnostic classification. To advance the field, we will open-source a portion of the synthetic dataset and benchmark.

Tags

Multimodal Learning, Cytopathology, Medical Imaging, Synthetic Data

arXiv Categories

cs.CV