Singpath-VL Technical Report
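AI Summary
Singpath-VL is a multimodal large language model (MLLM) for AI-assisted diagnosis in cervical cytology. It is built by synthesizing an image-description dataset and fine-tuning a base model on it.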
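Main Contributions
- Construction of a large-scale synthetic image-description dataset for cervical cytology
- Singpath-VL, an MLLM specialized for cervical cytology and built on Qwen3-VL-4B
- Planned open-source release of a portion of the synthetic dataset and the benchmark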
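Methodology
General-purpose MLLMs serve as weak annotators in a three-stage pipeline that synthesizes the image-description dataset. Qwen3-VL-4B is then fine-tuned on this dataset.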
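The three-stage pipeline can be illustrated with a minimal sketch. It rests on assumptions the report does not state. The three stages are taken to be weak annotation, consensus fusion, and expert knowledge injection. Each weak annotator is assumed to return structured morphological attributes rather than free text. Consensus fusion is modeled as a per-attribute strict-majority vote. Expert knowledge injection is modeled as rule-based overrides keyed on a cell-level label. The attribute names, the `EXPERT_RULES` table, and every function name are hypothetical. This is not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical structured output of one weak annotator (a general-purpose
# MLLM) for one cell image: attribute name -> attribute value.
Annotation = Dict[str, str]

# Hypothetical expert rules: attribute values expected for a cell-level label.
# Real criteria would come from cytopathologists; these are placeholders.
EXPERT_RULES: Dict[str, Annotation] = {
    "NILM": {"nc_ratio": "low"},
    "HSIL": {"nc_ratio": "high", "chromatin": "hyperchromatic"},
}


def fuse_consensus(annotations: List[Annotation], min_agreement: float = 0.5) -> Annotation:
    """Assumed stage 2: per-attribute majority vote across weak annotators.

    A value is kept only if strictly more than `min_agreement` of all
    annotators report it; otherwise the attribute is dropped.
    """
    fused: Annotation = {}
    n = len(annotations)
    keys = sorted({key for ann in annotations for key in ann})
    for key in keys:
        votes = Counter(ann[key] for ann in annotations if key in ann)
        value, count = votes.most_common(1)[0]
        if count / n > min_agreement:
            fused[key] = value
    return fused


def inject_expert_knowledge(fused: Annotation, label: str) -> Annotation:
    """Assumed stage 3: expert rules override conflicting consensus values."""
    refined = dict(fused)
    refined.update(EXPERT_RULES.get(label, {}))
    return refined


def render_description(attrs: Annotation, label: str) -> str:
    """Render the refined attributes as a short templated description."""
    parts = [f"{key.replace('_', ' ')}: {value}" for key, value in sorted(attrs.items())]
    return f"Cell category {label}; " + "; ".join(parts) + "."


def synthesize_description(annotations: List[Annotation], label: str) -> str:
    """Chain the assumed stages: weak annotations -> fusion -> injection -> text."""
    fused = fuse_consensus(annotations)
    refined = inject_expert_knowledge(fused, label)
    return render_description(refined, label)


if __name__ == "__main__":
    # Assumed stage 1: outputs already collected from three weak annotators.
    weak = [
        {"nc_ratio": "high", "chromatin": "hyperchromatic", "nucleus_size": "enlarged"},
        {"nc_ratio": "high", "chromatin": "fine", "nucleus_size": "normal"},
        {"nc_ratio": "low", "chromatin": "hyperchromatic"},
    ]
    print(synthesize_description(weak, "HSIL"))
    # prints "Cell category HSIL; chromatin: hyperchromatic; nc ratio: high."
```

A strict-majority vote trades recall for precision. Attributes on which the annotators disagree (here `nucleus_size`, split one to one with a third annotator silent) are dropped rather than guessed. This is one way to aim for high-fidelity descriptions, but other fusion rules (weighted votes, LLM-based merging of free text) are equally plausible.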
Original Abstract
We present Singpath-VL, a vision-language large model, to fill the gap left by the absence of AI assistants for cervical cytology. Recent advances in multimodal large language models (MLLMs) have significantly propelled the field of computational pathology. However, their application in cytopathology, particularly cervical cytology, remains underexplored, primarily because large-scale, high-quality annotated datasets are scarce. To bridge this gap, we first develop a novel three-stage pipeline to synthesize a million-scale image-description dataset. The pipeline leverages multiple general-purpose MLLMs as weak annotators, refines their outputs through consensus fusion and expert knowledge injection, and produces high-fidelity descriptions of cell morphology. Using this dataset, we then fine-tune the Qwen3-VL-4B model via a multi-stage strategy to create a specialized cytopathology MLLM. The resulting model, named Singpath-VL, demonstrates superior performance in fine-grained morphological perception and cell-level diagnostic classification. To advance the field, we will open-source a portion of the synthetic dataset and the benchmark.