Multimodal Learning Relevance: 9/10

IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

Huimin Xiong, Zijie Meng, Tianxiang Hu, Chenyi Zhou, Yang Feng, Zuozhu Liu
arXiv: 2603.16781v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

Proposes IOSVLM, a vision-language model for unified dental diagnosis from 3D intraoral scans.

Key Contributions

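  • Proposes IOSVLM, an end-to-end 3D vision-language model.
  • Builds IOSVQA, a large-scale multi-source IOS diagnosis VQA dataset.
  • Introduces a geometry-to-chromatic proxy that stabilizes geometry perception and cross-modal alignment.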
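
This card does not say how the geometry-to-chromatic proxy is computed. As a purely illustrative sketch, one hypothetical way to give color-free scans a color-like channel is to map each point's unit surface normal to a pseudo-RGB value. The function names and the normal-to-RGB mapping below are assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def normals_to_pseudo_rgb(normals: List[Vec3]) -> List[Vec3]:
    """Map unit normals with components in [-1, 1] to pseudo-colors in [0, 1].

    Hypothetical stand-in for a geometry-derived color channel.
    """
    return [tuple((c + 1.0) / 2.0 for c in n) for n in normals]


def attach_pseudo_color(points: List[Vec3], normals: List[Vec3]) -> List[tuple]:
    """Return 6-D points (x, y, z, r, g, b) for a color-expecting 3D encoder."""
    if len(points) != len(normals):
        raise ValueError("points and normals must have the same length")
    colors = normals_to_pseudo_rgb(normals)
    return [p + c for p, c in zip(points, colors)]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    nrm = [(0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
    for row in attach_pseudo_color(pts, nrm):
        print(row)
```

The point of such a mapping would be that an encoder pre-trained on colored point clouds receives inputs in the value range it expects, while the "color" actually carries local shape information.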

Methodology

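Builds a 3D encoder-projector-LLM architecture that represents intraoral scans as point clouds for unified diagnosis and generative VQA, trained with a two-stage curriculum strategy.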
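
The encoder-projector-LLM data flow can be sketched with a toy example. Everything here is a simplified assumption: the "encoder" is plain mean pooling over contiguous groups of points, the "projector" is a single linear map, and the LLM itself is omitted. The sketch only shows how point-cloud features become tokens that sit in front of the text embeddings.

```python
from typing import List

Vector = List[float]


def encode_point_cloud(points: List[Vector], num_tokens: int) -> List[Vector]:
    """Toy 3D encoder: mean-pool contiguous groups of points into tokens."""
    if num_tokens <= 0 or len(points) < num_tokens:
        raise ValueError("need at least one point per token")
    group = len(points) // num_tokens
    tokens = []
    for i in range(num_tokens):
        start = i * group
        # The last group absorbs any leftover points.
        end = len(points) if i == num_tokens - 1 else start + group
        chunk = points[start:end]
        dim = len(chunk[0])
        tokens.append([sum(p[d] for p in chunk) / len(chunk) for d in range(dim)])
    return tokens


def project(tokens: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Linear projector; weight has shape (llm_dim, feature_dim)."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight] for tok in tokens]


def build_llm_input(points, text_embeddings, weight, num_tokens=2):
    """Prepend projected 3D tokens to the text token embeddings."""
    visual_tokens = project(encode_point_cloud(points, num_tokens), weight)
    return visual_tokens + text_embeddings


if __name__ == "__main__":
    scan = [[0, 0, 0], [2, 2, 2], [4, 4, 4], [6, 6, 6]]
    w = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # 3-D features to 4-D LLM space
    question = [[9, 9, 9, 9]]  # one placeholder text-token embedding
    for token in build_llm_input(scan, question, w):
        print(token)
```

In a real system the encoder would be a pre-trained point-cloud network and the projector would be learned so that 3D tokens land in the LLM's embedding space; the toy keeps only the shapes and the ordering of the sequence.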

Original Abstract

3D intraoral scans (IOS) are increasingly adopted in routine dentistry due to abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communication. While recent works introduce dental vision-language models (VLMs) to enable unified diagnosis and report generation on 2D images or multi-view images rendered from IOS, they do not fully leverage native 3D geometry. Such work is necessary and also challenging, due to: (i) heterogeneous scan forms and the complex IOS topology, (ii) multi-disease co-occurrence with class imbalance and fine-grained morphological ambiguity, (iii) limited paired 3D IOS-text data. Thus, we present IOSVLM, an end-to-end 3D VLM that represents scans as point clouds and follows a 3D encoder-projector-LLM design for unified diagnosis and generative visual question-answering (VQA), together with IOSVQA, a large-scale multi-source IOS diagnosis VQA dataset comprising 19,002 cases and 249,055 VQA pairs over 23 oral diseases and heterogeneous scan types. To address the distribution gap between color-free IOS data and color-dependent 3D pre-training, we propose a geometry-to-chromatic proxy that stabilizes fine-grained geometric perception and cross-modal alignment. A two-stage curriculum training strategy further enhances robustness. IOSVLM consistently outperforms strong baselines, achieving gains of at least +9.58% macro accuracy and +1.46% macro F1, indicating the effectiveness of direct 3D geometry modeling for IOS-based diagnosis.
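
The abstract reports macro accuracy and macro F1 without defining them here. A common reading for multi-disease (multi-label) diagnosis is to compute each metric per disease and then take the unweighted mean, so rare diseases count as much as common ones. The sketch below follows that assumed definition; the paper's exact protocol may differ.

```python
from typing import List, Tuple


def macro_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Return (macro accuracy, macro F1) over binary multi-label predictions.

    Each row is one case; each column is one disease (1 = present).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    num_classes = len(y_true[0])
    accuracies, f1_scores = [], []
    for c in range(num_classes):
        tp = fp = fn = tn = 0
        for truth, pred in zip(y_true, y_pred):
            if truth[c] and pred[c]:
                tp += 1
            elif pred[c]:
                fp += 1
            elif truth[c]:
                fn += 1
            else:
                tn += 1
        accuracies.append((tp + tn) / len(y_true))
        denom = 2 * tp + fp + fn
        # Convention: F1 is 0 when a class has no positives and no predictions.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(accuracies) / num_classes, sum(f1_scores) / num_classes


if __name__ == "__main__":
    truth = [[1, 0], [0, 0], [1, 1], [0, 0]]
    preds = [[1, 0], [0, 0], [0, 0], [0, 0]]
    acc, f1 = macro_metrics(truth, preds)
    print(f"macro accuracy={acc:.4f}, macro F1={f1:.4f}")
```

In this toy case the rare second disease is never predicted, so its per-class accuracy stays high while its F1 drops to zero. This is one reason class imbalance can make macro F1 gains much smaller than macro accuracy gains.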

Tags

3D Vision-Language Model, Intraoral Scans, Dental Diagnosis

arXiv Categories

cs.CV cs.AI