Multimodal Learning Relevance: 10/10

CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography

Qingqing Zhu, Qiao Jin, Tejas S. Mathai, Yin Fang, Zhizheng Wang, Yifan Yang, Maame Sarfo-Gyamfi, Benjamin Hou, Ran Gu, Praveen T. S. Balamuralikrishna, Kenneth C. Wang, Ronald M. Summers, Zhiyong Lu

arXiv: 2602.14879v1 Published: 2026-02-16 Updated: 2026-02-16

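AI Summary

CT-Bench is a newly released dataset that pairs CT lesion annotations with multimodal question answering, intended to improve AI lesion understanding.

Main Contributions

Builds CT-Bench, the first lesion-level multimodal benchmark dataset for CT
Provides lesion images, metadata, and multitask visual question answering
Evaluates existing models and improves their lesion analysis performance through fine-tuning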

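Methodology

The authors build a CT dataset with lesion annotations and design visual question answering tasks that evaluate models on lesion localization, description, and attribute recognition.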
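As a rough illustration of how a multitask QA benchmark of this kind might be scored, the sketch below computes accuracy separately for each task. It assumes multiple-choice answers and a hypothetical record layout (`task`, `answer`, `predicted`); the summary does not specify the actual CT-Bench schema or scoring protocol, so none of these names come from the paper.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return {task: fraction of correct answers} for QA records.

    Each record is a dict with hypothetical keys "task", "answer",
    and "predicted". Answers are compared case-insensitively after
    stripping surrounding whitespace.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        task = record["task"]
        total[task] += 1
        gold = record["answer"].strip().upper()
        guess = record["predicted"].strip().upper()
        if gold == guess:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records for illustration only; these are not CT-Bench data.
records = [
    {"task": "localization", "answer": "B", "predicted": "B"},
    {"task": "localization", "answer": "A", "predicted": "C"},
    {"task": "size_estimation", "answer": "D", "predicted": "d "},
]
scores = accuracy_by_task(records)
```

Reporting one score per task, rather than a single pooled number, keeps a weakness in one skill (for example, size estimation) from being hidden by strength in another.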

Original Abstract

Artificial intelligence (AI) can automatically delineate lesions on computed tomography (CT) and generate radiology report content, yet progress is limited by the scarcity of publicly available CT datasets with lesion-level annotations. To bridge this gap, we introduce CT-Bench, a first-of-its-kind benchmark dataset comprising two components: a Lesion Image and Metadata Set containing 20,335 lesions from 7,795 CT studies with bounding boxes, descriptions, and size information, and a multitask visual question answering benchmark with 2,850 QA pairs covering lesion localization, description, size estimation, and attribute categorization. Hard negative examples are included to reflect real-world diagnostic challenges. We evaluate multiple state-of-the-art multimodal models, including vision-language and medical CLIP variants, by comparing their performance to radiologist assessments, demonstrating the value of CT-Bench as a comprehensive benchmark for lesion analysis. Moreover, fine-tuning models on the Lesion Image and Metadata Set yields significant performance gains across both components, underscoring the clinical utility of CT-Bench.

arXiv Categories

cs.CV cs.AI
