Multimodal Learning Relevance: 9/10

Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

Abdullah Hamdi, Changchun Yang, Xin Gao

arXiv: 2603.25645v1 Published: 2026-03-26 Updated: 2026-03-26


AI Summary

Builds Colon-Bench, a large-scale colonoscopy video dataset, and evaluates MLLM performance on it.

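Main Contributions

Builds Colon-Bench, a large-scale, multi-category, densely annotated colonoscopy video dataset
Proposes a multi-stage agentic workflow for efficiently annotating colonoscopy videos
Evaluates existing MLLMs on Colon-Bench and proposes the colon-skill prompting strategy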

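Methodology

Uses a multi-stage agentic workflow that combines temporal proposals, object tracking, AI confirmation, and human review to annotate colonoscopy videos.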
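
The four stages named above can be pictured as a filtering chain in which each stage only sees what the previous one kept. The following is a minimal sketch of that control flow, not the authors' implementation: the `Track` record, the fixed-window proposal step, the constant-box "tracker", and the `ai_check` / `human_check` callbacks are all hypothetical placeholders introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical box convention for this sketch: (x, y, width, height).
Box = Tuple[int, int, int, int]


@dataclass
class Track:
    """One candidate lesion track over an inclusive frame range."""
    label: str
    start_frame: int
    end_frame: int
    boxes: List[Box] = field(default_factory=list)
    ai_confirmed: bool = False
    human_approved: bool = False


def propose_segments(num_frames: int, window: int) -> List[Tuple[int, int]]:
    """Stage 1 placeholder: fixed-size windows stand in for temporal proposals."""
    return [
        (start, min(start + window, num_frames) - 1)
        for start in range(0, num_frames, window)
    ]


def track_boxes(segment: Tuple[int, int], label: str, seed_box: Box) -> Track:
    """Stage 2 placeholder: repeat the seed box on every frame of the segment.

    A real tracker would update the box frame by frame.
    """
    start, end = segment
    return Track(label, start, end, boxes=[seed_box] * (end - start + 1))


def annotate(
    num_frames: int,
    window: int,
    label: str,
    seed_box: Box,
    ai_check: Callable[[Track], bool],
    human_check: Callable[[Track], bool],
) -> List[Track]:
    """Run the stages in order and keep only tracks that pass both checks."""
    kept: List[Track] = []
    for segment in propose_segments(num_frames, window):
        track = track_boxes(segment, label, seed_box)
        track.ai_confirmed = ai_check(track)        # Stage 3: AI confirmation
        if not track.ai_confirmed:
            continue                                # rejected before human review
        track.human_approved = human_check(track)   # Stage 4: human review
        if track.human_approved:
            kept.append(track)
    return kept


# Example run with toy checks: the AI stage rejects tracks shorter than 30 frames,
# and the human stage approves everything it is shown.
tracks = annotate(
    num_frames=100,
    window=40,
    label="polyp",
    seed_box=(10, 10, 32, 32),
    ai_check=lambda t: len(t.boxes) >= 30,
    human_check=lambda t: True,
)
```

Placing the automated check before the human one reflects the ordering in the description above: reviewers only see candidates that have already passed AI confirmation, which is presumably where the annotation effort is saved.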

Original Abstract

Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation masks, and 133,000 words of clinical descriptions. We utilize Colon-Bench to rigorously evaluate state-of-the-art MLLMs across lesion classification, Open-Vocabulary Video Object Segmentation (OV-VOS), and video Visual Question Answering (VQA). The MLLM results demonstrate surprisingly high localization performance in medical domains compared to SAM-3. Finally, we analyze common VQA errors from MLLMs to introduce a novel "colon-skill" prompting strategy, improving zero-shot MLLM performance by up to 9.7% across most MLLMs. The dataset and the code are available at https://abdullahamdi.com/colon-bench .

arXiv Categories

eess.IV cs.CV cs.HC
