Multimodal Learning (Relevance: 9/10)

Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting

Chantal Pellegrini, Adrian Delchev, Ege Özsoy, Nassir Navab, Matthias Keicher
arXiv: 2603.11938v1 Published: 2026-03-12 Updated: 2026-03-12

AI Summary

ProtoSR improves the fine-grained discriminative ability of structured radiology report generation models by incorporating knowledge from free-text reports, achieving state-of-the-art results on the Rad-ReStruct dataset.

Key Contributions

  • Proposes ProtoSR, a model that incorporates free-text knowledge to improve the accuracy of structured report population
  • Builds a multimodal knowledge base from MIMIC-CXR containing both image and text information
  • Uses an instruction-tuned LLM to automatically extract information from free-text reports when constructing the knowledge base

Methodology

An LLM extracts knowledge from free-text reports to build a multimodal knowledge base; this knowledge is then injected into the structured reporting model through prototype-conditioned residual learning.
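The retrieve-and-correct step above can be sketched roughly as follows. This is a minimal illustration only: all names, the cosine-similarity retrieval, and the fixed gate value are assumptions for exposition, not the paper's actual architecture or learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Knowledge base: one visual prototype embedding per answer option
# (in the paper these are mined from 80k+ MIMIC-CXR studies; random here).
num_options, dim = 5, 16
prototypes = rng.normal(size=(num_options, dim))

# Joint image-question embedding from the base model (assumed given).
query = rng.normal(size=dim)

# Base model's logits over the answer options of the template question.
base_logits = rng.normal(size=num_options)

# Retrieval: cosine similarity between the query and each prototype.
sims = prototypes @ query / (
    np.linalg.norm(prototypes, axis=1) * np.linalg.norm(query) + 1e-8
)

# Prototype-conditioned residual: a gated correction added to the base
# logits, acting as a data-driven "second opinion". The gate would be
# learned in the real model; fixed here for illustration.
gate = 1.0
final_logits = base_logits + gate * sims

probs = softmax(final_logits)
print(probs.argmax(), probs.round(3))
```

The residual formulation means the prototype signal only shifts the base prediction rather than replacing it, so answers the base model already gets right are corrected selectively, not overwritten.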

Original Abstract

Structured radiology reporting promises faster, more consistent communication than free text, but automation remains difficult as models must make many fine-grained, discrete decisions about rare findings and attributes from limited structured supervision. In contrast, free-text reports are produced at scale in routine care and implicitly encode fine-grained, image-linked information through detailed descriptions. To leverage this unstructured knowledge, we propose ProtoSR, an approach for injecting free-text information into structured report population. First, we introduce an automatic extraction pipeline that uses an instruction-tuned LLM to mine 80k+ MIMIC-CXR studies and build a multimodal knowledge base aligned with a structured reporting template, representing each answer option with a visual prototype. Using this knowledge base, ProtoSR is trained to retrieve prototypes relevant for the current image-question pair and augment the model predictions through a prototype-conditioned residual, providing a data-driven second opinion that selectively corrects predictions. On the Rad-ReStruct benchmark, ProtoSR achieves state-of-the-art results, with the largest improvements on detailed attribute questions, demonstrating the value of integrating free-text derived signal for fine-grained image understanding.

Tags

Structured Reporting · Radiology · Multimodal Learning · Knowledge Distillation · LLM

arXiv Categories

cs.AI cs.CV cs.LG