Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting
AI Summary
ProtoSR improves the fine-grained discriminative ability of structured radiology report generation by fusing free-text knowledge, achieving state-of-the-art results on the Rad-ReStruct dataset.
Key Contributions
- Proposes the ProtoSR model, which fuses free-text knowledge to improve structured reporting accuracy
- Builds a multimodal knowledge base from MIMIC-CXR containing both image and text information
- Uses an instruction-tuned LLM to automatically extract information from free-text reports when constructing the knowledge base
Methodology
An LLM extracts knowledge from free-text reports to build a multimodal knowledge base; this knowledge is then injected into the structured reporting model via prototype-conditioned residual learning.
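The prototype-conditioned residual step can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, the softmax-over-top-k weighting, and the per-prototype logit offsets are all assumptions. The idea is that prototypes similar to the current image-question pair contribute a weighted correction ("second opinion") on top of the base model's answer logits.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize vectors so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def prototype_residual_logits(query, prototypes, proto_logit_offsets,
                              base_logits, k=2, alpha=1.0):
    """Retrieve the k prototypes most similar to the image-question query
    and add a similarity-weighted residual to the base predictions.

    query               : (d,)   embedding of the image-question pair
    prototypes          : (P, d) visual prototype embeddings (one per answer option)
    proto_logit_offsets : (P, C) hypothetical logit correction carried by each prototype
    base_logits         : (C,)   model's original answer logits
    """
    sims = l2_normalize(prototypes) @ l2_normalize(query)  # cosine similarity per prototype
    topk = np.argsort(sims)[-k:]                           # indices of the k best matches
    w = np.exp(sims[topk])
    w /= w.sum()                                           # softmax weights over top-k
    residual = w @ proto_logit_offsets[topk]               # weighted prototype correction
    return base_logits + alpha * residual
```

With a query aligned to the first of two prototypes and `k=1`, the output simply shifts the base logits by that prototype's offset; larger `k` blends several prototype opinions.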
Original Abstract
Structured radiology reporting promises faster, more consistent communication than free text, but automation remains difficult as models must make many fine-grained, discrete decisions about rare findings and attributes from limited structured supervision. In contrast, free-text reports are produced at scale in routine care and implicitly encode fine-grained, image-linked information through detailed descriptions. To leverage this unstructured knowledge, we propose ProtoSR, an approach for injecting free-text information into structured report population. First, we introduce an automatic extraction pipeline that uses an instruction-tuned LLM to mine 80k+ MIMIC-CXR studies and build a multimodal knowledge base aligned with a structured reporting template, representing each answer option with a visual prototype. Using this knowledge base, ProtoSR is trained to retrieve prototypes relevant for the current image-question pair and augment the model predictions through a prototype-conditioned residual, providing a data-driven second opinion that selectively corrects predictions. On the Rad-ReStruct benchmark, ProtoSR achieves state-of-the-art results, with the largest improvements on detailed attribute questions, demonstrating the value of integrating free-text derived signal for fine-grained image understanding.