Multimodal Learning Relevance: 9/10

Automated Histopathology Report Generation via Pyramidal Feature Extraction and the UNI Foundation Model

Ahmet Halici, Ece Tugba Cebeci, Musa Balci, Mustafa Cini, Serkan Sokmen

arXiv: 2602.16422v1 Published: 2026-02-18 Updated: 2026-02-18

AI Summary

Proposes an automated pathology report generation framework based on pyramidal feature extraction and the UNI foundation model.

Main Contributions

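Proposes a hierarchical vision-language framework built on UNI and a Transformer decoder
Applies multi-resolution pyramidal patch selection and image preprocessing
Uses the BioGPT tokenizer to better represent biomedical terminology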
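
To make the multi-resolution patch selection concrete, the following minimal sketch computes, for each downsampling factor from 2^3 to 2^6 (the range stated in the abstract), the size of the downsampled level and how many non-overlapping patches fit into it. The function name `pyramid_levels`, the 224-pixel patch size, and the slide dimensions are illustrative assumptions, not values taken from the paper.

```python
def pyramid_levels(width, height, patch_size=224, exponents=range(3, 7)):
    """List level size and patch grid for each downsampling factor 2**e.

    patch_size=224 is an assumed value for illustration only.
    """
    levels = []
    for e in exponents:
        factor = 2 ** e
        w, h = width // factor, height // factor
        levels.append({
            "factor": factor,
            "size": (w, h),
            # Number of whole, non-overlapping patches along each axis.
            "grid": (w // patch_size, h // patch_size),
        })
    return levels


if __name__ == "__main__":
    # Hypothetical slide of 100,000 x 80,000 pixels at full resolution.
    for level in pyramid_levels(100_000, 80_000):
        print(level)
```

The coarser levels contain far fewer candidate patches, which is what keeps gigapixel input tractable; how the paper selects patches across levels is not specified in this summary.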

Methodology

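Selects patches from a multi-resolution pyramid of the WSI, extracts their features with the frozen UNI model, generates the report with a Transformer decoder, and verifies it against a reference corpus by retrieval.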
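
The retrieval verification step can be sketched as a nearest-neighbor lookup over sentence embeddings: if the generated report is close enough to a reference report, the reference replaces it. The sketch below assumes the embeddings have already been computed (the abstract names Sentence-BERT for this); the function name `verify_report` and the 0.9 cosine-similarity threshold are hypothetical, since the summary does not state the threshold the authors use.

```python
import numpy as np


def verify_report(generated_text, generated_emb, reference_texts,
                  reference_embs, threshold=0.9):
    """Return (report, replaced).

    If the best cosine similarity between the generated report and any
    reference reaches `threshold` (an assumed value), the matching
    reference text is returned instead of the generated text.
    """
    q = np.asarray(generated_emb, dtype=float)
    refs = np.asarray(reference_embs, dtype=float)
    # Normalize so that the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    sims = refs @ q
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return reference_texts[best], True
    return generated_text, False
```

With toy 2-D embeddings, a generated vector of `[0.99, 0.1]` matched against references `[1, 0]` and `[0, 1]` would be replaced by the first reference, while `[1, 1]` would fall below the threshold and keep the generated text.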

Original Abstract

Generating diagnostic text from histopathology whole slide images (WSIs) is challenging due to the gigapixel scale of the input and the requirement for precise, domain-specific language. We propose a hierarchical vision-language framework that combines a frozen pathology foundation model with a Transformer decoder for report generation. To make WSI processing tractable, we perform multi-resolution pyramidal patch selection (downsampling factors 2^3 to 2^6) and remove background and artifacts using Laplacian variance and HSV-based criteria. Patch features are extracted with the UNI Vision Transformer and projected to a 6-layer Transformer decoder that generates diagnostic text via cross-attention. To better represent biomedical terminology, we tokenize the output using BioGPT. Finally, we add a retrieval-based verification step that compares generated reports with a reference corpus using Sentence-BERT embeddings; if a high-similarity match is found, the generated report is replaced with the retrieved ground-truth reference to improve reliability.
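
As a rough illustration of the background and artifact filter, the sketch below rejects patches that are either blurry or featureless (low variance of the Laplacian) or nearly colorless (low mean HSV saturation, typical of blank glass). It uses only NumPy; the function names, the channel-average grayscale conversion, and both thresholds are assumptions made for illustration, since the abstract does not give the exact criteria.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbor discrete Laplacian over interior pixels."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def mean_saturation(rgb):
    """Mean HSV saturation in [0, 1] for an RGB uint8 image."""
    x = rgb.astype(np.float64) / 255.0
    mx = x.max(axis=2)
    mn = x.min(axis=2)
    # Saturation is (max - min) / max, defined as 0 for black pixels.
    sat = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1.0), 0.0)
    return float(sat.mean())


def keep_patch(rgb, min_lap_var=50.0, min_saturation=0.05):
    """Keep a patch only if it is both sharp and colored.

    Both thresholds are hypothetical and would need tuning per dataset.
    """
    gray = rgb.astype(np.float64).mean(axis=2)  # simple grayscale proxy
    return (laplacian_variance(gray) >= min_lap_var
            and mean_saturation(rgb) >= min_saturation)
```

A uniformly white patch fails both checks, and a uniformly colored patch with no texture fails the Laplacian check, so only patches with visible stained structure pass.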

arXiv Categories

eess.IV cs.AI cs.CV
