Multimodal Learning Relevance: 8/10

LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation

Mohammad Robaitul Islam Bhuiyan, Sheethal Bhat, Melika Qahqaie, Tri-Thien Nguyen, Paula Andrea Pérez Toro, Tomas Arias Vergara, Andreas Maier
arXiv: 2603.17576v1 Published: 2026-03-18 Updated: 2026-03-18

AI Summary

LoGSAM uses speech transcription and a small number of parameter updates to segment tumors in MRI images automatically.

Main Contributions

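  • Proposes the LoGSAM framework for speech-driven tumor segmentation
  • Introduces a parameter-efficient cross-modal grounding method
  • Achieves state-of-the-art results on the BRISC 2025 dataset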

Methodology

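Combines Whisper, Grounding DINO (GDINO), and MedSAM: transcribed speech is turned into text prompts that guide tumor localization and segmentation. Only GDINO is fine-tuned, using LoRA.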
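To make the LoRA step concrete, here is a minimal pure-Python sketch of a low-rank adapter on a single linear layer. It is not the authors' implementation. The function names `lora_param_counts` and `lora_forward`, the layer sizes, and the `alpha / rank` scaling convention are assumptions chosen for illustration. The idea shown is that the pretrained weight `W` stays frozen while only the two small matrices `A` and `B` are trained.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen, trainable) parameter counts for one adapted linear layer.

    Hypothetical helper: the frozen weight W has d_out * d_in entries, while the
    adapter adds A (rank x d_in) and B (d_out x rank).
    """
    frozen = d_in * d_out
    trainable = rank * (d_in + d_out)
    return frozen, trainable


def lora_forward(x, W, A, B, alpha):
    """Compute y = W x + (alpha / rank) * B (A x) using plain lists.

    x: vector of length d_in
    W: frozen weight, d_out rows of length d_in
    A: trainable down-projection, rank rows of length d_in
    B: trainable up-projection, d_out rows of length rank
    """
    rank = len(A)
    scale = alpha / rank
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]    # frozen path
    down = [sum(a * xi for a, xi in zip(row, x)) for row in A]    # project to rank dims
    up = [sum(b * dj for b, dj in zip(row, down)) for row in B]   # project back to d_out
    return [b0 + scale * u for b0, u in zip(base, up)]


if __name__ == "__main__":
    frozen, trainable = lora_param_counts(d_in=256, d_out=256, rank=4)
    share = trainable / (frozen + trainable)
    print(f"trainable share of this layer: {share:.2%}")
```

With `B` initialized to zeros, the adapted layer reproduces the frozen layer exactly, so training starts from the pretrained behavior. The trainable share printed here applies to one toy layer only and is unrelated to the 5% figure reported for the full model.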

Original Abstract

Precise localization and delineation of brain tumors using Magnetic Resonance Imaging (MRI) are essential for planning therapy and guiding surgical decisions. However, most existing approaches rely on task-specific supervised models and are constrained by the limited availability of annotated data. To address this, we propose LoGSAM, a parameter-efficient, detection-driven framework that transforms radiologist dictation into text prompts for foundation-model-based localization and segmentation. Radiologist speech is first transcribed and translated using a pretrained Whisper ASR model, followed by negation-aware clinical NLP to extract tumor-specific textual prompts. These prompts guide text-conditioned tumor localization via a LoRA-adapted vision-language detection model, Grounding DINO (GDINO). The LoRA adaptation updates using 5% of the model parameters, thereby enabling computationally efficient domain adaptation while preserving pretrained cross-modal knowledge. The predicted bounding boxes are used as prompts for MedSAM to generate pixel-level tumor masks without any additional fine-tuning. Conditioning the frozen MedSAM on LoGSAM-derived priors yields a state-of-the-art dice score of 80.32% on BRISC 2025. In addition, we evaluate the full pipeline using German dictations from a board-certified radiologist on 12 unseen MRI scans, achieving 91.7% case-level accuracy. These results highlight the feasibility of constructing a modular, speech-to-segmentation pipeline by intelligently leveraging pretrained foundation models with minimal parameter updates.
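The abstract reports segmentation quality as a Dice score. As a reminder of what that metric measures, the sketch below computes the Dice coefficient, 2|P ∩ G| / (|P| + |G|), for two binary masks stored as nested lists. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made here; the paper's exact evaluation protocol is not given in this digest.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Assumption: if both masks are empty, the score is defined as 1.0
    (perfect agreement on "no tumor").
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            pred_sum += 1 if p else 0
            target_sum += 1 if t else 0
            intersection += 1 if (p and t) else 0
    if pred_sum + target_sum == 0:
        return 1.0
    return 2.0 * intersection / (pred_sum + target_sum)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    print(f"Dice: {dice_score(predicted, ground_truth):.4f}")
```

In the toy example the masks overlap in one pixel, the prediction has two foreground pixels and the ground truth has one, so the score is 2 · 1 / (2 + 1) ≈ 0.667.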

Tags

MRI segmentation, cross-modal learning, parameter-efficient learning, speech transcription, grounding

arXiv Categories

cs.CV