Multimodal Learning relevance: 5/10

RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion

Kavyansh Tyagi, Vishwas Rathi, Puneet Goyal
arXiv: 2602.16320v1 Published: 2026-02-18 Updated: 2026-02-18

AI Summary

RefineFormer3D is an efficient 3D medical image segmentation model that balances accuracy and computational cost.

Key Contributions

  • Proposes RefineFormer3D, a lightweight transformer architecture
  • Uses GhostConv3D for efficient feature extraction
  • Employs a cross-attention fusion decoder for adaptive multi-scale skip connections

Methodology

Combines GhostConv3D, MixFFN3D, and a cross-attention fusion decoder to build a lightweight transformer for 3D medical image segmentation.
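As a rough illustration of where the parameter savings come from, here is a back-of-the-envelope count comparing a dense Conv3D against a Ghost-style 3D convolution in the original GhostNet formulation (a primary convolution produces a fraction of the channels; cheap depthwise ops generate the rest). The specific GhostConv3D layer in RefineFormer3D is not detailed in this summary, so the ratio `s`, cheap-kernel size `d`, and channel sizes below are illustrative assumptions, and bias terms are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Dense 3D convolution: every output channel sees every input channel."""
    return c_in * c_out * k ** 3

def ghostconv3d_params(c_in, c_out, k, s=2, d=3):
    """Ghost-style split (assumed, following GhostNet): a primary conv makes
    c_out // s 'intrinsic' channels; cheap depthwise 3D ops of kernel size d
    generate the remaining 'ghost' channels, one small kernel per channel."""
    intrinsic = c_out // s
    primary = c_in * intrinsic * k ** 3          # dense part, shrunk by s
    cheap = (c_out - intrinsic) * d ** 3         # depthwise part
    return primary + cheap

dense = conv3d_params(32, 64, 3)       # 32 * 64 * 27 = 55296
ghost = ghostconv3d_params(32, 64, 3)  # 27648 + 864  = 28512
print(dense, ghost, round(dense / ghost, 2))  # roughly a 1.94x reduction
```

With ratio s = 2 the dense part of the layer is halved and the depthwise part is nearly free, which is the kind of saving that lets the full model stay at only 2.94M parameters.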

Original Abstract

Accurate and computationally efficient 3D medical image segmentation remains a critical challenge in clinical workflows. Transformer-based architectures often demonstrate superior global contextual modeling but at the expense of excessive parameter counts and memory demands, restricting their clinical deployment. We propose RefineFormer3D, a lightweight hierarchical transformer architecture that balances segmentation accuracy and computational efficiency for volumetric medical imaging. The architecture integrates three key components: (i) GhostConv3D-based patch embedding for efficient feature extraction with minimal redundancy, (ii) MixFFN3D module with low-rank projections and depthwise convolutions for parameter-efficient feature extraction, and (iii) a cross-attention fusion decoder enabling adaptive multi-scale skip connection integration. RefineFormer3D contains only 2.94M parameters, substantially fewer than contemporary transformer-based methods. Extensive experiments on ACDC and BraTS benchmarks demonstrate that RefineFormer3D achieves 93.44% and 85.9% average Dice scores respectively, outperforming or matching state-of-the-art methods while requiring significantly fewer parameters. Furthermore, the model achieves fast inference (8.35 ms per volume on GPU) with low memory requirements, supporting deployment in resource-constrained clinical environments. These results establish RefineFormer3D as an effective and scalable solution for practical 3D medical image segmentation.
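The cross-attention fusion decoder described in the abstract can be sketched in miniature: decoder tokens form the queries, while encoder skip features supply the keys and values, so each decoder location adaptively weights the skip-connection information instead of simply concatenating it. The following is a minimal single-head NumPy sketch; all shapes and names here are illustrative assumptions, not the paper's actual layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(decoder_feat, skip_feat, Wq, Wk, Wv):
    """Single-head cross attention: decoder tokens query the encoder skip.

    decoder_feat: (n_dec, c) flattened decoder tokens
    skip_feat:    (n_skip, c) flattened encoder skip tokens
    """
    Q = decoder_feat @ Wq                   # (n_dec, d)
    K = skip_feat @ Wk                      # (n_skip, d)
    V = skip_feat @ Wv                      # (n_skip, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1]) # (n_dec, n_skip)
    return softmax(scores) @ V              # (n_dec, d) fused features

rng = np.random.default_rng(0)
n_dec, n_skip, c, d = 8, 64, 32, 16         # token counts, channels, head dim
dec = rng.standard_normal((n_dec, c))
skip = rng.standard_normal((n_skip, c))
Wq, Wk, Wv = (rng.standard_normal((c, d)) for _ in range(3))
fused = cross_attention_fuse(dec, skip, Wq, Wk, Wv)
print(fused.shape)  # (8, 16)
```

Because the attention weights are recomputed per input, the decoder can emphasize different spatial scales of skip features for different volumes, which is what "adaptive" multi-scale integration refers to.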

Tags

3D Medical Image Segmentation Transformer Deep Learning Medical Imaging

arXiv Categories

eess.IV cs.CV cs.LG