TriFusion-SR: Joint Tri-Modal Medical Image Fusion and Super-Resolution
AI Summary
TriFusion-SR is a framework for joint tri-modal medical image fusion and super-resolution.
Key Contributions
- Proposes a wavelet-guided conditional diffusion framework for joint tri-modal fusion and super-resolution.
- Introduces Rectified Wavelet Features (RWF) to calibrate latent wavelet coefficients.
- Designs an Adaptive Spatial-Frequency Fusion (ASFF) module that uses gated channel-spatial attention for structure-driven multimodal refinement.
Methodology
Multimodal features are decomposed into frequency bands with the 2D Discrete Wavelet Transform, enabling frequency-aware cross-modal interaction; the RWF and ASFF modules then perform coefficient calibration and fusion.
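The paper's implementation is not included in this summary. As a minimal sketch of the frequency-decomposition step, a one-level 2D Haar DWT (the simplest wavelet basis; the summary does not say which basis TriFusion-SR actually uses) splits a feature map into one low-frequency and three high-frequency sub-bands:

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """One-level 2D Haar DWT: split an (H, W) array with even dims
    into four half-resolution sub-bands (LL, LH, HL, HH)."""
    # Pair adjacent rows: low-pass = average, high-pass = difference.
    lo_r = (x[0::2, :] + x[1::2, :]) / 2.0
    hi_r = (x[0::2, :] - x[1::2, :]) / 2.0
    # Repeat along columns of each row-filtered half.
    ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0  # low-low (structure)
    lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0  # horizontal detail
    hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0  # vertical detail
    hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0  # diagonal detail
    return ll, lh, hl, hh

# A constant image has all its energy in the low-frequency LL band.
img = np.ones((4, 4))
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)         # (2, 2)
print(float(hh.sum()))  # 0.0
```

Decomposing each modality this way is what makes the "pronounced frequency-domain imbalances" between anatomical and functional scans explicit: low- and high-frequency bands can be fused with different rules.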
Original Abstract
Multimodal medical image fusion facilitates comprehensive diagnosis by aggregating complementary structural and functional information, but its effectiveness is limited by resolution degradation and modality discrepancies. Existing approaches typically perform image fusion and super-resolution (SR) in separate stages, leading to artifacts and degraded perceptual quality. These limitations are further amplified in tri-modal settings that combine anatomical modalities (e.g., MRI, CT) with functional scans (e.g., PET, SPECT) due to pronounced frequency-domain imbalances. We propose TriFusion-SR, a wavelet-guided conditional diffusion framework for joint tri-modal fusion and SR. The framework explicitly decomposes multimodal features into frequency bands using the 2D Discrete Wavelet Transform, enabling frequency-aware cross-modal interaction. We further introduce a Rectified Wavelet Features (RWF) strategy for latent coefficient calibration, followed by an Adaptive Spatial-Frequency Fusion (ASFF) module with gated channel-spatial attention to enable structure-driven multimodal refinement. Extensive experiments demonstrate state-of-the-art performance, achieving 4.8-12.4% PSNR improvement and substantial reductions in RMSE and LPIPS across multiple upsampling scales.
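The ASFF module's internals are not specified in this summary. As an assumption-labeled illustration of the general idea of gated channel-spatial attention fusion, the following NumPy sketch gates two feature maps with a channel gate (from pooled statistics) times a spatial gate (from per-pixel statistics); all weights (`w_c`, `w_s`) are hypothetical, not the paper's parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(feat_a, feat_b, w_c, w_s):
    """Illustrative gated channel-spatial attention fusion of two
    (C, H, W) feature maps. w_c: (C, 2C) channel-gate weights,
    w_s: (1, 2) spatial-gate weights -- both hypothetical."""
    x = np.concatenate([feat_a, feat_b], axis=0)           # (2C, H, W)
    # Channel gate: squeeze spatial dims, linear layer, sigmoid.
    squeezed = x.mean(axis=(1, 2))                         # (2C,)
    g_c = sigmoid(w_c @ squeezed)[:, None, None]           # (C, 1, 1)
    # Spatial gate: per-pixel statistics of the two inputs.
    stats = np.stack([feat_a.mean(axis=0), feat_b.mean(axis=0)])  # (2, H, W)
    g_s = sigmoid(np.tensordot(w_s, stats, axes=([1], [0])))      # (1, H, W)
    gate = g_c * g_s                                       # (C, H, W) via broadcast
    # Convex per-element mix of the two modalities.
    return gate * feat_a + (1.0 - gate) * feat_b

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
a = rng.normal(size=(C, H, W))
b = rng.normal(size=(C, H, W))
fused = gated_fusion(a, b, rng.normal(size=(C, 2 * C)), rng.normal(size=(1, 2)))
print(fused.shape)  # (4, 8, 8)
```

Because the gate lies in (0, 1), each fused value stays between the two modality values at that position, which is one simple way to make the fusion "structure-driven": wherever the gate opens, the anatomical feature dominates; elsewhere the functional feature passes through.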