Multimodal Learning Relevance: 8/10

HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models

Yeqi He, Liang Li, Zhiwen Yang, Xichun Sheng, Zhidong Zhao, Chenggang Yan
arXiv: 2603.24043v1 Published: 2026-03-25 Updated: 2026-03-25

AI Summary

Proposes a training-free style transfer method for diffusion models based on Heterogeneous Attention Modulation (HAM).

Key Contributions

  • Proposes the Heterogeneous Attention Modulation (HAM) framework
  • Introduces the Global Attention Regulation (GAR) and Local Attention Transplantation (LAT) mechanisms
  • Preserves the identity information of the content image during style transfer

Methodology

Style noise initialization and heterogeneous attention modulation are used to control the style and content of the image during the diffusion process, enabling training-free style transfer.
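The summary above does not spell out the attention formulas, so the following is only an illustrative sketch of the general idea behind attention modulation in diffusion style transfer: content queries attend to both content and style key/value pairs, with a blending weight `lam` standing in for the paper's GAR/LAT mechanisms. All function and parameter names here are assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modulated_attention(q_c, k_c, v_c, k_s, v_s, lam=0.5):
    """Toy heterogeneous attention modulation (illustrative, not HAM itself).

    q_c: content queries (n_c, d); k_c, v_c: content keys/values (n_c, d);
    k_s, v_s: style keys/values (n_s, d). `lam` blends the content
    self-attention branch with a cross-attention branch into the style
    features, a stand-in for the paper's GAR/LAT modulation.
    """
    d = q_c.shape[-1]
    attn_c = softmax(q_c @ k_c.T / np.sqrt(d))  # self-attention over content
    attn_s = softmax(q_c @ k_s.T / np.sqrt(d))  # cross-attention into style
    return (1.0 - lam) * (attn_c @ v_c) + lam * (attn_s @ v_s)
```

With `lam=0` this reduces to plain content self-attention (identity fully preserved); increasing `lam` injects more of the style reference, which is the style-content trade-off the method aims to balance.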

Original Abstract

Diffusion models have demonstrated remarkable performance in image generation, particularly within the domain of style transfer. Prevailing style transfer approaches typically leverage pre-trained diffusion models' robust feature extraction capabilities alongside external modular control pathways to explicitly impose style guidance signals. However, these methods often fail to capture complex style references or retain the identity of user-provided content images, thus falling into the trap of style-content balance. To this end, we propose a training-free style transfer approach via $\textbf{h}$eterogeneous $\textbf{a}$ttention $\textbf{m}$odulation ($\textbf{HAM}$) to protect identity information during image/text-guided style reference transfer, thereby addressing the style-content trade-off challenge. Specifically, we first introduce style noise initialization to initialize latent noise for diffusion. Then, during the diffusion process, we employ HAM across different attention mechanisms, including Global Attention Regulation (GAR) and Local Attention Transplantation (LAT), which better preserve the details of the content image while capturing complex style references. Our approach is validated through a series of qualitative and quantitative experiments, achieving state-of-the-art performance on multiple quantitative metrics.

Tags

Style Transfer · Diffusion Models · Attention Mechanisms

arXiv Categories

cs.CV