Multimodal Learning (Relevance: 8/10)

LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control

Mingyu Kang, Hyein Seo, Yuna Jeong, Junhyeong Park, Yong Suk Choi
arXiv: 2603.09759v1 Published: 2026-03-10 Updated: 2026-03-10

AI Summary

LogoDiffuser is a training-free multilingual logo generation method that fuses textual and visual elements through controllable attention mechanisms.

Key Contributions

  • Proposes LogoDiffuser, a training-free multilingual logo generation method
  • Uses image-based character input to achieve robust control over character structure
  • Improves generation quality via core-token identification and attention-map aggregation

Methodology

LogoDiffuser builds on a multimodal diffusion transformer: the target characters are supplied as images, the joint attention mechanism is analyzed to identify core tokens, the most informative attention maps are injected to combine character structure with visual design, and attention maps are aggregated layer-wise to obtain consistent core tokens.
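The core-token idea above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the per-layer attention-map format, and the character mask are all assumptions. It scores each token by how much of its attention mass falls inside the character region, after averaging maps over layers to smooth out cross-layer attention shifts.

```python
import numpy as np

def find_core_tokens(attn_maps, char_mask, top_k=4):
    """Hypothetical sketch of core-token identification.

    attn_maps: list of per-layer joint-attention maps, each of shape
               (num_tokens, H*W).
    char_mask: boolean array of shape (H*W,) marking pixels belonging
               to the character region of the reference image.
    Returns the indices of the top_k tokens that respond most strongly
    to the character structure.
    """
    # Layer-wise aggregation: average over layers to mitigate
    # attention shifts across layers.
    agg = np.mean(np.stack(attn_maps, axis=0), axis=0)

    # Fraction of each token's attention mass that lands inside the
    # character region; tokens with a high fraction are "core" tokens.
    inside = agg[:, char_mask].sum(axis=1)
    total = agg.sum(axis=1) + 1e-8
    scores = inside / total

    return np.argsort(scores)[::-1][:top_k]
```

In practice a threshold on the score could replace the fixed `top_k`; the digest does not specify which variant the paper uses.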

Original Abstract

Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains a challenging task. Existing methods often distort character geometry when applying creative styles and struggle to support multilingual text generation without additional training. To address these challenges, we propose LogoDiffuser, a training-free method that synthesizes multilingual logo designs using the multimodal diffusion transformer. Instead of using textual prompts, we input the target characters as images, enabling robust character structure control regardless of language. We first analyze the joint attention mechanism to identify core tokens, which are tokens that strongly respond to textual structures. With this observation, our method integrates character structure and visual design by injecting the most informative attention maps. Furthermore, we perform layer-wise aggregation of attention maps to mitigate attention shifts across layers and obtain consistent core tokens. Extensive experiments and user studies demonstrate that our method achieves state-of-the-art performance in multilingual logo generation.
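The abstract's injection step can be illustrated with a short sketch. Again this is an assumption-laden toy, not the paper's implementation: it overwrites the core-token rows of the stylized pass's attention map with the corresponding rows from a character-reference pass, so character structure steers those tokens while the remaining tokens keep the creative style.

```python
import numpy as np

def inject_attention(style_attn, char_attn, core_tokens, alpha=1.0):
    """Hypothetical attention-map injection.

    style_attn: attention map of the stylized generation pass,
                shape (num_tokens, H*W).
    char_attn:  attention map of the character-reference pass,
                same shape.
    core_tokens: indices of tokens that encode character structure.
    alpha: blend weight; 1.0 fully replaces the core-token rows.
    """
    out = style_attn.copy()
    # Blend only the core-token rows; all other tokens are untouched,
    # preserving the visual style they carry.
    out[core_tokens] = (alpha * char_attn[core_tokens]
                        + (1 - alpha) * style_attn[core_tokens])
    return out
```

In a real multimodal diffusion transformer this would run inside the joint-attention layers at each denoising step; the digest does not say at which layers or timesteps injection is applied.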

Tags

Multilingual, Logo Generation, Diffusion Models, Attention Mechanism, Training-Free

arXiv Categories

cs.CV