GlyphBanana: Advancing Precise Text Rendering Through Agentic Workflows
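AI Summary
GlyphBanana uses an agentic workflow and glyph template injection to improve the precision of text rendering, particularly for complex characters and mathematical formulas.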
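Main Contributions
- Proposes GlyphBanana, an agentic workflow for precise text rendering
- Designs a benchmark dedicated to rendering complex characters and formulas
- Proposes a training-free method that can be applied to various Text-to-Image (T2I) models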
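Methodology
An agentic workflow integrates auxiliary tools that inject glyph templates into both the latent space and the attention maps, iteratively refining the generated image until the rendered text is accurate.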
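The iterative refinement idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function `refine_text_rendering`, the `generate` and `score` callables, and the `max_rounds` and `threshold` parameters are all hypothetical names introduced here. In a real system, `generate` would presumably wrap a T2I model with glyph template injection and `score` would wrap a checker such as an OCR tool; here both are replaced by toy string functions so that the control flow can be run as-is.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class RefineResult:
    image: Any            # best candidate seen so far
    score: float          # score of that candidate
    rounds: int           # number of generation rounds actually run
    history: List[float]  # score of each round, in order


def refine_text_rendering(
    target_text: str,
    generate: Callable[[str, Optional[Any]], Any],
    score: Callable[[Any, str], float],
    max_rounds: int = 4,
    threshold: float = 0.95,
) -> RefineResult:
    """Generate, check, and regenerate until the score reaches the threshold.

    Assumption: `generate(target_text, previous)` returns a new candidate,
    optionally conditioned on the previous one, and `score(candidate,
    target_text)` returns a value in [0, 1]. Both are hypothetical hooks.
    """
    best_image, best_score = None, float("-inf")
    history: List[float] = []
    previous = None
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        candidate = generate(target_text, previous)
        s = score(candidate, target_text)
        history.append(s)
        if s > best_score:
            best_image, best_score = candidate, s
        if s >= threshold:
            break  # rendered text is judged accurate enough
        previous = candidate  # feed the failed attempt into the next round
    return RefineResult(best_image, best_score, rounds, history)


# Toy stand-ins: the "image" is a string, and each round repairs one character.
def toy_generate(target: str, previous: Optional[str]) -> str:
    if previous is None:
        return "?" * len(target)
    chars = list(previous)
    for i, (c, t) in enumerate(zip(chars, target)):
        if c != t:
            chars[i] = t  # fix the first mismatching character
            break
    return "".join(chars)


def toy_score(image: str, target: str) -> float:
    # Fraction of positions where the rendered character matches the target.
    return sum(a == b for a, b in zip(image, target)) / len(target)
```

Calling `refine_text_rendering("E=mc", toy_generate, toy_score, max_rounds=10, threshold=1.0)` runs the loop until every character matches. Keeping the generator and the checker as injected callables mirrors the training-free framing: the loop itself does not depend on any particular T2I model.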
Original Abstract
Despite recent advances in generative models driving significant progress in text rendering, accurately generating complex text and mathematical formulas remains a formidable challenge. This difficulty primarily stems from the limited instruction-following capabilities of current models when encountering out-of-distribution prompts. To address this, we introduce GlyphBanana, alongside a corresponding benchmark specifically designed for rendering complex characters and formulas. GlyphBanana employs an agentic workflow that integrates auxiliary tools to inject glyph templates into both the latent space and attention maps, facilitating the iterative refinement of generated images. Notably, our training-free approach can be seamlessly applied to various Text-to-Image (T2I) models, achieving superior precision compared to existing baselines. Extensive experiments demonstrate the effectiveness of our proposed workflow. Associated code is publicly available at https://github.com/yuriYanZeXuan/GlyphBanana.