Adaptive Prompt Elicitation for Text-to-Image Generation
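AI Summary
APE interactively helps users refine text-to-image prompts through visual queries, improving how well the generated images align with user intent.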
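Key Contributions
- Proposes the Adaptive Prompt Elicitation (APE) technique
- Formalizes interactive intent inference under an information-theoretic framework
- Shows that APE achieves stronger alignment and improved efficiency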
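Methodology
APE uses language model priors to represent latent intent as interpretable feature requirements, adaptively generates visual queries, and compiles the elicited requirements into effective prompts.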
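The loop described above (hold a belief over feature requirements, ask the most informative query, compile the answers into a prompt) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hand-written prior standing in for the language model prior, a noiseless user answer, and hypothetical helper names (`entropy`, `next_query`, `update`, `compile_prompt`). Rendering candidate images for each visual query is omitted. Under the noiseless-answer assumption, the expected information gain of asking about a feature equals the entropy of the current belief over that feature, so the sketch simply picks the highest-entropy feature.

```python
import math


def entropy(belief):
    """Shannon entropy in bits of a dict mapping value -> probability."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def next_query(beliefs):
    """Return the feature whose answer is expected to be most informative.

    Assumption: the user's answer reveals the feature value exactly, so the
    expected information gain equals the entropy of that feature's belief.
    """
    return max(beliefs, key=lambda feature: entropy(beliefs[feature]))


def update(beliefs, feature, chosen_value):
    """Collapse the belief for `feature` onto the value the user picked."""
    beliefs[feature] = {
        value: (1.0 if value == chosen_value else 0.0)
        for value in beliefs[feature]
    }


def compile_prompt(base, beliefs, threshold=0.9):
    """Append each feature value whose probability reaches `threshold`."""
    parts = [base]
    for feature, belief in beliefs.items():
        value, prob = max(belief.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            parts.append(f"{feature}: {value}")
    return ", ".join(parts)


# Hypothetical prior; in APE this role is played by a language model prior.
beliefs = {
    "style": {"watercolor": 0.5, "photo": 0.5},
    "lighting": {"warm": 0.8, "cool": 0.2},
}

first = next_query(beliefs)           # the most uncertain feature is asked first
update(beliefs, first, "watercolor")  # simulated user choice
prompt = compile_prompt("a cat on a windowsill", beliefs)
print(first)
print(prompt)
```

In this toy run the style feature is queried first because its belief is the most uncertain, and the lighting feature stays out of the prompt until a later query pushes its belief past the threshold. A noisy answer model would require computing the expected posterior entropy for each candidate query instead of using the prior entropy directly.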
Original Abstract
Aligning text-to-image generation with user intent remains challenging, as users provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.