Multimodal Learning Relevance: 9/10

Adaptive Prompt Elicitation for Text-to-Image Generation

Xinyi Wen, Lena Hegemann, Xiaofu Jin, Shuai Ma, Antti Oulasvirta
arXiv: 2602.04713v1 Published: 2026-02-04 Updated: 2026-02-04

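AI Summary

APE interactively helps users refine text-to-image prompts by asking visual queries, improving the alignment between generated images and user intent.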

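Main Contributions

  • Proposes Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to refine prompts
  • Formalizes interactive intent inference under an information-theoretic framework
  • Shows that APE achieves stronger alignment with improved efficiency on IDEA-Bench and DesignBench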

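Methodology

APE uses language model priors to represent latent intent as interpretable feature requirements, adaptively generates visual queries, and compiles the elicited requirements into effective prompts.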
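The abstract does not spell out the query-selection rule, so the following is only a minimal sketch of one common information-theoretic choice: pick the query with the highest expected information gain about the user's latent intent. The discrete set of candidate intents, the answer likelihood tables, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def posterior(prior, answer_likelihood):
    """Bayes update: answer_likelihood[i] = P(observed answer | intent i)."""
    joint = [p * l for p, l in zip(prior, answer_likelihood)]
    z = sum(joint)
    # If the answer has zero probability under every intent, keep the prior.
    return [j / z for j in joint] if z > 0 else list(prior)

def expected_information_gain(prior, likelihood):
    """Mutual information between intent and answer for one query.

    likelihood[i][a] = P(answer a | intent i); hypothetical table that in
    practice might come from a language model prior.
    """
    gain = entropy(prior)
    for a in range(len(likelihood[0])):
        column = [row[a] for row in likelihood]
        p_answer = sum(p * l for p, l in zip(prior, column))
        if p_answer > 0:
            gain -= p_answer * entropy(posterior(prior, column))
    return gain

def select_query(prior, queries):
    """Return the name of the query with the largest expected gain."""
    return max(queries, key=lambda q: expected_information_gain(prior, queries[q]))

# Four equally likely candidate intents (assumed for illustration).
PRIOR = [0.25, 0.25, 0.25, 0.25]
QUERIES = {
    # Answer cleanly separates intents {0, 1} from {2, 3}.
    "style": [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    # Answer is equally likely under every intent, so it reveals nothing.
    "color": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
}

if __name__ == "__main__":
    best = select_query(PRIOR, QUERIES)
    print(best, expected_information_gain(PRIOR, QUERIES[best]))
```

In this toy setup the "style" query halves the set of plausible intents while the "color" query leaves the belief unchanged, so a greedy selector prefers "style". A real system would repeat the update after each user answer and stop once the remaining uncertainty is low enough to compile a prompt.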

Original Abstract

Aligning text-to-image generation with user intent remains challenging, for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.

Tags

Text-to-image generation, Prompt engineering, Interactive systems, Information theory

arXiv Categories

cs.HC cs.AI cs.CV