Multimodal Learning Relevance: 9/10

PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

Yuheng Feng, Wen Zhang, Haodong Duan, Xingxing Zou

arXiv: 2603.24078v1 Published: 2026-03-25 Updated: 2026-03-25

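AI Summary

PosterIQ is a design-driven benchmark for poster understanding and generation, covering the structure, typography, and semantic intent of posters.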

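Key Contributions

Builds PosterIQ, a benchmark dataset for poster understanding and generation
Defines tasks such as layout parsing and text-image correspondence
Evaluates the design-related shortcomings of existing MLLMs and diffusion models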

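Methodology

The authors construct a dataset of real, professional, and synthetic posters, define multiple design-related tasks on it, and use those tasks to evaluate the capabilities of existing models.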

Original Abstract

We present PosterIQ, a design-driven benchmark for poster understanding and generation, annotated across composition structure, typographic hierarchy, and semantic intent. It includes 7,765 image-annotation instances and 822 generation prompts spanning real, professional, and synthetic cases. To bridge visual design cognition and generative modeling, we define tasks for layout parsing, text-image correspondence, typography/readability and font perception, design quality assessment, and controllable, composition-aware generation with metaphor. We evaluate state-of-the-art MLLMs and diffusion-based generators, finding persistent gaps in visual hierarchy, typographic semantics, saliency control, and intention communication; commercial models lead on high-level reasoning but act as insensitive automatic raters, while generators render text well yet struggle with composition-aware synthesis. Extensive analyses show PosterIQ is both a quantitative benchmark and a diagnostic tool for design reasoning, offering reproducible, task-specific metrics. We aim to catalyze models' creativity and integrate human-centred design principles into generative vision-language systems.

arXiv Categories

cs.CV
