Multimodal Learning Relevance: 9/10

Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity

Zhengyao Fang, Zexi Jia, Yijia Zhong, Pengcheng Luo, Jinchao Zhang, Guangming Lu, Jun Yu, Wenjie Pei
arXiv: 2603.10990v1 Published: 2026-03-11 Updated: 2026-03-11

AI Summary

This paper addresses color fidelity in text-to-image generation, proposing a dataset, an evaluation metric, and a refinement method.

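Key Contributions

  • Introduces the Color Fidelity Dataset (CFD) for evaluating color fidelity
  • Introduces the Color Fidelity Metric (CFM) for objective evaluation of color fidelity
  • Introduces the training-free Color Fidelity Refinement (CFR) to enhance color authenticity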

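Methodology

Builds the CFD dataset, uses a multimodal encoder to learn the color fidelity metric CFM, and applies CFR to modulate the spatial-temporal guidance scale during generation.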
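
The guidance modulation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' CFR implementation: the function names `guidance_scale_map` and `apply_guidance`, the linear temporal schedule, the defaults `base_scale=7.5` and `min_scale=1.0`, and the rule that higher attention lowers guidance are all hypothetical. Only the final combination step is the standard classifier-free guidance formula.

```python
def guidance_scale_map(attention, step, num_steps, base_scale=7.5, min_scale=1.0):
    """Hypothetical per-position guidance scale for one denoising step.

    attention: 2D list of floats in [0, 1]. Assumption: higher values mark
    regions that a fidelity metric flags as over-vivid.
    """
    # Assumed temporal schedule: no attenuation at step 0, full attenuation
    # at the final step (linear in between).
    progress = step / max(num_steps - 1, 1)
    return [
        [base_scale - (base_scale - min_scale) * a * progress for a in row]
        for row in attention
    ]


def apply_guidance(eps_uncond, eps_cond, scales):
    """Classifier-free guidance with a per-position scale:
    eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [
        [u + s * (c - u) for u, c, s in zip(u_row, c_row, s_row)]
        for u_row, c_row, s_row in zip(eps_uncond, eps_cond, scales)
    ]


# Toy usage: one row, two positions; the second is flagged as over-vivid.
attention = [[0.0, 1.0]]
scales = guidance_scale_map(attention, step=9, num_steps=10)
guided = apply_guidance([[0.0, 0.0]], [[1.0, 1.0]], scales)
```

In this toy setup the unflagged position keeps the full base scale while the flagged position falls to the minimum scale by the last step. How the paper actually derives its spatial and temporal weights is not stated in the summary above.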

Original Abstract

Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that appear visually authentic to real-world photography remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics often favor visually vivid images with exaggerated saturation and contrast, which make generations often too vivid to be real even when prompted for realistic-style images. To address this issue, we present Color Fidelity Dataset (CFD) and Color Fidelity Metric (CFM) for objective evaluation of color fidelity in realistic-style generations. CFD contains over 1.3M real and synthetic images with ordered levels of color realism, while CFM employs a multimodal encoder to learn perceptual color fidelity. In addition, we propose a training-free Color Fidelity Refinement (CFR) that adaptively modulates spatial-temporal guidance scale in generation, thereby enhancing color authenticity. Together, CFD supports CFM for assessment, whose learned attention further guides CFR to refine T2I fidelity, forming a progressive framework for assessing and improving color fidelity in realistic-style T2I generation. The dataset and code are available at https://github.com/ZhengyaoFang/CFM.

Tags

text-to-image, color fidelity, image generation, evaluation metric, dataset

arXiv Categories

cs.CV