R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment
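AI Summary
To address CG image quality assessment, the paper proposes R4-CGQA, a retrieval-augmented VLM framework that improves the ability of VLMs to assess the quality of CG images.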
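Main Contributions
- Constructs a dataset of CG images paired with quality descriptions
- Proposes R4-CGQA, a retrieval-augmented two-stream framework
- Shows that the method effectively improves VLM performance on CG quality assessment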
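Methodology
The authors construct a dataset of 3,500 CG images annotated with quality descriptions. They then apply retrieval-augmented generation through a two-stream retrieval framework, which supplies the VLM with descriptions of visually similar images to improve its understanding of the query image.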
Original Abstract
Immersive computer graphics (CG) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: first, existing CG datasets lack systematic descriptions of rendering quality; and second, existing CG quality assessment methods cannot provide reasonable text-based explanations. To address these issues, we first identify six key perceptual dimensions of CG quality from the user perspective and construct a dataset of 3,500 CG images with corresponding quality descriptions. Each description covers CG style, content, and perceived quality along the selected dimensions. Furthermore, we use a subset of the dataset to build several question-answer benchmarks based on the descriptions in order to evaluate the responses of existing Vision Language Models (VLMs). We find that current VLMs are not sufficiently accurate in judging fine-grained CG quality, but that descriptions of visually similar images can significantly improve a VLM's understanding of a given CG image. Motivated by this observation, we adopt retrieval-augmented generation and propose a two-stream retrieval framework that effectively enhances the CG quality assessment capabilities of VLMs. Experiments on several representative VLMs demonstrate that our method substantially improves their performance on CG quality assessment.