Multimodal Learning Relevance: 9/10

ArtifactLens: Hundreds of Labels Are Enough for Artifact Detection with VLMs

James Burgess, Rameen Abdal, Dan Stoddart, Sergey Tulyakov, Serena Yeung-Levy, Kuan-Chieh Jackson Wang
arXiv: 2602.09475v1 Published: 2026-02-10 Updated: 2026-02-10

AI Summary

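ArtifactLens uses a small amount of labeled data to unlock the artifact-detection ability already present in pretrained VLMs, achieving state-of-the-art results on five human artifact benchmarks for AI-generated images.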

Key Contributions

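  • Proposes ArtifactLens, a system that detects artifacts using only a few hundred labeled examples per artifact category.
  • Achieves state-of-the-art results on five human artifact benchmarks, the first evaluation across multiple datasets.
  • Improves generalization through a multi-component architecture, in-context learning, and text instruction optimization, extending to other artifact types and to AIGC detection.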
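Text instruction optimization can be illustrated in its simplest form: score each candidate instruction on the labeled examples and keep the best one. The following is a minimal sketch under stated assumptions, not ArtifactLens's actual procedure (the abstract mentions novel improvements that are not reproduced here). The names `select_instruction`, `balanced_accuracy`, and the `detect` callable are hypothetical, and `detect` stands in for a real VLM query. Balanced accuracy is used here as an assumed metric because artifact labels are often imbalanced.

```python
from typing import Callable, Sequence

# Hypothetical types: an example is (image_id, has_artifact), and the detector
# is any callable (instruction, image_id) -> bool standing in for a VLM query.
Example = tuple[str, bool]
Detector = Callable[[str, str], bool]


def balanced_accuracy(instruction: str, examples: Sequence[Example], detect: Detector) -> float:
    """Mean of the true-positive rate and true-negative rate on labeled examples."""
    tp = fn = tn = fp = 0
    for image_id, label in examples:
        pred = detect(instruction, image_id)
        if label:
            tp += pred
            fn += not pred
        else:
            tn += not pred
            fp += pred
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return (tpr + tnr) / 2


def select_instruction(
    candidates: Sequence[str], examples: Sequence[Example], detect: Detector
) -> tuple[str, float]:
    """Best-of-N search: return the candidate instruction with the highest score."""
    scored = [(c, balanced_accuracy(c, examples, detect)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy stand-in for a VLM (purely illustrative): image ids containing "bad"
    # have artifacts, and only an instruction mentioning hands "notices" them.
    def toy_detect(instruction: str, image_id: str) -> bool:
        return "hands" in instruction and "bad" in image_id

    labeled = [("bad_1", True), ("bad_2", True), ("ok_1", False), ("ok_2", False)]
    prompts = ["Is this image realistic?", "Are the hands in this image malformed?"]
    best, score = select_instruction(prompts, labeled, toy_detect)
    print(best, score)  # Are the hands in this image malformed? 1.0
```

In practice the candidate pool would come from a generator (for example, another language model rewriting the instruction), and the few hundred labeled examples per category would serve as the scoring set.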

Methodology

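Builds a multi-component artifact-detection system on top of pretrained VLMs, using in-context learning and text instruction optimization as scaffolding.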
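In-context learning here means showing the VLM labeled demonstrations before the query image. The sketch below assembles such a prompt from a small labeled pool. It is an illustration under assumptions, not the paper's method: the summary does not say how demonstrations are chosen, so random class-balanced sampling is used as a placeholder. The `Labeled` class, `build_icl_prompt`, and the `("text" | "image", value)` part format are hypothetical and do not correspond to any particular VLM API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    image_path: str     # hypothetical path to a labeled image
    has_artifact: bool  # human label for one artifact category


def build_icl_prompt(
    pool: list[Labeled],
    query_path: str,
    instruction: str,
    k_per_class: int = 2,
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Return an interleaved list of ("text" | "image", value) parts.

    Samples k positive and k negative demonstrations from the labeled pool,
    shuffles them so labels do not appear in a fixed order, and ends with the
    unlabeled query image followed by an open "Answer:" slot.
    """
    rng = random.Random(seed)
    pos = [ex for ex in pool if ex.has_artifact]
    neg = [ex for ex in pool if not ex.has_artifact]
    if len(pos) < k_per_class or len(neg) < k_per_class:
        raise ValueError("not enough labeled examples for a balanced prompt")
    demos = rng.sample(pos, k_per_class) + rng.sample(neg, k_per_class)
    rng.shuffle(demos)

    parts: list[tuple[str, str]] = [("text", instruction)]
    for ex in demos:
        parts.append(("image", ex.image_path))
        parts.append(("text", "Answer: " + ("yes" if ex.has_artifact else "no")))
    parts.append(("image", query_path))
    parts.append(("text", "Answer:"))
    return parts


if __name__ == "__main__":
    pool = [Labeled(f"pos_{i}.png", True) for i in range(5)]
    pool += [Labeled(f"neg_{i}.png", False) for i in range(5)]
    for part in build_icl_prompt(pool, "query.png", "Does this image contain a malformed hand?"):
        print(part)
```

To query a real model, the parts would need converting into that model's own multimodal message format. Fixing the seed keeps the prompt reproducible across evaluation runs.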

Original Abstract

Modern image generators produce strikingly realistic images, where only artifacts like distorted hands or warped objects reveal their synthetic origin. Detecting these artifacts is essential: without detection, we cannot benchmark generators or train reward models to improve them. Current detectors fine-tune VLMs on tens of thousands of labeled images, but this is expensive to repeat whenever generators evolve or new artifact types emerge. We show that pretrained VLMs already encode the knowledge needed to detect artifacts - with the right scaffolding, this capability can be unlocked using only a few hundred labeled examples per artifact category. Our system, ArtifactLens, achieves state-of-the-art on five human artifact benchmarks (the first evaluation across multiple datasets) while requiring orders of magnitude less labeled data. The scaffolding consists of a multi-component architecture with in-context learning and text instruction optimization, with novel improvements to each. Our methods generalize to other artifact types - object morphology, animal anatomy, and entity interactions - and to the distinct task of AIGC detection.

Tags

Artifact Detection, VLM, Few-Shot Learning, AIGC

arXiv Categories

cs.CV cs.AI cs.LG