Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection
AI Summary
Leveraging the image-refinement capabilities of generative AI can effectively bypass existing deepfake detection methods.
Key Contributions
- Demonstrates that the semantic-preserving image refinement capabilities of generative AI can fool deepfake detectors.
- Reveals that commercial AI systems pose a greater security risk than open-source models, since they are easier to use and more effective.
- Identifies a mismatch between the threat models assumed by existing detection frameworks and the real-world capabilities of generative AI.
Methodology
Benign prompts are designed to drive image refinement through commercial generative AI systems; the refined images are then evaluated for deepfake-detection evasion and identity preservation, as in the sketch below.
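A minimal sketch of this evaluation loop, assuming three hypothetical interfaces (`refine_image`, `deepfake_score`, `face_similarity`) and illustrative thresholds; the paper does not publish code, and the prompt wording below is only a guess at the kind of benign request it describes.

```python
from dataclasses import dataclass


def refine_image(image_path: str, prompt: str) -> str:
    """Submit a benign refinement prompt to a commercial chatbot image
    interface and return the path of the refined image. Placeholder."""
    raise NotImplementedError("wire this to a commercial generative AI service")


def deepfake_score(image_path: str) -> float:
    """Return a detector's fake probability in [0, 1]. Placeholder for any
    state-of-the-art deepfake detector."""
    raise NotImplementedError("wire this to a deepfake detector")


def face_similarity(path_a: str, path_b: str) -> float:
    """Return identity similarity in [0, 1]. Placeholder for a commercial
    face recognition API."""
    raise NotImplementedError("wire this to a face recognition API")


# A benign, policy-compliant prompt: it asks only for quality improvements
# and never mentions detectors or evasion.
BENIGN_PROMPT = (
    "Improve the lighting, skin texture, and color balance of this portrait "
    "while keeping the person's identity exactly the same."
)


@dataclass
class RefinementResult:
    evaded: bool          # detector now classifies the refined image as real
    identity_kept: bool   # face recognition still matches the original face


def evaluate(fake_image: str,
             detect_threshold: float = 0.5,   # assumed cutoff, not from the paper
             match_threshold: float = 0.8) -> RefinementResult:
    """Refine one fake image, then check detection evasion and identity."""
    refined = refine_image(fake_image, BENIGN_PROMPT)
    return RefinementResult(
        evaded=deepfake_score(refined) < detect_threshold,
        identity_kept=face_similarity(fake_image, refined) >= match_threshold,
    )
```

In the paper's setting, a successful attack means both fields come back true: the detector is fooled while commercial face recognition still verifies the same identity.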
Original Abstract
Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.
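The mechanism the abstract highlights is that the model externalizes its own authenticity criteria through unrestricted reasoning, and those criteria can then be reused directly as refinement objectives. A hedged two-turn sketch, assuming a hypothetical `chat` helper standing in for any commercial chatbot API; both prompts are illustrative rather than the paper's actual wording.

```python
def chat(prompt: str, image_path: str | None = None) -> str:
    """Hypothetical stand-in for a commercial chatbot API call."""
    raise NotImplementedError("connect to a commercial chatbot service")


def externalize_and_reuse(image_path: str) -> str:
    """Elicit the model's own authenticity criteria, then feed them back
    as the refinement objective for the given image."""
    # Turn 1: a benign question that makes the model externalize the
    # explicit authenticity criteria it reasons with.
    criteria = chat(
        "What visual cues indicate that a portrait photo is an authentic "
        "camera capture rather than a synthetic image?"
    )
    # Turn 2: reuse the model's own answer as the refinement objective,
    # never mentioning detectors or evasion.
    return chat(
        "Refine this portrait so it better exhibits these qualities, "
        f"keeping the person's identity unchanged:\n{criteria}",
        image_path=image_path,
    )
```

The point of the two-turn structure is that neither request violates usage policy: the criteria emerge from ordinary, unrestricted reasoning, and the refinement request is phrased purely as a quality improvement.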