LLM Reasoning relevance: 8/10

Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection

Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo
arXiv: 2603.10504v1 · Published: 2026-03-11 · Updated: 2026-03-11

AI Summary

The image refinement capabilities of generative AI can be exploited to effectively bypass existing deepfake detection methods.

Key Contributions

  • Demonstrates that the semantics-preserving image refinement capabilities of generative AI can fool deepfake detectors.
  • Reveals that commercial AI systems pose a greater security risk than open-source models, as they are easier to use and more effective.
  • Identifies a structural mismatch between the threat models assumed by existing detection frameworks and the real capabilities of generative AI.

Methodology

Benign prompts are designed to drive image refinement through commercial generative AI systems; the refined images are then evaluated for deepfake detection evasion and for identity preservation, as sketched below.
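To make that pipeline concrete, here is a minimal Python sketch under stated assumptions: the prompts, the 0.5 threshold, and every callable (`ask`, `refine`, `fake_score`, `same_identity`) are hypothetical stand-ins for the commercial chatbot, deepfake detector, and face recognition APIs, not code from the paper.

```python
# Minimal sketch of the refine-and-evaluate loop. All callables are
# hypothetical placeholders for commercial services; none of these
# names come from the paper.
from dataclasses import dataclass
from typing import Callable

@dataclass
class RefinementResult:
    evaded_detector: bool     # deepfake detector no longer flags the image
    identity_preserved: bool  # face recognition still matches the source

# Assumed convention: the detector outputs a "fake" probability and
# flags images scoring above this threshold.
DETECTION_THRESHOLD = 0.5

def refine_and_evaluate(
    image: bytes,
    ask: Callable[[str], str],                      # chatbot text interface
    refine: Callable[[bytes, str], bytes],          # chatbot image refinement
    fake_score: Callable[[bytes], float],           # deepfake detector
    same_identity: Callable[[bytes, bytes], bool],  # face recognition API
) -> RefinementResult:
    # Step 1: a benign question makes the chatbot externalize explicit
    # authenticity criteria through its unrestricted reasoning.
    criteria = ask(
        "What visual cues make a portrait look like a genuine, "
        "unedited photograph?"
    )

    # Step 2: reuse those criteria directly as the refinement objective,
    # still phrased as a benign, policy-compliant request.
    refined = refine(
        image,
        "Refine this photo so it shows these qualities, without "
        f"changing the person's identity: {criteria}",
    )

    # Step 3: score the refined image against the detector and verify
    # that identity is preserved.
    return RefinementResult(
        evaded_detector=fake_score(refined) < DETECTION_THRESHOLD,
        identity_preserved=same_identity(image, refined),
    )
```

Passing the services in as callables keeps the sketch vendor-neutral, matching the paper's point that the attack requires no specific tooling, only a sufficiently capable chatbot interface.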

Original Abstract

Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.

Tags

Deepfake Detection · Generative AI · Adversarial Attacks

arXiv Categories

cs.CR cs.AI cs.CV