Fake-HR1: Rethinking reasoning of vision language model for synthetic image detection
AI Summary
Fake-HR1 performs reasoning adaptively, improving both the efficiency and the performance of synthetic image detection.
Main Contributions
- Proposes Fake-HR1, a hybrid-reasoning model
- Designs a two-stage training framework consisting of HFT and HGRPO
- Achieves adaptive reasoning and improves detection efficiency
Methodology
The model is first cold-start initialized with Hybrid Fine-Tuning (HFT), then trained with online reinforcement learning via Hybrid-Reasoning Grouped Policy Optimization (HGRPO), through which it implicitly learns when to select an appropriate reasoning mode.
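The summary does not spell out HGRPO's objective, but GRPO-family methods score a group of sampled responses per query and normalize each reward against the group. The sketch below illustrates that group-relative advantage computation with a hypothetical mode-aware reward (a small penalty when the long reasoning mode is used), which is one plausible way a policy could be nudged to skip chain-of-thought on easy queries; the reward shaping and the `length_penalty` value are illustrative assumptions, not the paper's actual formulation.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each reward against its group's
    mean and standard deviation (population std; guard against zero)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

def mode_aware_reward(correct, used_reasoning, length_penalty=0.2):
    """Hypothetical reward shaping (assumption, not from the paper):
    correctness reward minus a fixed cost for invoking the long
    reasoning mode, so correct *direct* answers score highest."""
    reward = 1.0 if correct else 0.0
    if used_reasoning:
        reward -= length_penalty
    return reward

# One query, a group of 4 sampled responses: (correct?, used reasoning?)
group = [(True, True), (True, False), (False, False), (True, True)]
rewards = [mode_aware_reward(c, m) for c, m in group]
advantages = group_relative_advantages(rewards)
# The correct direct answer gets the largest advantage; the wrong
# answer gets a negative one, pushing the policy toward answering
# correctly with as little reasoning cost as possible.
```

Under this toy shaping, a correct answer without reasoning dominates the group, so over many updates the policy would favor the direct mode on queries it can already solve, matching the adaptive behavior the abstract describes.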
Original Abstract
Recent studies have demonstrated that incorporating Chain-of-Thought (CoT) reasoning into the detection process can enhance a model's ability to detect synthetic images. However, excessively lengthy reasoning incurs substantial resource overhead, including token consumption and latency, which is particularly redundant when handling obviously generated forgeries. To address this issue, we propose Fake-HR1, a large-scale hybrid-reasoning model that, to the best of our knowledge, is the first to adaptively determine whether reasoning is necessary based on the characteristics of the generative detection task. To achieve this, we design a two-stage training framework: we first perform Hybrid Fine-Tuning (HFT) for cold-start initialization, followed by online reinforcement learning with Hybrid-Reasoning Grouped Policy Optimization (HGRPO) to implicitly learn when to select an appropriate reasoning mode. Experimental results show that Fake-HR1 adaptively performs reasoning across different types of queries, surpassing existing LLMs in both reasoning ability and generative detection performance, while significantly improving response efficiency.