LLM Reasoning Relevance: 9/10

Self-Verification Dilemma: Experience-Driven Suppression of Overused Checking in LLM Reasoning

Quanyu Long, Kai Jie Jiang, Jianda Chen, Xu Guo, Leilei Gan, Wenya Wang
arXiv: 2602.03485v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

The paper finds that LLM reasoning exhibits excessive self-verification and proposes an experience-driven framework that suppresses unnecessary self-verification, reducing token usage while maintaining or even improving accuracy.

Key Contributions

  • Identifies the problem of excessive self-verification in LLM reasoning
  • Proposes an experience-driven framework for suppressing self-verification
  • Shows experimentally that the method reduces token usage while maintaining or improving accuracy

Methodology

The framework detects the LLM's self-verification behavior, retrieves from a pool of historical experience to judge whether verification is needed, and suppresses the recheck when experience indicates it is unnecessary.
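The detect → retrieve → suppress loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trigger phrases, the `ExperiencePool` class, the bag-of-words cosine retrieval, and the 0.3 suppression threshold are all hypothetical stand-ins for the paper's recheck detector, offline experience pool, and efficient retrieval.

```python
# Hypothetical sketch of experience-driven recheck suppression.
# All names and thresholds here are illustrative assumptions.
from collections import Counter
from math import sqrt

# Surface cues that a reasoning step is activating self-verification.
RECHECK_MARKERS = ("let me verify", "let me double-check", "wait, let me check")

def detect_recheck(step: str) -> bool:
    """Detect activation of recheck behavior in a reasoning step."""
    return any(m in step.lower() for m in RECHECK_MARKERS)

def _vec(text: str) -> Counter:
    """Toy bag-of-words embedding (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ExperiencePool:
    """Offline pool of (reasoning context, recheck-was-corrective) records."""
    def __init__(self, records):
        # records: iterable of (context_text, corrective) with corrective in {0, 1}
        self.records = [(_vec(ctx), corrective) for ctx, corrective in records]

    def corrective_rate(self, context: str, k: int = 3) -> float:
        """Among the k most similar past contexts, the fraction where
        the recheck actually changed the reasoning outcome."""
        q = _vec(context)
        top = sorted(self.records, key=lambda r: _cosine(q, r[0]), reverse=True)[:k]
        return sum(c for _, c in top) / len(top) if top else 1.0

def should_suppress(pool, context, step, threshold=0.3):
    """Emit a suppression signal when a recheck is detected and history
    says rechecks in similar contexts were rarely corrective."""
    return detect_recheck(step) and pool.corrective_rate(context) < threshold
```

In a generation loop, `should_suppress` returning `True` would trigger injecting a continuation cue (e.g. "proceed") instead of letting the model spend tokens re-confirming an intermediate result.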

Original Abstract

Large Reasoning Models (LRMs) achieve strong performance by generating long reasoning traces with reflection. Through a large-scale empirical analysis, we find that a substantial fraction of reflective steps consist of self-verification (recheck) steps that repeatedly confirm intermediate results. These rechecks occur frequently across models and benchmarks, yet the vast majority are confirmatory rather than corrective, rarely identifying errors or altering reasoning outcomes. This reveals a mismatch between how often self-verification is activated and how often it is actually useful. Motivated by this, we propose a novel, experience-driven test-time framework that reduces overused verification. Our method detects the activation of recheck behavior, consults an offline experience pool of past verification outcomes, and estimates via efficient retrieval whether a recheck is likely unnecessary. When historical experience suggests a recheck is unnecessary, a suppression signal redirects the model to proceed. Across multiple models and benchmarks, our approach reduces token usage by up to 20.3% while maintaining accuracy, and on some datasets even yields accuracy improvements.

Tags

LLM Reasoning Self-Verification Efficiency Experience-Driven

arXiv Categories

cs.CL cs.AI cs.LG