LLM Reasoning relevance: 9/10

LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations

William Lugoloobi, Thomas Foster, William Bankes, Chris Russell
arXiv: 2602.09924v1 Published: 2026-02-10 Updated: 2026-02-10

AI Summary

The paper studies whether an LLM's likelihood of success can be predicted from its internal representations before generation begins, and uses this signal to improve inference efficiency.

Key Contributions

  • Proposes a method for predicting success rates from an LLM's pre-generation activations
  • Shows that LLMs encode a model-specific notion of difficulty that is distinct from human difficulty
  • Demonstrates a probe-based routing method over a pool of models that improves performance while reducing inference cost
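The routing idea in the last contribution can be sketched as a simple rule: given probe-predicted success probabilities for each model in a pool, send the query to the cheapest model whose prediction clears a threshold. This is an illustrative sketch, not the authors' exact routing policy; the model names, costs, and threshold below are invented for the example.

```python
def route(probe_scores, costs, threshold=0.8):
    """Pick a model for one query.

    probe_scores: dict mapping model name -> probe-predicted success probability
    costs: dict mapping model name -> relative inference cost
    """
    # Models whose predicted success clears the threshold.
    eligible = [m for m, p in probe_scores.items() if p >= threshold]
    if eligible:
        # Among acceptable models, prefer the cheapest.
        return min(eligible, key=costs.get)
    # No model is predicted to succeed: fall back to the strongest prediction.
    return max(probe_scores, key=probe_scores.get)

choice = route({"small": 0.90, "large": 0.97}, {"small": 1.0, "large": 10.0})
print(choice)  # → small
```

Under this rule, easy queries (high predicted success for the small model) never touch the expensive model, which is how routing can cut aggregate inference cost while keeping accuracy.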

Methodology

Linear probes are trained to predict policy-specific success rates on math and coding tasks, with validation and analysis on the E2H-AMC dataset, which provides both human and model performance on identical problems.
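A linear probe of this kind amounts to logistic regression on hidden states: features are activations taken at the final prompt token (before any generation), labels are whether the policy solved the problem. The sketch below uses random stand-in data since the paper's activations and labels are not reproduced here; the dimensions and extraction point are assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for pre-generation hidden states, shape (n_problems, d_model).
activations = rng.normal(size=(500, 64))
# Stand-in binary labels: did the policy solve the problem? Here labels are
# synthetically tied to a few feature dimensions so the probe has signal.
labels = (activations[:, :8].sum(axis=1)
          + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)

# The probe is just a linear classifier over the activation vector.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba yields a per-problem success estimate before any tokens
# are generated, which is the signal used downstream for routing.
success_prob = probe.predict_proba(X_test)[:, 1]
print(f"held-out accuracy: {probe.score(X_test, y_test):.2f}")
```

The key property exploited by the paper is that this prediction is available before generation, so it can decide whether a problem merits extended reasoning at all.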

Original Abstract

Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether their own likelihood of success is recoverable from their internal representations before generation, and if this signal can guide more efficient inference. We train linear probes on pre-generation activations to predict policy-specific success on math and coding tasks, substantially outperforming surface features such as question length and TF-IDF. Using E2H-AMC, which provides both human and model performance on identical problems, we show that models encode a model-specific notion of difficulty that is distinct from human difficulty, and that this distinction increases with extended reasoning. Leveraging these probes, we demonstrate that routing queries across a pool of models can exceed the best-performing model whilst reducing inference cost by up to 70% on MATH, showing that internal representations enable practical efficiency gains even when they diverge from human intuitions about difficulty. Our code is available at: https://github.com/KabakaWilliam/llms_know_difficulty

Tags

LLM inference efficiency, linear probes, model routing

arXiv Categories

cs.CL cs.AI cs.LG