I Think. Therefore I am
AI Summary
Large language reasoning models have already made their decision before reasoning begins, and the reasoning process tends to rationalize that predetermined choice.
Main Contributions
- Reveals that decisions are encoded early, ahead of the reasoning process
- Uses activation steering to establish the causal influence of the decision on the reasoning process
- Analyzes how the model rationalizes decision flips
Methodology
A linear probe is used to decode decision information from pre-generation activations, and activation steering is used to intervene along the decision direction, observing how the model's behavior changes.
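As a rough sketch of these two steps (the post includes no code; the file names, target layer, and steering strength below are hypothetical), one could fit a logistic-regression probe on cached pre-generation hidden states and then reuse its weight vector as a steering direction added to the residual stream via a PyTorch forward hook:

```python
# Minimal sketch, assuming pre-extracted activations and a HF-style model.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

# --- 1. Linear probe on pre-generation activations ------------------------
# X: (n_examples, d_model) hidden states captured before the first reasoning
#    token; y: 1 if the model later makes a tool call (hypothetical files).
X = np.load("pregen_activations.npy")
y = np.load("tool_call_labels.npy")

probe = LogisticRegression(max_iter=1000)
probe.fit(X, y)
print("probe train accuracy:", probe.score(X, y))

# The probe's weight vector serves as a candidate "decision direction".
decision_dir = torch.tensor(probe.coef_[0], dtype=torch.float32)
decision_dir = decision_dir / decision_dir.norm()

# --- 2. Activation steering via a forward hook -----------------------------
def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Add alpha * direction to a layer's output (the residual stream)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage (model and layer index are placeholders):
# layer = model.model.layers[TARGET_LAYER]
# handle = layer.register_forward_hook(make_steering_hook(decision_dir, alpha=-8.0))
# ... generate and check whether the tool-calling decision flips ...
# handle.remove()
```

Flipping the sign of `alpha` pushes the model toward or away from the decoded decision; the behavioral analysis then examines whether the generated chain-of-thought rationalizes the flipped choice rather than resisting it.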
Original Abstract
We consider the question: when a large language reasoning model makes a choice, did it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe successfully decodes tool-calling decisions from pre-generation activations with very high confidence, and in some cases, even before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction leads to inflated deliberation, and flips behavior in many examples (between 7% and 79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought process often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.