LLM Reasoning · Relevance: 9/10

Reasoning Shift: How Context Silently Shortens LLM Reasoning

Gleb Rodionov
arXiv: 2604.01161v1 · Published: 2026-04-01 · Updated: 2026-04-01

AI Summary

The paper shows that, when the same problem is embedded in different context conditions, LLM reasoning traces become significantly shorter (by up to 50%), with a corresponding reduction in self-verification behavior.

Key Contributions

  • Identifies the phenomenon that LLM reasoning length is sensitive to surrounding context ("Reasoning Shift")
  • Analyzes the link between shortened reasoning traces and reduced self-verification behavior
  • Evaluates the impact of different context conditions on LLM reasoning, highlighting the importance of context management

Methodology

Multiple reasoning models are systematically evaluated on the same problems under three context conditions — lengthy irrelevant context, multi-turn conversation, and presentation as a subtask of a larger task — and their reasoning traces are compared against the isolated-problem baseline.
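The evaluation setup described above can be sketched as follows. This is a minimal illustration, not the authors' actual harness: the three wrapper functions and the `compression` metric are hypothetical names, and the model call is replaced by stand-in trace strings so the length comparison itself is concrete.

```python
# Hedged sketch: wrap the same problem in the paper's three context
# conditions, then compare reasoning-trace lengths to the baseline.
# All function names here are illustrative assumptions.

def with_irrelevant_context(problem: str, filler: str) -> str:
    """Condition 1: prepend lengthy, unrelated text to the problem."""
    return f"{filler}\n\nNow solve the following problem:\n{problem}"

def as_multi_turn(problem: str, prior_tasks: list[str]) -> list[dict]:
    """Condition 2: the problem arrives after independent earlier turns."""
    turns = []
    for task in prior_tasks:
        turns.append({"role": "user", "content": task})
        turns.append({"role": "assistant", "content": "(answer)"})
    turns.append({"role": "user", "content": problem})
    return turns

def as_subtask(problem: str, outer_task: str) -> str:
    """Condition 3: embed the problem as one step of a larger task."""
    return f"{outer_task}\nAs part of this, first solve:\n{problem}"

def compression(baseline_trace: str, context_trace: str) -> float:
    """Fraction by which the trace shrank (0.5 = 50% shorter),
    using whitespace tokens as a crude length proxy."""
    base_len = len(baseline_trace.split())
    return 1.0 - len(context_trace.split()) / base_len

# Stand-in traces: a 200-token baseline vs. a 100-token trace produced
# under added context corresponds to the paper's reported ~50% shortening.
baseline = "step " * 200
shifted = "step " * 100
print(round(compression(baseline, shifted), 2))  # 0.5
```

In a real run, each wrapped prompt would be sent to the model under test and `compression` computed over the returned reasoning traces; a finer-grained analysis would additionally count self-verification markers (e.g. "double-check") in each trace.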

Original Abstract

Large language models (LLMs) exhibiting test-time scaling behavior, such as extended reasoning traces and self-verification, have demonstrated remarkable performance on complex, long-term reasoning tasks. However, the robustness of these reasoning behaviors remains underexplored. To investigate this, we conduct a systematic evaluation of multiple reasoning models across three scenarios: (1) problems augmented with lengthy, irrelevant context; (2) multi-turn conversational settings with independent tasks; and (3) problems presented as a subtask within a complex task. We observe an interesting phenomenon: reasoning models tend to produce much shorter reasoning traces (up to 50%) for the same problem under different context conditions compared to the traces produced when the problem is presented in isolation. A finer-grained analysis reveals that this compression is associated with a decrease in self-verification and uncertainty management behaviors, such as double-checking. While this behavioral shift does not compromise performance on straightforward problems, it might affect performance on more challenging tasks. We hope our findings draw additional attention to both the robustness of reasoning models and the problem of context management for LLMs and LLM-based agents.

Tags

LLM Reasoning · Context Management · Self-Verification

arXiv Category

cs.LG