Think Anywhere in Code Generation
AI Summary
Proposes Think-Anywhere, a new mechanism for on-demand reasoning during code generation that improves LLM performance and interpretability.
Key Contributions
- Proposed the Think-Anywhere reasoning mechanism, which allows an LLM to invoke reasoning at any point during code generation.
- Realized Think-Anywhere's adaptive reasoning ability through cold-start training followed by outcome-based reinforcement learning.
- Achieved state-of-the-art performance on multiple code generation benchmarks and demonstrated generalization across diverse LLMs.
Methodology
Cold-start training first teaches the model to imitate reasoning patterns; reinforcement learning then optimizes when and where reasoning is triggered during generation.
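The core inference-time behavior can be sketched as a decoding loop that splices a reasoning segment into the output whenever the model is sufficiently uncertain at the current position. This is a minimal illustrative sketch, not the paper's implementation: the callbacks `next_token_fn`, `uncertainty_fn`, and `think_fn`, the `<think>`/`</think>` delimiters, and the threshold are all assumptions standing in for a real LLM's decoding step, uncertainty estimate, and reasoning rollout.

```python
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def generate_with_think_anywhere(next_token_fn, uncertainty_fn, think_fn,
                                 max_tokens=50, threshold=1.0):
    """Toy sketch of on-demand reasoning during decoding.

    At each step, if the model's uncertainty at the current position
    exceeds a threshold, a delimited reasoning segment is inserted
    before the next code token is emitted. All three callbacks are
    hypothetical stand-ins for a real model.
    """
    out = []
    for _ in range(max_tokens):
        if uncertainty_fn(out) > threshold:
            # Invoke thinking here, at an arbitrary token position.
            out += [THINK_OPEN, *think_fn(out), THINK_CLOSE]
        tok = next_token_fn(out)
        if tok is None:  # end of generation
            break
        out.append(tok)
    return out

# Toy usage: scripted "model" that grows uncertain right before the body.
tokens = iter("def add ( a , b ) : return a + b".split())
result = generate_with_think_anywhere(
    next_token_fn=lambda ctx: next(tokens, None),
    uncertainty_fn=lambda ctx: 2.0 if ctx and ctx[-1] == ":" else 0.0,
    think_fn=lambda ctx: ["sum", "the", "two", "arguments"],
)
# The reasoning segment lands between the signature and the body.
```

In the paper's setup, the decision of when and where to trigger such segments is not hand-coded as above but learned, via cold-start imitation and outcome-based RL rewards.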
Original Abstract
Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient because a problem's full complexity only reveals itself during implementation. Moreover, it cannot adaptively allocate reasoning effort throughout the code generation process, where difficulty varies significantly. In this paper, we propose Think-Anywhere, a novel reasoning mechanism that enables LLMs to invoke thinking on demand at any token position during code generation. We achieve Think-Anywhere by first teaching LLMs to imitate the reasoning patterns through cold-start training, then leveraging outcome-based RL rewards to drive the model's autonomous exploration of when and where to invoke reasoning. Extensive experiments on four mainstream code generation benchmarks (i.e., LeetCode, LiveCodeBench, HumanEval, and MBPP) show that Think-Anywhere achieves state-of-the-art performance over both existing reasoning methods and recent post-training approaches, while demonstrating consistent generalization across diverse LLMs. Our analysis further reveals that Think-Anywhere enables the model to adaptively invoke reasoning at high-entropy positions, providing enhanced interpretability.
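The "high-entropy positions" in the analysis refer to decoding steps where the next-token distribution is flat rather than peaked. A minimal sketch of that signal, assuming access to the model's next-token probabilities (the distributions below are made up for illustration):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution.

    High entropy marks positions where the model is uncertain about
    what comes next, which is where the analysis finds reasoning is
    adaptively invoked.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked (confident) distribution vs. a flat (uncertain) one.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
# Entropy is maximal for the uniform distribution: log(4) nats here.
```

A trained Think-Anywhere model would, per the analysis, tend to place its reasoning segments at steps resembling `uncertain` rather than `confident`.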