CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents
AI Summary
CodeScout achieves state-of-the-art performance on code search tasks using nothing more than a simple Unix terminal and reinforcement learning.
Key Contributions
- Demonstrates that simple tools, combined with an effective reinforcement learning recipe, can yield strong code search capability.
- Proposes an RL training recipe tailored to code search, covering environment re-purposing, reward design, and optimization.
- Releases the CodeScout model family along with the accompanying code and data to support further work by the community.
Methodology
An agent equipped with a standard Unix terminal is trained with reinforcement learning, using a reward function that guides it toward effective code search.
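To make the reward-guided setup concrete, here is a minimal sketch of one plausible localization reward: set-level F1 between the files the agent surfaces and the gold files touched by the ground-truth patch. This is an illustrative assumption, not the paper's actual reward design; the function name and inputs are hypothetical.

```python
def localization_reward(predicted_files, gold_files):
    """Hypothetical file-level reward: F1 between the agent's predicted
    file set and the gold file set from the ground-truth patch."""
    predicted, gold = set(predicted_files), set(gold_files)
    if not predicted or not gold:
        return 0.0  # nothing predicted, or no gold files: no reward
    tp = len(predicted & gold)          # correctly localized files
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A dense, graded signal like this rewards partial localization instead of only exact matches, which is generally easier for RL optimization than a sparse 0/1 reward.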
Original Abstract
A prerequisite for coding agents to perform tasks on large repositories is code localization - the identification of relevant files, classes, and functions to work on. While repository-level code localization has been performed using embedding-based retrieval approaches such as vector search, recent work has focused on developing agents to localize relevant code either as a standalone precursor to, or interleaved with, performing the actual work. Most prior methods for agentic code search equip the agent with complex, specialized tools, such as repository graphs derived from static analysis. In this paper, we demonstrate that, with an effective reinforcement learning recipe, a coding agent equipped with nothing more than a standard Unix terminal can be trained to achieve strong results. Our experiments on three benchmarks (SWE-Bench Verified, Pro, and Lite) reveal that our models consistently achieve superior or competitive performance relative to 2-18x larger base and post-trained LLMs, and sometimes approach the performance of closed models like Claude Sonnet, even when those models use specialized scaffolds. Our work particularly focuses on techniques for re-purposing existing coding agent environments for code search, reward design, and RL optimization. We release the resulting model family, CodeScout, along with all our code and data for the community to build upon.