Agent Tuning & Optimization Relevance: 9/10

SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson

arXiv: 2602.22124v1 Published: 2026-02-25 Updated: 2026-02-25


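AI Summary

The SWE-Protégé framework improves the performance of small language models (SLMs) on software engineering tasks by training them to collaborate selectively with an expert model.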

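Main Contributions

Proposes the SWE-Protégé framework
Applies supervised fine-tuning on expert-augmented trajectories
Uses reinforcement learning to suppress looping and inefficient collaboration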

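Methodology

A post-training framework that combines supervised fine-tuning with reinforcement learning so that the SLM learns to selectively seek guidance from an expert.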
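To make the reinforcement learning idea concrete, here is a minimal sketch of how a reward could penalize looping and expert calls. It is not the paper's reward function, which this summary does not specify. The `Step` type, the window-based loop check, and the penalty weights are all hypothetical choices made for illustration, and charging a flat cost per expert call is only a crude stand-in for "unproductive" collaboration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One agent step (hypothetical structure, not from the paper)."""
    action: str         # e.g. a shell command or tool call issued by the SLM
    asked_expert: bool  # True if this step was a request for expert guidance


def count_looping_steps(steps: List[Step], window: int = 3) -> int:
    """Count steps whose action repeats one of the previous `window` actions."""
    loops = 0
    for i, step in enumerate(steps):
        recent = [s.action for s in steps[max(0, i - window):i]]
        if step.action in recent:
            loops += 1
    return loops


def shaped_reward(
    steps: List[Step],
    resolved: bool,
    loop_penalty: float = 0.1,     # assumed weight, chosen arbitrarily
    expert_penalty: float = 0.05,  # assumed weight, chosen arbitrarily
) -> float:
    """Task reward minus penalties for repeated actions and expert calls."""
    reward = 1.0 if resolved else 0.0
    reward -= loop_penalty * count_looping_steps(steps)
    reward -= expert_penalty * sum(s.asked_expert for s in steps)
    return reward


if __name__ == "__main__":
    trajectory = [
        Step("ls", False),
        Step("ls", False),        # repeats the previous action
        Step("cat a.py", False),
        Step("ls", False),        # repeats an action within the window
    ]
    print(count_looping_steps(trajectory))
    print(shaped_reward(trajectory, resolved=True))
```

Under these assumptions, a trajectory that resolves the task but repeats actions or leans heavily on the expert scores lower than one that resolves it directly, which is the kind of pressure the methodology describes.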

Original Abstract

Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art, while using expert assistance sparsely (~4 calls per task and 11% of total tokens).

arXiv Categories

cs.SE cs.AI cs.CL cs.LG
