Agent Tuning & Optimization Relevance: 9/10

SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson
arXiv: 2602.22124v1 Published: 2026-02-25 Updated: 2026-02-25

AI Summary

The SWE-Protégé framework improves the performance of small language models on software engineering tasks by teaching them to collaborate selectively with an expert model.

Key Contributions

  • Proposes the SWE-Protégé framework
  • Supervised fine-tuning on expert-augmented trajectories
  • Reinforcement learning that discourages action looping and unproductive collaboration
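The anti-looping objective in the last bullet can be illustrated with a shaped reward. This is a minimal sketch, not the paper's actual reward: the function name, penalty weights, and the "consecutive duplicate action" loop heuristic are all assumptions for illustration.

```python
def shaped_reward(resolved: bool, actions: list[str], expert_calls: int,
                  loop_penalty: float = 0.1, call_penalty: float = 0.05) -> float:
    """Illustrative reward (not the paper's): +1 for resolving the task,
    minus a penalty per degenerate loop and per expert call.

    A "loop" is approximated here as a pair of consecutive identical
    actions; the penalty weights are arbitrary placeholders.
    """
    base = 1.0 if resolved else 0.0
    # Count consecutive duplicate actions as degenerate looping.
    loops = sum(1 for a, b in zip(actions, actions[1:]) if a == b)
    return base - loop_penalty * loops - call_penalty * expert_calls
```

Under such a shaping, an agent that resolves the task but repeats the same action and over-consults the expert earns less than one that resolves it cleanly, which is the qualitative behavior the contribution describes.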

Methodology

A post-training framework combining supervised fine-tuning with reinforcement learning, enabling the SLM to selectively seek guidance from an expert model while remaining the sole decision-maker.
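The selective-guidance control flow could be sketched as the loop below. This is a hypothetical reconstruction, not the paper's implementation: `slm_step`, `expert_hint`, the `Env` interface, and the stall heuristic (identical actions repeated `stall_window` times) are all assumed names for illustration.

```python
from collections import deque


def run_agent(slm_step, expert_hint, env, max_steps=50, stall_window=3):
    """Illustrative expert-protégé loop: the SLM (slm_step) always chooses
    the action; it only consults the expert (expert_hint) after observing
    `stall_window` identical actions in a row, i.e. a stalled state.
    Returns the number of expert calls made."""
    recent = deque(maxlen=stall_window)
    hint, expert_calls = None, 0
    for _ in range(max_steps):
        action = slm_step(env.observation(), hint)
        hint = None  # a hint is consumed by exactly one decision
        if len(recent) == stall_window and all(a == action for a in recent):
            # Stalled: request guidance, but the SLM still decides next step.
            hint = expert_hint(env.observation())
            expert_calls += 1
            recent.clear()
        recent.append(action)
        if env.step(action):  # env.step returns True when the task is done
            break
    return expert_calls
```

The key design point mirrored from the abstract is that the expert provides guidance only, never actions: the SLM remains the sole decision-maker and learns when a consultation is worth its cost.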

Original Abstract

Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art, while using expert assistance sparsely (~4 calls per task and 11% of total tokens).

Tags

Software Engineering  Small Language Models  Expert Systems  Reinforcement Learning

arXiv Categories

cs.SE cs.AI cs.CL cs.LG