Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs
AI Summary
This work proposes E-SPL, a method combining reinforcement learning with evolutionary system prompt learning to improve LLM performance and generalization on reasoning and agentic tasks.
Key Contributions
- Proposes the Evolutionary System Prompt Learning (E-SPL) method
- Combines reinforcement learning for model weight updates with an evolutionary algorithm for system prompt optimization
- Validates the effectiveness of E-SPL on reasoning and agentic tasks
Methodology
E-SPL selects multiple system prompts and runs rollouts with each in parallel. Based on rollout performance, it updates the model weights via RL and each system prompt's TrueSkill rating, then evolves the system prompt population through LLM-driven mutation and crossover.
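The loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `rollout_fn`, `rl_update_fn`, `mutate_fn`, and `crossover_fn` are hypothetical helpers, and a simple Elo-style pairwise update stands in for the TrueSkill ratings used in the paper.

```python
def espl_iteration(prompts, ratings, rollout_fn, rl_update_fn,
                   mutate_fn, crossover_fn, k=32, num_selected=2):
    """One E-SPL iteration (sketch; Elo stands in for TrueSkill)."""
    # 1. Select top-rated system prompts for parallel rollouts.
    selected = sorted(prompts, key=lambda p: ratings[p], reverse=True)[:num_selected]
    # 2. Run rollouts with each prompt; record mean reward per prompt.
    rewards = {p: rollout_fn(p) for p in selected}
    # 3. Apply RL updates to model weights conditioned on each prompt.
    for p in selected:
        rl_update_fn(p, rewards[p])
    # 4. Update ratings from pairwise relative performance within the batch.
    for i in range(len(selected)):
        for j in range(i + 1, len(selected)):
            a, b = selected[i], selected[j]
            score_a = 1.0 if rewards[a] > rewards[b] else (
                0.0 if rewards[a] < rewards[b] else 0.5)
            expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
            delta = k * (score_a - expected_a)
            ratings[a] += delta
            ratings[b] -= delta
    # 5. Evolve the population: LLM-driven crossover of the two best prompts,
    #    then mutation; initialize the child's rating near the batch mean.
    best, second = sorted(selected, key=lambda p: ratings[p], reverse=True)[:2]
    child = mutate_fn(crossover_fn(best, second))
    prompts.append(child)
    ratings[child] = sum(ratings[p] for p in selected) / len(selected)
    return prompts, ratings
```

In practice `mutate_fn` and `crossover_fn` would call an LLM to rewrite or combine prompts, and `rollout_fn` / `rl_update_fn` would wrap the RL training loop; here they are left abstract.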
Original Abstract
Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL selects multiple system prompts and runs rollouts with each in parallel. It applies RL updates to model weights conditioned on each system prompt, and evolutionary updates to the system prompt population via LLM-driven mutation and crossover. Each system prompt has a TrueSkill rating for evolutionary selection, updated from relative performance within each RL iteration batch. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, resulting in improved performance across reasoning and agentic tasks. For instance, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization setting, E-SPL improves RL success rate from 38.8% $\rightarrow$ 45.1% while also outperforming reflective prompt evolution (40.0%). Overall, our results show that coupling reinforcement learning with system prompt evolution yields consistent gains in sample efficiency and generalization. Code: https://github.com/LunjunZhang/E-SPL