Dynamic Dual-Granularity Skill Bank for Agentic RL
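AI Summary
D2Skill improves agentic RL with a dual-granularity skill bank whose skills are dynamically updated and fed back into policy optimization, raising task success rates by 10-20 points over skill-free baselines.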
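Main Contributions
- Proposes D2Skill, a dual-granularity skill bank comprising task skills and step skills.
- Uses training-time experience to derive hindsight utility signals from the performance gap between baseline and skill-injected rollouts, and applies them to both skill updating and policy optimization.
- Shows experimentally that D2Skill improves success rates on ALFWorld and WebShop.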
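The two granularities named above can be pictured as a small data structure. This is only a minimal sketch: the class names `Skill` and `SkillBank`, the `utility` and `uses` fields, and the example skill texts are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class Skill:
    """One reusable piece of experience (hypothetical schema)."""
    text: str                              # natural-language guidance shown to the agent
    granularity: Literal["task", "step"]   # high-level guidance vs. fine-grained support
    utility: float = 0.0                   # running estimate of how much the skill helps
    uses: int = 0                          # how many times the skill has been retrieved


@dataclass
class SkillBank:
    """Keeps task skills and step skills in separate pools."""
    task_skills: List[Skill] = field(default_factory=list)
    step_skills: List[Skill] = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        # Route the skill to the pool that matches its granularity.
        pool = self.task_skills if skill.granularity == "task" else self.step_skills
        pool.append(skill)

    def __len__(self) -> int:
        return len(self.task_skills) + len(self.step_skills)


# Illustrative entries only; these texts are invented for the example.
bank = SkillBank()
bank.add(Skill("Locate the target object before looking for its destination.", "task"))
bank.add(Skill("If an action fails twice in a row, re-read the current observation.", "step"))
```

Keeping the two pools separate makes it easy to retrieve task skills once per episode and step skills at individual decision points, which is one plausible reading of "high-level guidance" versus "fine-grained decision support".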
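Methodology
D2Skill builds a dual-granularity skill bank and uses the performance gap between baseline rollouts and skill-injected rollouts of the same policy to drive both skill updating and policy optimization.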
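The performance-gap idea can be illustrated with a few lines of arithmetic. The sketch below assumes that the gap is the difference in mean episode return between the two rollout groups and that a skill's stored utility is tracked with an exponential moving average. Both choices, along with the function names and the `rate` parameter, are assumptions made for illustration; the summary does not specify the exact formulas, nor how the signal enters the policy objective.

```python
from statistics import mean
from typing import Sequence


def hindsight_utility(baseline_returns: Sequence[float],
                      injected_returns: Sequence[float]) -> float:
    """Gap between skill-injected and baseline rollouts on the same task.

    Positive values suggest the injected skills helped; negative values
    suggest they hurt.
    """
    return mean(injected_returns) - mean(baseline_returns)


def update_utility(old_utility: float, gap: float, rate: float = 0.1) -> float:
    """Exponential moving average of the observed gaps (assumed update rule)."""
    return (1.0 - rate) * old_utility + rate * gap


# Four paired rollouts on one task, with success encoded as 1.0 and failure as 0.0.
baseline = [0.0, 1.0, 0.0, 0.0]   # mean 0.25
injected = [1.0, 1.0, 0.0, 1.0]   # mean 0.75
gap = hindsight_utility(baseline, injected)       # 0.75 - 0.25 = 0.5
new_utility = update_utility(0.0, gap, rate=0.1)  # 0.9 * 0.0 + 0.1 * 0.5 = 0.05
```

Because both rollout groups come from the same policy, the gap isolates the effect of the injected skills rather than differences between policies.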
Original Abstract
Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
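One way to read "utility-aware retrieval and pruning" is sketched below. It assumes retrieval ranks skills by a relevance score plus a weighted utility term, and pruning keeps only the highest-utility skills up to a fixed capacity. The Jaccard word-overlap relevance is a simple stand-in chosen to keep the example self-contained, and the names `relevance`, `retrieve`, `prune`, `weight`, and `capacity` are hypothetical; none of this is confirmed as the authors' design.

```python
from typing import Dict, List


def relevance(query: str, skill_text: str) -> float:
    """Jaccard overlap between word sets (stand-in for a real similarity model)."""
    q, s = set(query.lower().split()), set(skill_text.lower().split())
    union = q | s
    return len(q & s) / len(union) if union else 0.0


def retrieve(skills: List[Dict], query: str, k: int = 2, weight: float = 0.5) -> List[Dict]:
    """Return the top-k skills by relevance plus weighted utility."""
    ranked = sorted(
        skills,
        key=lambda sk: relevance(query, sk["text"]) + weight * sk["utility"],
        reverse=True,
    )
    return ranked[:k]


def prune(skills: List[Dict], capacity: int) -> List[Dict]:
    """Keep only the `capacity` skills with the highest utility."""
    return sorted(skills, key=lambda sk: sk["utility"], reverse=True)[:capacity]


# Invented example skills with made-up utilities.
skills = [
    {"text": "open the fridge before taking food", "utility": 0.4},
    {"text": "check the price filter on search results", "utility": 0.1},
    {"text": "repeat the last action", "utility": -0.3},
]
top = retrieve(skills, "take food from the fridge", k=1)
kept = prune(skills, capacity=2)
```

In this toy run the fridge skill ranks first for the query, and pruning to a capacity of two drops the skill with negative utility.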