Agent Tuning & Optimization Relevance: 9/10

Internalizing Agency from Reflective Experience

Rui Ge, Yichao Fu, Yuyang Qian, Junda Su, Yiming Zhao, Peng Zhao, Hao Zhang
arXiv: 2603.16843v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

The LEAFE framework learns feedback-grounded agency from reflective experience, improving LLMs' problem-solving capacity on complex interactive tasks.

Key Contributions

  • Proposes LEAFE, a framework that learns agency from reflective experience
  • Uses environment feedback for experience summarization and behavior correction
  • Validates the effectiveness of LEAFE on interactive coding and agentic tasks

Methodology

Through exploration, feedback summarization, backtracking, and action revision, LEAFE distills experiential knowledge into the model, improving its performance in future interactions; a rough sketch of this loop follows below.
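The sketch below illustrates the explore / summarize-feedback / backtrack / revise cycle described above. It is a minimal reconstruction based only on this summary: the agent/environment interface names (act, step, backtrack, summarize_feedback, remember) are hypothetical placeholders, not the authors' actual API.

```python
# Hedged sketch of a LEAFE-style explore/summarize/backtrack/revise loop.
# All interface names below are assumptions inferred from the summary,
# not code from the paper.

def collect_reflective_experience(agent, env, max_backtracks=3):
    """Run one episode, summarizing feedback and retrying failed decisions."""
    trajectory = []                          # list of (state, action, feedback)
    state = env.reset()
    while not env.done():
        action = agent.act(state, trajectory)
        feedback = env.step(action)          # rich feedback, not just a final reward
        trajectory.append((state, action, feedback))
        if feedback.is_error and max_backtracks > 0:
            # Summarize the feedback into an actionable experience note, rewind
            # to the failing decision point, and explore an alternative branch.
            note = agent.summarize_feedback(feedback)
            agent.remember(note)
            state, trajectory = env.backtrack(trajectory)
            max_backtracks -= 1
        else:
            state = feedback.next_state
    return trajectory
```

Per the summary, the experience-guided corrections gathered this way are then distilled into the policy with supervised fine-tuning, so that recovery behavior is internalized rather than re-derived at inference time.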

Original Abstract

Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily optimize final success signals, leaving rich environment feedback underutilized. Consequently, they often lead to distribution sharpening: the policy becomes better at reproducing a narrow set of already-successful behaviors, while failing to improve the feedback-grounded agency needed to expand problem-solving capacity (e.g., Pass@k) in long-horizon settings. To address this, we propose LEAFE (Learning Feedback-Grounded Agency from Reflective Experience), a framework that internalizes recovery agency from reflective experience. Specifically, during exploration, the agent summarizes environment feedback into actionable experience, backtracks to earlier decision points, and explores alternative branches with revised actions. We then distill these experience-guided corrections into the model through supervised fine-tuning, enabling the policy to recover more effectively in future interactions. Across a diverse set of interactive coding and agentic tasks under fixed interaction budgets, LEAFE consistently improves Pass@1 over the base model and achieves higher Pass@k than outcome-driven baselines (GRPO) and experience-based methods such as Early Experience, with gains of up to 14% on Pass@128.
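The abstract reports gains on Pass@1 and Pass@k (up to Pass@128) under fixed interaction budgets. For readers unfamiliar with the metric, the snippet below computes the standard unbiased Pass@k estimator, 1 − C(n−c, k)/C(n, k); the sample numbers are purely illustrative and are not results from the paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled attempts succeeds,
    given c successes observed among n independent attempts."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws without a success
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative only: 128 attempts per task, 5 of them successful.
print(pass_at_k(128, 5, 1))    # ~0.039 (Pass@1)
print(pass_at_k(128, 5, 128))  # 1.0 (Pass@128: any success counts)
```

Higher Pass@k at large k indicates broader problem-solving capacity rather than sharper reproduction of a few known-good behaviors, which is the distribution-sharpening issue the abstract attributes to outcome-driven baselines.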

Tags

AI Agents · Feedback Learning · Reinforcement Learning · Supervised Fine-tuning

arXiv Categories

cs.AI