CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs
AI Summary
CODE-SHARP proposes a framework that uses foundation models to automatically discover and evolve skills for solving complex tasks.
Key Contributions
- Proposes the CODE-SHARP framework for continuous, open-ended discovery and evolution of skills.
- Uses foundation models to expand and refine a hierarchical skill archive, organized as a directed graph of executable reward functions in code.
- Demonstrates that a goal-conditioned agent trained with the discovered skills can solve long-horizon goals in the Craftax environment.
Methodology
Foundation models automatically generate reward functions to build a skill archive; a goal-conditioned agent is then trained on these rewards to learn and compose the skills.
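To make the idea concrete, here is a minimal sketch of such a skill archive: a directed graph whose nodes are skills, each carrying an executable reward function, with edges pointing to the prerequisite skills a new skill builds on. All class and function names below are illustrative assumptions, not the authors' actual API; the example rewards are toy Craftax-style resource checks.

```python
# Hypothetical sketch of a hierarchical skill archive as a directed graph
# of executable reward programs (names are illustrative, not the paper's API).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Skill:
    name: str
    # Reward program: maps (state, next_state) -> scalar reward.
    reward_fn: Callable[[dict, dict], float]
    # Edges to previously discovered skills this one builds on.
    prerequisites: List[str] = field(default_factory=list)

class SkillArchive:
    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        # A new skill may only reference prerequisites already in the
        # archive, which keeps the graph acyclic as it grows.
        for p in skill.prerequisites:
            if p not in self.skills:
                raise ValueError(f"unknown prerequisite skill: {p}")
        self.skills[skill.name] = skill

    def reward(self, name: str, state: dict, next_state: dict) -> float:
        return self.skills[name].reward_fn(state, next_state)

# A foundation model might propose reward programs like these:
archive = SkillArchive()
archive.add(Skill(
    "collect_wood",
    reward_fn=lambda s, ns: float(ns.get("wood", 0) > s.get("wood", 0)),
))
archive.add(Skill(
    "make_plank",
    reward_fn=lambda s, ns: float(ns.get("planks", 0) > s.get("planks", 0)),
    prerequisites=["collect_wood"],
))

print(archive.reward("collect_wood", {"wood": 0}, {"wood": 1}))  # 1.0
```

In this sketch, a goal-conditioned agent would be trained only on rewards emitted by these programs, and a high-level planner could chain skills by following the prerequisite edges.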
Original Abstract
Developing agents capable of open-endedly discovering and learning novel skills is a grand challenge in Artificial Intelligence. While reinforcement learning offers a powerful framework for training agents to master complex skills, it typically relies on hand-designed reward functions. This is infeasible for open-ended skill discovery, where the set of meaningful skills is not known a priori. While recent methods have shown promising results towards automating reward function design, they remain limited to refining rewards for pre-defined tasks. To address this limitation, we introduce Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs (CODE-SHARP), a novel framework leveraging Foundation Models (FM) to open-endedly expand and refine a hierarchical skill archive, structured as a directed graph of executable reward functions in code. We show that a goal-conditioned agent trained exclusively on the rewards generated by the discovered SHARP skills learns to solve increasingly long-horizon goals in the Craftax environment. When composed by a high-level FM-based planner, the discovered skills enable a single goal-conditioned agent to solve complex, long-horizon tasks, outperforming both pretrained agents and task-specific expert policies by over $134$% on average. We will open-source our code and provide additional videos $\href{https://sites.google.com/view/code-sharp/homepage}{here}$.