Reward-Conditioned Reinforcement Learning
AI Summary
Proposes Reward-Conditioned Reinforcement Learning (RCRL), which learns multiple reward objectives through a conditioned policy, improving robustness and adaptability.
Main Contributions
- Proposes the RCRL framework, training a single agent to optimize a family of reward specifications
- Learns multiple reward objectives off-policy from shared replay data
- Improves performance under the nominal reward parameterization and adapts efficiently to new parameterizations
Methodology
RCRL conditions the agent on a reward parameterization and learns multiple reward objectives off-policy from shared replay data.
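The core idea above (one shared replay stream, many reward objectives learned off-policy via a conditioned value function) can be sketched in tabular form. This is a minimal illustration, not the paper's implementation: it assumes a linear reward family `r = w · features(s, a, s')` and a Q-table indexed by the reward-parameter index, both hypothetical choices for concreteness.

```python
import numpy as np

def rcrl_update(Q, transition, reward_params, alpha=0.1, gamma=0.9):
    """One off-policy RCRL-style update (illustrative sketch).

    A single replay transition is relabeled under every reward
    parameterization w, and the conditioned table Q[w_idx, s, a]
    is updated for each. The transition was collected under one
    nominal objective, but all objectives learn from it.
    """
    s, a, s_next, features = transition
    for w_idx, w in enumerate(reward_params):
        # Reward under this parameterization (assumed linear family).
        r = float(np.dot(w, features))
        # Standard off-policy TD target for the conditioned Q-function.
        td_target = r + gamma * Q[w_idx, s_next].max()
        Q[w_idx, s, a] += alpha * (td_target - Q[w_idx, s, a])
    return Q

# Toy usage: 3 states, 2 actions, 2 reward parameterizations.
reward_params = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Q = np.zeros((len(reward_params), 3, 2))
transition = (0, 1, 2, np.array([1.0, 0.0]))  # (s, a, s', reward features)
Q = rcrl_update(Q, transition, reward_params)
```

At act time, the same table (or network) is queried with the desired `w_idx`, so one policy represents reward-specific behaviors, and adapting to a new parameterization only requires conditioning on a new `w` rather than retraining from scratch.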
Original Abstract
RL agents are typically trained under a single, fixed reward function, which makes them brittle to reward misspecification and limits their ability to adapt to changing task preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications while collecting experience under only one nominal objective. RCRL conditions the agent on reward parameterizations and learns multiple reward objectives from shared replay data entirely off-policy, enabling a single policy to represent reward-specific behaviors. Across single-task, multi-task, and vision-based benchmarks, we show that RCRL not only improves performance under the nominal reward parameterization, but also enables efficient adaptation to new parameterizations. Our results demonstrate that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.