Agent Tuning & Optimization — Relevance: 7/10

Reward-Conditioned Reinforcement Learning

Michal Nauman, Marek Cygan, Pieter Abbeel
arXiv: 2603.05066v1 Published: 2026-03-05 Updated: 2026-03-05

AI Summary

Proposes Reward-Conditioned Reinforcement Learning (RCRL), which learns multiple reward objectives with a single conditioned policy, improving robustness and adaptability.

Main Contributions

  • Proposes the RCRL framework, which trains a single agent to optimize a family of reward specifications
  • Learns multiple reward objectives off-policy from shared replay data
  • Improves performance under the nominal reward parameterization and enables efficient adaptation to new parameterizations

Methodology

RCRL conditions the agent on a reward parameterization and learns multiple reward objectives entirely off-policy from shared replay data, while collecting experience under only one nominal objective.
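The core idea can be illustrated with a minimal tabular sketch: condition the value function on a reward-weight parameter w, act under one nominal w, and relabel every stored transition with each parameterization in the family for off-policy TD updates. This is an illustrative toy (a 5-state chain with a hypothetical goal-vs-step-cost trade-off and made-up hyperparameters), not the paper's actual deep-RL algorithm.

```python
import numpy as np

# Toy reward-conditioned Q-learning on a 5-state chain.
# Action 0 moves left, action 1 moves right. Two reward components:
# r_goal (1 on reaching the rightmost state) and a fixed per-step cost.
# The parameter w trades them off: r(w) = w * r_goal - (1 - w) * 0.1.
N_STATES, N_ACTIONS = 5, 2
W_GRID = np.array([0.2, 0.5, 0.8])  # family of reward parameterizations
NOMINAL_W_IDX = 1                   # behavior policy follows w = 0.5 only
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r_goal = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r_goal, s2 == N_STATES - 1

def reward(r_goal, w):
    return w * r_goal - (1.0 - w) * 0.1

rng = np.random.default_rng(0)
Q = np.zeros((len(W_GRID), N_STATES, N_ACTIONS))  # Q-table conditioned on w

for episode in range(2000):
    s = 0
    for t in range(20):
        # epsilon-greedy behavior under the nominal parameterization only
        if rng.random() < EPS:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(Q[NOMINAL_W_IDX, s]))
        s2, r_goal, done = step(s, a)
        # off-policy relabeling: one TD update per reward parameterization,
        # all driven by the same shared experience
        for i, w in enumerate(W_GRID):
            target = reward(r_goal, w) + (0.0 if done else GAMMA * Q[i, s2].max())
            Q[i, s, a] += ALPHA * (target - Q[i, s, a])
        s = s2
        if done:
            break
```

In this toy chain every conditioned slice of Q ends up preferring the rightward action, since even w = 0.2 weights the goal enough to offset the discounted step costs; the point is that all slices are learned from one behavior stream, so a new w from the family can be deployed without recollecting data.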

Original Abstract

RL agents are typically trained under a single, fixed reward function, which makes them brittle to reward misspecification and limits their ability to adapt to changing task preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications while collecting experience under only one nominal objective. RCRL conditions the agent on reward parameterizations and learns multiple reward objectives from shared replay data entirely off-policy, enabling a single policy to represent reward-specific behaviors. Across single-task, multi-task, and vision-based benchmarks, we show that RCRL not only improves performance under the nominal reward parameterization, but also enables efficient adaptation to new parameterizations. Our results demonstrate that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.

Tags

Reinforcement Learning  Reward Engineering  Multi-Objective Learning  Off-Policy Learning

arXiv Categories

cs.LG