Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
AI Summary
Proposes an online continual reinforcement learning framework with world model feedback that enables robots to adapt autonomously during deployment.
Key Contributions
- Proposes an out-of-distribution (OOD) event detection method based on world model prediction residuals
- Designs an adaptation-convergence assessment mechanism that requires no external supervision
- Validates the framework's effectiveness on a quadruped robot (in high-fidelity simulation) and a real-world model vehicle
Methodology
Builds on DreamerV3, using online continual reinforcement learning with world model feedback to adapt the robot controller automatically during operation.
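The core trigger mechanism described above — flag an out-of-distribution event when the world model's prediction residual exceeds its in-distribution baseline, then start finetuning — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the residual metric (mean squared error over observations), the exponential-moving-average baseline, and the k-sigma threshold are all assumed choices; DreamerV3 itself would compute residuals in its latent state space.

```python
import numpy as np


def prediction_residual(predicted_obs, actual_obs):
    """Per-step world model prediction residual (mean squared error).

    `predicted_obs` and `actual_obs` are hypothetical stand-ins for the
    world model's one-step prediction and the observation actually received.
    """
    predicted = np.asarray(predicted_obs, dtype=float)
    actual = np.asarray(actual_obs, dtype=float)
    return float(np.mean((predicted - actual) ** 2))


class OODDetector:
    """Flags an OOD event (finetuning trigger) when a residual exceeds the
    running in-distribution baseline by k standard deviations.

    The EMA statistics and k-sigma rule are illustrative assumptions,
    not details taken from the paper.
    """

    def __init__(self, alpha=0.01, k=3.0):
        self.alpha = alpha  # EMA smoothing factor for the baseline
        self.k = k          # threshold multiplier (in standard deviations)
        self.mean = None    # running mean of residuals
        self.var = 0.0      # running variance of residuals

    def update(self, residual):
        """Feed one residual; return True if it signals an OOD event."""
        if self.mean is None:
            self.mean = residual  # initialize baseline on the first sample
            return False
        std = self.var ** 0.5
        is_ood = residual > self.mean + self.k * std
        # Update exponential moving statistics of the residual stream
        # (in practice one might freeze updates while an OOD event is active).
        delta = residual - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta ** 2)
        return is_ood
```

In use, the agent would call `update` once per environment step and switch from pure inference to finetuning whenever it returns `True`; the convergence metrics mentioned above would then decide when to stop adapting.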
Original Abstract
As learning-based robotic controllers are typically trained offline and deployed with fixed parameters, their ability to cope with unforeseen changes during operation is limited. Inspired by biological adaptation, this work presents a framework for online Continual Reinforcement Learning that enables automated adaptation during deployment. Building on DreamerV3, a model-based Reinforcement Learning algorithm, the proposed method leverages world model prediction residuals to detect out-of-distribution events and automatically trigger finetuning. Adaptation progress is monitored using both task-level performance signals and internal training metrics, allowing convergence to be assessed without external supervision or domain knowledge. The approach is validated on a variety of contemporary continuous control problems, including a quadruped robot in high-fidelity simulation and a real-world model vehicle. Relevant metrics and their interpretation are presented and discussed, and the resulting trade-offs are described. The results sketch out how autonomous robotic agents could one day move beyond static training regimes toward adaptive systems capable of self-reflection and self-improvement during operation, just like their biological counterparts.