AI Agents relevance: 7/10

HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

Puyue Wang, Jiawei Hu, Yan Gao, Junyan Wang, Yu Zhang, Gillian Dobbie, Tao Gu, Wafa Johal, Ting Dang, Hong Jia
arXiv: 2602.04412v1 · Published: 2026-02-04 · Updated: 2026-02-04

AI Summary

HoRD proposes a two-stage learning framework that achieves robust humanoid robot control through history-conditioned reinforcement learning and online distillation.

Key Contributions

  • Proposes a history-conditioned reinforcement learning method that lets the policy adapt online to diverse dynamics randomization.
  • Uses online distillation to transfer the teacher policy's robust control capabilities into a transformer-based student policy.
  • Achieves robust control with zero-shot adaptation to unseen domains and external perturbations.

Methodology

A two-stage learning framework is used: history-conditioned reinforcement learning trains a teacher policy, and online distillation then trains a transformer-based student policy.
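The two stages above can be sketched with toy linear models, purely for illustration. This is not the paper's implementation: the linear maps stand in for the teacher's networks and the transformer student, the dimensions are arbitrary, and all function names are hypothetical. The sketch shows the two key ideas: (1) the teacher infers a latent dynamics context from recent state-action history and conditions its action on it; (2) the student is regressed onto the frozen teacher's actions via an online distillation (MSE) update.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, CTX_DIM, HIST_LEN = 8, 4, 3, 5

# Stage 1 (teacher): infer a latent dynamics context from the recent
# state-action history, then condition the action on [obs, context].
W_ctx = rng.normal(size=(HIST_LEN * (OBS_DIM + ACT_DIM), CTX_DIM)) * 0.1
W_teacher = rng.normal(size=(OBS_DIM + CTX_DIM, ACT_DIM)) * 0.1

def teacher_policy(obs, history):
    """history: (HIST_LEN, OBS_DIM + ACT_DIM) recent state-action pairs."""
    context = np.tanh(history.reshape(-1) @ W_ctx)  # latent dynamics context
    return np.tanh(np.concatenate([obs, context]) @ W_teacher)

# Stage 2 (student): a separate policy (a linear stand-in for the
# transformer student) is fit to the teacher's actions online.
W_student = rng.normal(size=(OBS_DIM, ACT_DIM)) * 0.1

def distill_step(W_student, obs, target_action, lr=0.05):
    """One online-distillation update: gradient step on the MSE between
    the student's action and the (frozen) teacher's action."""
    pred = obs @ W_student
    grad = np.outer(obs, pred - target_action)  # d(0.5*MSE)/dW_student
    return W_student - lr * grad

# Toy distillation loop: the student tracks the teacher over a stream
# of observations with randomized histories (mimicking varied dynamics).
losses = []
for _ in range(200):
    obs = rng.normal(size=OBS_DIM)
    history = rng.normal(size=(HIST_LEN, OBS_DIM + ACT_DIM))
    target = teacher_policy(obs, history)
    W_student = distill_step(W_student, obs, target)
    losses.append(np.mean((obs @ W_student - target) ** 2))

print(f"distillation loss: start={np.mean(losses[:20]):.4f}  end={np.mean(losses[-20:]):.4f}")
```

The residual loss does not reach zero: the teacher's action depends on the history-derived context, which this simplified student cannot observe; in the paper the student instead consumes sparse root-relative 3D joint keypoint trajectories.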

Original Abstract

Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state-action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.

Tags

Reinforcement Learning, Robot Control, Online Distillation, Domain Adaptation

arXiv Categories

cs.RO cs.LG