Multimodal Learning Relevance: 8/10

ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation

Xialin He, Sirui Xu, Xinyao Li, Runpei Dong, Liuyu Bian, Yu-Xiong Wang, Liang-Yan Gui
arXiv: 2603.03279v1 Published: 2026-03-03 Updated: 2026-03-03

AI Summary

ULTRA proposes a unified framework for autonomous humanoid whole-body loco-manipulation, improving generalization across perception modalities and task specifications.

Key Contributions

  • Proposes a physics-driven neural retargeting algorithm
  • Learns a unified multimodal controller
  • Improves robustness through reinforcement-learning finetuning

Methodology

First, a neural retargeting algorithm converts motion-capture data into physically plausible humanoid motions; a unified controller is then learned on these motions; finally, the controller is finetuned with reinforcement learning.
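As a rough illustration of this three-stage pipeline, the sketch below wires together retargeting, a unified latent-skill controller, and a toy finetuning loop. Every name (`retarget`, `UnifiedController`, `rl_finetune`), dimension, and update rule is a hypothetical placeholder standing in for the paper's actual components, not the authors' implementation.

```python
# Minimal sketch of the ULTRA-style pipeline described above. All names,
# dimensions, and update rules are hypothetical placeholders.
from __future__ import annotations

import numpy as np

rng = np.random.default_rng(0)


def retarget(mocap_frames: np.ndarray) -> np.ndarray:
    """Stage 1 placeholder: map raw mocap features to humanoid joint targets.
    The paper uses a physics-driven neural retargeter; a fixed random linear
    map stands in here."""
    joint_map = rng.standard_normal((mocap_frames.shape[1], 23))  # 23 DoF, a guess
    return np.tanh(mocap_frames @ joint_map)


class UnifiedController:
    """Stage 2 placeholder: one policy for dense references and sparse task
    specifications, acting through a compact latent skill code z."""

    def __init__(self, obs_dim: int, latent_dim: int, act_dim: int) -> None:
        self.encoder = 0.1 * rng.standard_normal((obs_dim, latent_dim))
        self.decoder = 0.1 * rng.standard_normal((latent_dim, act_dim))

    def act(self, obs: np.ndarray, reference: np.ndarray | None = None) -> np.ndarray:
        # Dense mode conditions on the reference; sparse mode uses obs alone.
        conditioned = obs if reference is None else obs + 0.5 * reference
        z = np.tanh(conditioned @ self.encoder)  # compress into latent skill space
        return np.tanh(z @ self.decoder)         # decode a whole-body action


def rl_finetune(controller: UnifiedController, episodes: int = 10) -> None:
    """Stage 3 placeholder: random-search "finetuning" of the decoder under a
    toy reward, standing in for the paper's RL finetuning."""
    best = -np.inf
    for _ in range(episodes):
        trial = controller.decoder + 0.01 * rng.standard_normal(controller.decoder.shape)
        reward = -np.abs(trial).mean()  # placeholder reward signal
        if reward > best:
            best, controller.decoder = reward, trial


# Wire the three stages together on dummy data.
mocap = rng.standard_normal((100, 60))  # 100 frames of mocap features
references = retarget(mocap)            # stage 1
policy = UnifiedController(obs_dim=23, latent_dim=8, act_dim=23)
rl_finetune(policy)                     # stage 3
print(policy.act(references[0], reference=references[1]).shape)  # (23,)
```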

Original Abstract

Achieving autonomous and versatile whole-body loco-manipulation remains a central barrier to making humanoids practically useful. Yet existing approaches are fundamentally constrained: retargeted data are often scarce or low-quality; methods struggle to scale to large skill repertoires; and, most importantly, they rely on tracking predefined motion references rather than generating behavior from perception and high-level task specifications. To address these limitations, we propose ULTRA, a unified framework with two key components. First, we introduce a physics-driven neural retargeting algorithm that translates large-scale motion capture to humanoid embodiments while preserving physical plausibility for contact-rich interactions. Second, we learn a unified multimodal controller that supports both dense references and sparse task specifications, under sensing ranging from accurate motion-capture state to noisy egocentric visual inputs. We distill a universal tracking policy into this controller, compress motor skills into a compact latent space, and apply reinforcement learning finetuning to expand coverage and improve robustness under out-of-distribution scenarios. This enables coordinated whole-body behavior from sparse intent without test-time reference motions. We evaluate ULTRA in simulation and on a real Unitree G1 humanoid. Results show that ULTRA generalizes to autonomous, goal-conditioned whole-body loco-manipulation from egocentric perception, consistently outperforming tracking-only baselines with limited skills.
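One concrete step the abstract describes is distillation: a privileged teacher that tracks dense references supervises a student that must act without them at test time. The toy regression below illustrates that idea only; `teacher_action`, `distill`, and all dimensions are invented for illustration and do not reflect the paper's networks.

```python
# Toy illustration of distilling a reference-tracking teacher into a student
# that acts from state alone. All names and numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
TEACHER_W = rng.standard_normal((16, 16)) / 4  # invented teacher feedback gains


def teacher_action(state: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Privileged teacher: uses the full state plus the dense reference motion."""
    return np.tanh(TEACHER_W @ state + (reference - state))


def distill(student_w: np.ndarray, steps: int = 2000, lr: float = 0.05) -> np.ndarray:
    """Regress student(state) onto teacher(state, reference) so the student
    needs no reference at test time (plain SGD on a squared-error loss)."""
    for _ in range(steps):
        state = rng.standard_normal(16)
        reference = state + 0.1 * rng.standard_normal(16)  # near-tracking reference
        target = teacher_action(state, reference)
        pred = student_w @ state
        student_w -= lr * np.outer(pred - target, state)  # grad of 0.5*||pred-target||^2
    return student_w


student = distill(np.zeros((16, 16)))
state = rng.standard_normal(16)
reference = state + 0.1 * rng.standard_normal(16)
gap = np.abs(student @ state - teacher_action(state, reference)).mean()
print(f"student-teacher gap on one state: {gap:.3f}")
```

In the paper this distillation additionally compresses skills into a compact latent space and is followed by RL finetuning; the sketch omits both.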

Tags

Humanoid Robots, Whole-Body Loco-Manipulation, Multimodal Learning, Reinforcement Learning

arXiv Categories

cs.RO cs.CV