Multimodal Learning Relevance: 8/10

Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

Ruixing Jin, Zicheng Zhu, Ruixiang Ouyang, Sheng Xu, Bo Yue, Zhizheng Wu, Guiliang Liu
arXiv: 2603.22876v1 Published: 2026-03-24 Updated: 2026-03-24

AI Summary

Investigates the key determinants of Sim-to-Real generalization in dexterous manipulation and proposes an evaluation protocol.

Key Contributions

  • Evaluates the impact of factors such as multi-level domain randomization on Sim-to-Real transfer
  • Designs a comprehensive evaluation protocol for dexterous manipulation
  • Releases the robotic platforms and evaluation protocol to facilitate further research

Methodology

Analyzes Sim-to-Real transfer through extensive real-world experiments along four dimensions: domain randomization, rendering, physics modeling, and RL updates.
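To make the domain-randomization dimension concrete, the sketch below draws a randomized simulation scene configuration at multiple levels (visual appearance, physics parameters, and object pose). All parameter names, ranges, and texture choices are hypothetical illustrations, not values from the paper.

```python
import random

def randomize_scene(rng: random.Random) -> dict:
    """Sample one randomized scene config (illustrative ranges only)."""
    return {
        # Visual level: lighting and background appearance
        "light_intensity": rng.uniform(0.5, 1.5),
        "background_texture": rng.choice(["wood", "cloth", "metal"]),
        # Physical level: object dynamics parameters
        "object_mass_kg": rng.uniform(0.05, 0.5),
        "friction_coeff": rng.uniform(0.3, 1.2),
        # Spatial level: perturbation of the object's planar position (meters)
        "object_xy_offset_m": (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)),
    }

# Seeded generator so randomized training scenes are reproducible.
rng = random.Random(0)
scenes = [randomize_scene(rng) for _ in range(100)]
```

Training a policy across many such sampled scenes encourages it to become invariant to these nuisance factors, which is the mechanism the paper's multi-level domain randomization ablations probe.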

Original Abstract

Learning a generalist control policy for dexterous manipulation typically relies on large-scale datasets. Given the high cost of real-world data collection, a practical alternative is to generate synthetic data through simulation. However, the resulting synthetic data often exhibits a significant gap from real-world distributions. While many prior studies have proposed algorithms to bridge the Sim-to-Real discrepancy, there remains a lack of principled research that grounds these methods in real-world manipulation tasks, particularly their performance on generalist policies such as Vision-Language-Action (VLA) models. In this study, we empirically examine the primary determinants of Sim-to-Real generalization across four dimensions: multi-level domain randomization, photorealistic rendering, physics-realistic modeling, and reinforcement learning updates. To support this study, we design a comprehensive evaluation protocol to quantify the real-world performance of manipulation tasks. The protocol accounts for key variations in background, lighting, distractors, object types, and spatial features. Through experiments involving over 10k real-world trials, we derive critical insights into Sim-to-Real transfer. To inform and advance future studies, we release both the robotic platforms and the evaluation protocol for public access to facilitate independent verification, thereby establishing a realistic and standardized benchmark for dexterous manipulation policies.

Tags

Sim-to-Real · Dexterous Manipulation · Vision-Language-Action Models · Domain Randomization

arXiv Categories

cs.RO cs.AI