AI Agents relevance: 9/10

ManipArena: Comprehensive Real-world Evaluation of Reasoning-Oriented Generalist Robot Manipulation

Yu Sun, Meng Cao, Ping Yang, Rongtao Xu, Yunxiao Yan, Runze Xu, Liang Ma, Roy Gan, Andy Zhai, Qingxuan Chen, Zunnan Xu, Hao Wang, Jincheng Yu, Lucy Liang, Qian Wang, Ivan Laptev, Ian D Reid, Xiaodan Liang
arXiv: 2603.28545v1 Published: 2026-03-30 Updated: 2026-03-30

AI Summary

Introduces ManipArena, a standardized real-world evaluation framework for reasoning-oriented generalist robot manipulation.

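Key Contributions

  • Proposes the ManipArena evaluation framework, which bridges the gap between simulation and real-world execution
  • Comprises 20 diverse tasks, emphasizing manipulation that requires semantic and spatial reasoning
  • Provides rich sensory diagnostics, including low-level motor signals and synchronized real-to-sim environments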

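Methodology

Builds real-to-sim environments with diverse tasks and sensory diagnostics to evaluate the performance of VLA models and world models on robot manipulation.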

Original Abstract

Vision-Language-Action (VLA) models and world models have recently emerged as promising paradigms for general-purpose robotic intelligence, yet their progress is hindered by the lack of reliable evaluation protocols that reflect real-world deployment. Existing benchmarks are largely simulator-centric, which provide controllability but fail to capture the reality gap caused by perception noise, complex contact dynamics, hardware constraints, and system latency. Moreover, fragmented real-world evaluations across different robot platforms prevent fair and reproducible comparison. To address these challenges, we introduce ManipArena, a standardized evaluation framework designed to bridge simulation and real-world execution. ManipArena comprises 20 diverse tasks across 10,812 expert trajectories emphasizing reasoning-oriented manipulation tasks requiring semantic and spatial reasoning, supports multi-level generalization through controlled out-of-distribution settings, and incorporates long-horizon mobile manipulation beyond tabletop scenarios. The framework further provides rich sensory diagnostics, including low-level motor signals, and synchronized real-to-sim environments constructed via high-quality 3D scanning. Together, these features enable fair, realistic, and reproducible evaluation for both VLA and world model approaches, providing a scalable foundation for diagnosing and advancing embodied intelligence systems.

Tags

robotics manipulation evaluation VLA world model

arXiv Categories

cs.RO cs.CV