Multimodal Learning relevance: 7/10

ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K

Kaixuan Wang, Tianxing Chen, Jiawei Liu, Honghao Su, Shaolong Zhu, Minxuan Wang, Zixuan Li, Yue Chen, Huan-ang Gao, Yusen Qin, Jiawei Wang, Qixuan Zhang, Lan Xu, Jingyi Yu, Yao Mu, Ping Luo
arXiv: 2603.16866v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

ManiTwin proposes an automated pipeline for efficiently generating large-scale, high-quality robotic manipulation data.

Key Contributions

  • Constructed ManiTwin-100K, a dataset of 100K high-quality annotated 3D assets
  • Proposed an efficient data-generation pipeline that converts a single image into a simulation-ready 3D asset
  • Each asset includes physical properties, language descriptions, functional annotations, and manipulation proposals

Methodology

The method converts a single image into a simulation-ready, semantically annotated 3D asset, enabling large-scale robotic manipulation data generation.
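To make the annotation scheme concrete, the following is a minimal sketch of what one per-asset record might look like, combining the physical properties, language description, functional annotations, and manipulation proposals mentioned above. All field names and the `AssetRecord` class are illustrative assumptions, not taken from the ManiTwin release.

```python
from dataclasses import dataclass, field

# Hypothetical per-asset annotation record; field names are
# illustrative and do not reflect the actual ManiTwin schema.
@dataclass
class AssetRecord:
    asset_id: str
    mesh_path: str          # simulation-ready geometry (e.g. an OBJ/URDF file)
    mass_kg: float          # physical property
    friction: float         # physical property
    description: str        # language description
    functions: list = field(default_factory=list)               # functional annotations
    manipulation_proposals: list = field(default_factory=list)  # verified proposals

# Example: a single annotated object twin.
mug = AssetRecord(
    asset_id="mug_0001",
    mesh_path="assets/mug_0001.obj",
    mass_kg=0.3,
    friction=0.7,
    description="a white ceramic mug with a handle",
    functions=["contain liquid", "pour"],
    manipulation_proposals=[{"type": "grasp", "target": "handle"}],
)
print(mug.asset_id, len(mug.manipulation_proposals))
```

A flat record like this is enough to drive downstream uses the abstract lists: the mesh and physical properties feed the simulator, while the language and functional fields support VQA data generation.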

Original Abstract

Learning in simulation provides a useful foundation for scaling robotic manipulation capabilities. However, this paradigm often suffers from a lack of data-generation-ready digital assets, in both scale and diversity. In this work, we present ManiTwin, an automated and efficient pipeline for generating data-generation-ready digital object twins. Our pipeline transforms a single image into a simulation-ready and semantically annotated 3D asset, enabling large-scale robotic manipulation data generation. Using this pipeline, we construct ManiTwin-100K, a dataset containing 100K high-quality annotated 3D assets. Each asset is equipped with physical properties, language descriptions, functional annotations, and verified manipulation proposals. Experiments demonstrate that ManiTwin provides an efficient asset synthesis and annotation workflow, and that ManiTwin-100K offers high-quality and diverse assets for manipulation data generation, random scene synthesis, and VQA data generation, establishing a strong foundation for scalable simulation data synthesis and policy learning. Our webpage is available at https://manitwin.github.io/.

Tags

Robotic Manipulation  Data Generation  3D Assets  Simulation

arXiv Categories

cs.RO cs.AI cs.GR cs.LG cs.SE