Multimodal Learning Relevance: 9/10

Benchmarking Affordance Generalization with BusyBox

Dean Fortier, Timothy Adamson, Tess Hellebrekers, Teresa LaScala, Kofi Ennin, Michael Murray, Andrey Kolobov, Galen Mullins
arXiv: 2602.05441v1 Published: 2026-02-05 Updated: 2026-02-05

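AI Summary

Proposes BusyBox, a physical benchmark for evaluating how well VLA models generalize when manipulating new objects with familiar physical features.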

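Main Contributions

  • Proposes the BusyBox benchmark for evaluating the affordance generalization of VLA models
  • BusyBox is built from modules that can be swapped and rotated, yielding variants that look different but share the same affordances
  • Releases CAD files, a bill of materials, and a dataset of language-annotated demonstrations to support reproduction and further research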

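Methodology

Uses the physical benchmark to systematically evaluate VLA models' manipulation ability across different BusyBox variants, measuring their affordance generalization performance.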

Original Abstract

Vision-Language-Action (VLA) models have been attracting the attention of researchers and practitioners thanks to their promise of generalization. Although single-task policies still offer competitive performance, VLAs are increasingly able to handle commands and environments unseen in their training set. While generalization in vision and language space is undoubtedly important for robust versatile behaviors, a key meta-skill VLAs need to possess is affordance generalization -- the ability to manipulate new objects with familiar physical features. In this work, we present BusyBox, a physical benchmark for systematic semi-automatic evaluation of VLAs' affordance generalization. BusyBox consists of 6 modules with switches, sliders, wires, buttons, a display, and a dial. The modules can be swapped and rotated to create a multitude of BusyBox variations with different visual appearances but the same set of affordances. We empirically demonstrate that generalization across BusyBox variants is highly challenging even for strong open-weights VLAs such as $π_{0.5}$ and GR00T-N1.6. To encourage the research community to evaluate their own VLAs on BusyBox and to propose new affordance generalization experiments, we have designed BusyBox to be easy to build in most robotics labs. We release the full set of CAD files for 3D-printing its parts as well as a bill of materials for (optionally) assembling its electronics. We also publish a dataset of language-annotated demonstrations that we collected using the common bimanual Mobile Aloha robot on the canonical BusyBox configuration. All of the released materials are available at https://microsoft.github.io/BusyBox.
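
To give a feel for how swapping and rotating modules produces "a multitude" of variants, here is a minimal Python sketch that enumerates layouts. The six module names follow the abstract. The rest is assumed for illustration and is not taken from the paper: one module per slot, six slots, and four 90-degree orientations per module. The function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial

# Module types as listed in the abstract.
MODULES = ("switches", "sliders", "wires", "buttons", "display", "dial")

# ASSUMPTION: each module can be mounted in four 90-degree orientations.
ROTATIONS = (0, 90, 180, 270)


def count_variants(n_modules=len(MODULES), n_rotations=len(ROTATIONS)):
    """Number of layouts: slot orderings times per-module orientations."""
    return factorial(n_modules) * n_rotations ** n_modules


def iter_variants(modules=MODULES, rotations=ROTATIONS):
    """Lazily yield each layout as a tuple of (module, rotation) pairs, one per slot."""
    for order in permutations(modules):
        for angles in product(rotations, repeat=len(order)):
            yield tuple(zip(order, angles))


def affordances(variant):
    """The affordance set depends only on which modules are present, not on layout."""
    return frozenset(name for name, _ in variant)


if __name__ == "__main__":
    print(count_variants())  # 6! * 4**6 = 2949120 under the assumptions above
    first = next(iter_variants())
    print(first)
    print(affordances(first) == frozenset(MODULES))  # True
```

Under these assumptions every layout exposes the same affordance set while its visual arrangement differs, which is the property the benchmark relies on. The real number of distinct variants depends on the hardware's actual slot and rotation constraints.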

Tags

Vision-Language-Action Affordance Generalization Robotics

arXiv Categories

cs.RO cs.AI