Multimodal Learning Relevance: 8/10

OpenFrontier: General Navigation with Visual-Language Grounded Frontiers

Esteban Padilla, Boyang Sun, Marc Pollefeys, Hermann Blum
arXiv: 2603.05377v1 Published: 2026-03-05 Updated: 2026-03-05

AI Summary

OpenFrontier proposes a training-free vision-language navigation framework that leverages semantic priors for efficient zero-shot navigation.

Key Contributions

  • Proposes OpenFrontier, a framework that enables vision-language navigation without any training
  • Reformulates navigation as a sparse subgoal identification and reaching problem
  • Uses navigation frontiers as semantic anchors for vision-language prior models

Methodology

Navigation frontiers are selected as semantic anchors, and diverse vision-language prior models are integrated through them, enabling efficient zero-shot goal-conditioned navigation.
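The frontier-as-anchor idea can be illustrated with a minimal sketch: extract frontier cells (free cells bordering unknown space) from an occupancy grid, score each with a semantic prior, and pick the best as the next subgoal. The scoring function here (`score_frontier`) is a hypothetical placeholder for a real vision-language model query, which the paper does not specify at this level of detail; the grid values and helper names are likewise illustrative assumptions.

```python
# Illustrative sketch of frontier-based subgoal selection (assumed details,
# not the paper's actual implementation).
FREE, UNKNOWN, OCC = 0, -1, 1

def find_frontiers(grid):
    """Return FREE cells adjacent to at least one UNKNOWN cell."""
    h, w = len(grid), len(grid[0])
    frontiers = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers

def score_frontier(cell, goal_hint):
    """Hypothetical semantic prior: a real system would query a
    vision-language model with the observation at this frontier.
    Here, negative Manhattan distance to a hinted goal stands in."""
    return -abs(cell[0] - goal_hint[0]) - abs(cell[1] - goal_hint[1])

def select_subgoal(grid, goal_hint):
    """Pick the highest-scoring frontier as the next sparse subgoal."""
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None  # exploration complete: no frontier left
    return max(frontiers, key=lambda f: score_frontier(f, goal_hint))

grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCC,  UNKNOWN],
    [FREE, FREE, FREE],
]
print(select_subgoal(grid, (0, 2)))  # → (0, 1)
```

Because the high-level prior only has to rank a handful of frontier cells rather than reason over a dense 3D map, each decision step stays lightweight, which matches the paper's claim of efficient navigation without dense reconstruction.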

Original Abstract

Open-world navigation requires robots to make decisions in complex everyday environments while adapting to flexible task requirements. Conventional navigation approaches often rely on dense 3D reconstruction and hand-crafted goal metrics, which limits their generalization across tasks and environments. Recent advances in vision--language navigation (VLN) and vision--language--action (VLA) models enable end-to-end policies conditioned on natural language, but typically require interactive training, large-scale data collection, or task-specific fine-tuning with a mobile agent. We formulate navigation as a sparse subgoal identification and reaching problem and observe that providing visual anchoring targets for high-level semantic priors enables highly efficient goal-conditioned navigation. Based on this insight, we select navigation frontiers as semantic anchors and propose OpenFrontier, a training-free navigation framework that seamlessly integrates diverse vision--language prior models. OpenFrontier enables efficient navigation with a lightweight system design, without dense 3D mapping, policy training, or model fine-tuning. We evaluate OpenFrontier across multiple navigation benchmarks and demonstrate strong zero-shot performance, as well as effective real-world deployment on a mobile robot.

Tags

Vision-Language Navigation, Zero-Shot Learning, Robot Navigation, Semantic Understanding

arXiv Categories

cs.RO cs.CV