Multimodal Learning Relevance: 8/10

OpenFrontier: General Navigation with Visual-Language Grounded Frontiers

Esteban Padilla, Boyang Sun, Marc Pollefeys, Hermann Blum
arXiv: 2603.05377v1 Published: 2026-03-05 Updated: 2026-03-05

AI Summary

OpenFrontier proposes a training-free vision-language navigation framework that leverages semantic priors for efficient zero-shot navigation.

Key Contributions

  • Proposes OpenFrontier, a framework that enables vision-language navigation without any training
  • Reformulates navigation as a sparse subgoal identification and reaching problem
  • Uses navigation frontiers as semantic anchors for vision-language prior models

Methodology

Navigation frontiers are selected as semantic anchors, and diverse vision-language prior models are integrated through them, enabling efficient zero-shot goal-conditioned navigation.
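The frontier-as-anchor idea can be illustrated with a minimal sketch: extract frontier cells (free cells bordering unknown space) from an occupancy grid, score each with a semantic prior, and pick the best as the next subgoal. The scoring function here (`score_frontier`) is a hypothetical placeholder for a real vision-language model query, which the paper does not specify at this level of detail; the grid values and helper names are likewise illustrative assumptions.

```python
# Illustrative sketch of frontier-based subgoal selection (assumed details,
# not the paper's actual implementation).
FREE, UNKNOWN, OCC = 0, -1, 1

def find_frontiers(grid):
    """Return FREE cells adjacent to at least one UNKNOWN cell."""
    h, w = len(grid), len(grid[0])
    frontiers = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers

def score_frontier(cell, goal_hint):
    """Hypothetical semantic prior: a real system would query a
    vision-language model with the observation at this frontier.
    Here, negative Manhattan distance to a hinted goal stands in."""
    return -abs(cell[0] - goal_hint[0]) - abs(cell[1] - goal_hint[1])

def select_subgoal(grid, goal_hint):
    """Pick the highest-scoring frontier as the next sparse subgoal."""
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None  # exploration complete: no frontier left
    return max(frontiers, key=lambda f: score_frontier(f, goal_hint))

grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCC,  UNKNOWN],
    [FREE, FREE, FREE],
]
print(select_subgoal(grid, (0, 2)))  # → (0, 1)
```

Because the high-level prior only has to rank a handful of frontier cells rather than reason over a dense 3D map, each decision step stays lightweight, which matches the paper's claim of efficient navigation without dense reconstruction.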

Original Abstract

Open-world navigation requires robots to make decisions in complex everyday environments while adapting to flexible task requirements. Conventional navigation approaches often rely on dense 3D reconstruction and hand-crafted goal metrics, which limits their generalization across tasks and environments. Recent advances in vision--language navigation (VLN) and vision--language--action (VLA) models enable end-to-end policies conditioned on natural language, but typically require interactive training, large-scale data collection, or task-specific fine-tuning with a mobile agent. We formulate navigation as a sparse subgoal identification and reaching problem and observe that providing visual anchoring targets for high-level semantic priors enables highly efficient goal-conditioned navigation. Based on this insight, we select navigation frontiers as semantic anchors and propose OpenFrontier, a training-free navigation framework that seamlessly integrates diverse vision--language prior models. OpenFrontier enables efficient navigation with a lightweight system design, without dense 3D mapping, policy training, or model fine-tuning. We evaluate OpenFrontier across multiple navigation benchmarks and demonstrate strong zero-shot performance, as well as effective real-world deployment on a mobile robot.

Tags

Vision-Language Navigation, Zero-Shot Learning, Robot Navigation, Semantic Understanding

arXiv Categories

cs.RO cs.CV