Multimodal Learning Relevance: 8/10

R2F: Repurposing Ray Frontiers for LLM-free Object Navigation

Francesco Argenziano, John Mark Alexis Marcelo, Michele Brienza, Abdel Hakim Drid, Emanuele Musumeci, Daniele Nardi, Domenico D. Bloisi, Vincenzo Suriani
arXiv: 2603.08475v1 Published: 2026-03-09 Updated: 2026-03-09

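AI Summary

Proposes R2F, a real-time object navigation method that needs no LLM and runs up to 6 times faster than VLM-based alternatives.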

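Main Contributions

  • Repurposes ray frontiers for object navigation
  • Introduces R2F-VLN, an extension to free-form language instructions
  • Achieves real-time zero-shot navigation, with competitive performance and up to 6 times faster runtime than VLM-based alternatives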

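Methodology

Reinterprets ray frontiers as direction-conditioned semantic hypotheses and combines embedding-based frontier scoring with classical mapping and planning, with no iterative LLM reasoning.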
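
The embedding-based frontier scoring described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Frontier` class, the `score_frontier` and `select_goal` helpers, and the choice of taking the maximum cosine similarity over a frontier's directional embeddings are all assumptions made for illustration. The query vector is assumed to come from some language-aligned text encoder, which is not shown here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Frontier:
    """Hypothetical frontier record: a map position plus several
    directional embeddings describing plausible unseen content."""
    position: Tuple[float, float]
    directional_embeddings: List[Vector] = field(default_factory=list)


def score_frontier(frontier: Frontier, query: Vector) -> float:
    """Score a frontier by its best-matching directional embedding.
    Max aggregation is an assumption, not taken from the paper."""
    if not frontier.directional_embeddings:
        return -1.0  # lowest possible cosine value: no semantic evidence
    return max(cosine(e, query) for e in frontier.directional_embeddings)


def select_goal(frontiers: List[Frontier], query: Vector) -> Optional[Frontier]:
    """Return the highest-scoring frontier, or None if there are none."""
    if not frontiers:
        return None
    return max(frontiers, key=lambda f: score_frontier(f, query))


# Toy 3-dimensional embeddings; real language-aligned features are much larger.
query = [1.0, 0.0, 0.0]
frontiers = [
    Frontier((1.0, 2.0), [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]),
    Frontier((4.0, 0.5), [[0.0, 0.0, 1.0]]),
]
goal = select_goal(frontiers, query)
```

In this sketch the selected frontier would then be handed to a classical planner as the navigation goal, so no large-model query is needed at decision time.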

Original Abstract

Zero-shot open-vocabulary object navigation has progressed rapidly with the emergence of large Vision-Language Models (VLMs) and Large Language Models (LLMs), now widely used as high-level decision-makers instead of end-to-end policies. Although effective, such systems often rely on iterative large-model queries at inference time, introducing latency and computational overhead that limit real-time deployment. To address this problem, we repurpose ray frontiers (R2F), a recently proposed frontier-based exploration paradigm, to develop an LLM-free framework for indoor open-vocabulary object navigation. While ray frontiers were originally used to bias exploration using semantic cues carried along rays, we reinterpret frontier regions as explicit, direction-conditioned semantic hypotheses that serve as navigation goals. Language-aligned features accumulated along out-of-range rays are stored sparsely at frontiers, where each region maintains multiple directional embeddings encoding plausible unseen content. In this way, navigation then reduces to embedding-based frontier scoring and goal tracking within a classical mapping and planning pipeline, eliminating iterative large-model reasoning. We further introduce R2F-VLN, a lightweight extension for free-form language instructions using syntactic parsing and relational verification without additional VLM or LLM components. Experiments in Habitat-sim and on a real robotic platform demonstrate competitive state-of-the-art zero-shot performance with real-time execution, achieving up to 6 times faster runtime than VLM-based alternatives.

Tags

Object Navigation, Robotics, Vision-Language, Zero-Shot Learning, Real-Time Systems

arXiv Categories

cs.RO cs.AI