AI Agents relevance: 9/10

GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning

Yiren Lu, Yi Du, Disheng Liu, Yunlai Zhou, Chen Wang, Yu Yin
arXiv: 2603.19137v1 Published: 2026-03-19 Updated: 2026-03-19

AI Summary

GSMem uses 3D Gaussian Splatting (3DGS) to build a persistent spatial memory, enabling zero-shot embodied exploration and reasoning.

Key Contributions

  • Proposes the GSMem framework, which builds spatial memory on 3DGS
  • Introduces Spatial Recollection, enabling photorealistic rendering of novel views
  • A retrieval mechanism that combines scene graphs with language fields
  • A hybrid exploration strategy combining VLM semantic scoring with a 3DGS coverage objective

Methodology

GSMem uses 3DGS as its spatial memory, retrieves targets through parallel scene graphs and language fields, and explores with a hybrid strategy that balances semantic relevance against geometric coverage.
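The hybrid exploration strategy described above can be sketched as a weighted frontier-scoring rule: each candidate viewpoint gets a task-aware semantic score (from a VLM) blended with a geometric coverage gain (from the 3DGS map). This is a minimal illustrative sketch; the weight `lam` and all function and field names are assumptions, not the paper's actual formulation.

```python
# Sketch of a hybrid exploration objective: rank candidate viewpoints by
# a weighted sum of a VLM-style semantic score and a 3DGS coverage gain.
# `lam`, the candidate fields, and all names are illustrative assumptions.

def hybrid_score(semantic_score: float, coverage_gain: float, lam: float = 0.5) -> float:
    """Blend task-aware semantics with geometric coverage."""
    return lam * semantic_score + (1.0 - lam) * coverage_gain

def select_next_viewpoint(candidates):
    """Pick the candidate with the highest hybrid score."""
    return max(candidates, key=lambda c: hybrid_score(c["semantic"], c["coverage"]))

candidates = [
    {"id": "frontier_A", "semantic": 0.9, "coverage": 0.2},  # looks task-relevant
    {"id": "frontier_B", "semantic": 0.2, "coverage": 0.8},  # mostly unexplored
]
best = select_next_viewpoint(candidates)  # frontier_A wins at lam = 0.5
```

Raising `lam` biases the agent toward task-aware exploration; lowering it favors pure geometric coverage, which matches the paper's stated goal of balancing the two.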

Original Abstract

Effective embodied exploration requires agents to accumulate and retain spatial knowledge over time. However, existing scene representations, such as discrete scene graphs or static view-based snapshots, lack post-hoc re-observability. If an initial observation misses a target, the resulting memory omission is often irrecoverable. To bridge this gap, we propose GSMem, a zero-shot embodied exploration and reasoning framework built upon 3D Gaussian Splatting (3DGS). By explicitly parameterizing continuous geometry and dense appearance, 3DGS serves as a persistent spatial memory that endows the agent with Spatial Recollection: the ability to render photorealistic novel views from optimal, previously unoccupied viewpoints. To operationalize this, GSMem employs a retrieval mechanism that simultaneously leverages parallel object-level scene graphs and semantic-level language fields. This complementary design robustly localizes target regions, enabling the agent to "hallucinate" optimal views for high-fidelity Vision-Language Model (VLM) reasoning. Furthermore, we introduce a hybrid exploration strategy that combines VLM-driven semantic scoring with a 3DGS-based coverage objective, balancing task-aware exploration with geometric coverage. Extensive experiments on embodied question answering and lifelong navigation demonstrate the robustness and effectiveness of our framework.
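The complementary retrieval design in the abstract can be sketched as two channels whose results are united: an object-level scene graph answers symbolic label queries, while a language field answers embedding-similarity queries, so a target missed by one channel can still be found by the other. Everything below (the toy embeddings, the threshold, and all names) is an illustrative assumption, not the paper's API.

```python
# Sketch of dual retrieval: symbolic scene-graph lookup plus
# embedding-similarity search over a language field, with the union of
# both hit sets as the localized target regions. All names are assumed.
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_label, query_emb, scene_graph, language_field, thresh=0.8):
    """Return region ids matched by either retrieval channel."""
    # Channel 1: object-level scene graph (exact symbolic match).
    graph_hits = {rid for rid, label in scene_graph.items() if label == query_label}
    # Channel 2: semantic language field (embedding similarity).
    field_hits = {rid for rid, emb in language_field.items()
                  if cosine(query_emb, emb) >= thresh}
    return graph_hits | field_hits

scene_graph = {"r1": "chair", "r2": "table"}
language_field = {"r1": [1.0, 0.0], "r2": [0.6, 0.8], "r3": [0.95, 0.3]}
hits = retrieve("chair", [1.0, 0.0], scene_graph, language_field)
```

Here region `r3` has no "chair" label in the graph but is close to the query in the language field, so it is still retrieved, which is the robustness the complementary design is after.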

Tags

3D Gaussian Splatting · Embodied Exploration · Vision-Language Model · Spatial Memory · Reasoning

arXiv Categories

cs.CV cs.RO