Multimodal Learning Relevance: 8/10

ReLaGS: Relational Language Gaussian Splatting

Yaxu Xie, Abdalla Arafa, Alireza Javanmardi, Christen Millerdurai, Jia Cheng Hu, Shaoxiang Wang, Alain Pagani, Didier Stricker
arXiv: 2603.17605v1 Published: 2026-03-18 Updated: 2026-03-18

AI Summary

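ReLaGS builds a hierarchical, language-distilled Gaussian scene and a 3D semantic scene graph for open-vocabulary 3D perception and reasoning.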

Key Contributions

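  • Proposes a 3D scene construction framework that requires no scene-specific training
  • Introduces a Gaussian pruning mechanism and a multi-view language alignment strategy
  • Builds an open-vocabulary 3D scene graph from vision-language-derived annotations and applies a GNN for relational reasoning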

Methodology

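Gaussian pruning refines the scene geometry, multi-view language alignment fuses 2D features into 3D object embeddings, a hierarchical scene graph is built on top of the result, and a GNN performs relational reasoning.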
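
To make the multi-view alignment idea more concrete, here is a minimal sketch of one way noisy per-view features could be fused into a single object embedding. It is not the paper's algorithm: the function names, the cosine-consensus filter, and the `min_similarity` threshold are all assumptions chosen for illustration. It only shows the general pattern of rejecting outlier views before averaging.

```python
import math


def normalize(v):
    """Return v scaled to unit L2 length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_views(view_features, min_similarity=0.5):
    """Fuse per-view features of one object into a single unit embedding.

    Hypothetical procedure (an assumption, not taken from the paper):
    1. normalize every view feature,
    2. form a consensus direction from the mean of all views,
    3. keep only views whose cosine similarity to the consensus
       is at least min_similarity,
    4. average the kept views and normalize again.
    Returns (embedding, number_of_views_kept).
    """
    if not view_features:
        raise ValueError("need at least one view feature")
    unit = [normalize(v) for v in view_features]
    consensus = normalize(mean_vector(unit))
    inliers = [v for v in unit if cosine(v, consensus) >= min_similarity]
    if not inliers:
        # Fall back to all views rather than returning nothing.
        inliers = unit
    return normalize(mean_vector(inliers)), len(inliers)


if __name__ == "__main__":
    # Three views roughly agree; the fourth is a noisy outlier.
    views = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [1.0, 0.05, 0.0],
        [0.0, 0.0, 1.0],
    ]
    embedding, kept = aggregate_views(views)
    print(kept, embedding)
```

A fixed threshold keeps the sketch short. A real pipeline would more likely weight views by visibility or mask quality, which the summary above does not specify.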

Original Abstract

Achieving unified 3D perception and reasoning across tasks such as segmentation, retrieval, and relation understanding remains challenging, as existing methods are either object-centric or rely on costly training for inter-object reasoning. We present a novel framework that constructs a hierarchical language-distilled Gaussian scene and its 3D semantic scene graph without scene-specific training. A Gaussian pruning mechanism refines scene geometry, while a robust multi-view language alignment strategy aggregates noisy 2D features into accurate 3D object embeddings. On top of this hierarchy, we build an open-vocabulary 3D scene graph with Vision Language derived annotations and Graph Neural Network-based relational reasoning. Our approach enables efficient and scalable open-vocabulary 3D reasoning by jointly modeling hierarchical semantics and inter/intra-object relationships, validated across tasks including open-vocabulary segmentation, scene graph generation, and relation-guided retrieval. Project page: https://dfki-av.github.io/ReLaGS/

Tags

3D Perception Scene Graph Gaussian Splatting Vision Language Reasoning

arXiv Categories

cs.CV