Multimodal Learning Relevance: 8/10

ReLaGS: Relational Language Gaussian Splatting

Yaxu Xie, Abdalla Arafa, Alireza Javanmardi, Christen Millerdurai, Jia Cheng Hu, Shaoxiang Wang, Alain Pagani, Didier Stricker

arXiv: 2603.17605v1 Published: 2026-03-18 Updated: 2026-03-18

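AI Summary

ReLaGS builds a hierarchical language-distilled Gaussian scene and a 3D semantic scene graph for open-vocabulary 3D perception and reasoning.

Main Contributions

Proposes a 3D scene construction framework that requires no scene-specific training
Introduces a Gaussian pruning mechanism and a multi-view language alignment strategy
Builds an open-vocabulary 3D scene graph from vision-language annotations, combined with GNN-based relational reasoning

Methodology

Gaussian pruning refines the scene geometry, multi-view language alignment fuses noisy 2D features into 3D object embeddings, and a hierarchical scene graph is built on top, with a GNN performing relational reasoning.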
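
To make the idea of fusing noisy per-view features into one object embedding concrete, here is a minimal sketch. It is not the paper's alignment strategy, which this listing does not describe in detail; it is a generic robust-averaging scheme chosen for illustration. The function names (`normalize`, `mean`, `aggregate_views`) and the `keep_ratio` parameter are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate_views(view_features, keep_ratio=0.5):
    """Fuse per-view features of one object into a single unit-length embedding.

    Assumption (not from the paper): views that disagree with the initial
    consensus are treated as noise and dropped before the final average.
    """
    unit = [normalize(f) for f in view_features]
    center = normalize(mean(unit))
    # Rank views by cosine similarity to the initial consensus direction.
    ranked = sorted(
        unit,
        key=lambda f: sum(a * b for a, b in zip(f, center)),
        reverse=True,
    )
    keep = max(1, math.ceil(keep_ratio * len(ranked)))
    return normalize(mean(ranked[:keep]))


# Three consistent views and one outlier pointing the opposite way.
views = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [-1.0, 0.0]]
embedding = aggregate_views(views, keep_ratio=0.75)
```

With `keep_ratio=0.75` the outlier view is discarded, so the fused embedding stays close to the direction shared by the three consistent views.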

Original Abstract

Achieving unified 3D perception and reasoning across tasks such as segmentation, retrieval, and relation understanding remains challenging, as existing methods are either object-centric or rely on costly training for inter-object reasoning. We present a novel framework that constructs a hierarchical language-distilled Gaussian scene and its 3D semantic scene graph without scene-specific training. A Gaussian pruning mechanism refines scene geometry, while a robust multi-view language alignment strategy aggregates noisy 2D features into accurate 3D object embeddings. On top of this hierarchy, we build an open-vocabulary 3D scene graph with Vision Language derived annotations and Graph Neural Network-based relational reasoning. Our approach enables efficient and scalable open-vocabulary 3D reasoning by jointly modeling hierarchical semantics and inter/intra-object relationships, validated across tasks including open-vocabulary segmentation, scene graph generation, and relation-guided retrieval. Project page: https://dfki-av.github.io/ReLaGS/
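
The relation-guided retrieval task mentioned in the abstract can be illustrated with a toy scene graph. The sketch below is a simplified assumption, not the authors' pipeline: embeddings are hand-made 3D vectors rather than vision-language features, relations are matched by exact label, and the names `cosine` and `relation_guided_retrieve` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def relation_guided_retrieve(subject_query, relation, anchor_query, object_embs, edges):
    """Return the subject id of the best-matching (subject, relation, object) edge.

    Each edge with the requested relation label is scored by how well its
    subject matches `subject_query` plus how well its object matches
    `anchor_query`. Returns None if no edge carries that relation.
    """
    best_id, best_score = None, float("-inf")
    for subj, rel, obj in edges:
        if rel != relation:
            continue
        score = cosine(subject_query, object_embs[subj]) + cosine(anchor_query, object_embs[obj])
        if score > best_score:
            best_id, best_score = subj, score
    return best_id


# Toy scene: two cups, one on a table and one on a shelf.
object_embs = {
    "cup_1": [1.0, 0.0, 0.0],
    "cup_2": [0.9, 0.1, 0.0],
    "table": [0.0, 1.0, 0.0],
    "shelf": [0.0, 0.0, 1.0],
}
edges = [("cup_1", "on", "table"), ("cup_2", "on", "shelf")]

# Query "cup on shelf": the vectors stand in for text embeddings.
result = relation_guided_retrieve([1.0, 0.0, 0.0], "on", [0.0, 0.0, 1.0], object_embs, edges)
```

Both cups match the word "cup" almost equally well, so the relation to the shelf is what disambiguates them; this is the kind of query that purely object-centric retrieval cannot resolve.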

arXiv Categories

cs.CV
