Semantic-Guided 3D Gaussian Splatting for Transient Object Removal
AI Summary
Proposes a semantic-guided 3D Gaussian Splatting method that effectively removes transient objects from multi-view reconstruction and improves reconstruction quality.
Key Contributions
- Proposes a semantic filtering framework based on vision-language models
- Uses CLIP similarity to drive per-Gaussian opacity regularization and pruning
- Resolves the parallax ambiguity that affects motion-based methods
Methodology
CLIP computes similarity between rendered views and distractor text prompts; Gaussians whose accumulated similarity exceeds a calibrated threshold receive opacity regularization and are periodically pruned.
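A minimal NumPy sketch of the score-accumulate-then-prune loop described above. The per-Gaussian attribution of CLIP scores, the threshold `TAU`, and the decay factor `LAMBDA` are illustrative assumptions, not the paper's actual values; a toy similarity array stands in for real CLIP scoring so the sketch runs without a model.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000                                   # toy scene: 1000 Gaussians
opacity = rng.uniform(0.2, 1.0, N)         # per-Gaussian opacity
semantic_score = np.zeros(N)               # accumulated distractor similarity
hits = np.zeros(N)                         # times each Gaussian was scored

# Toy stand-in for CLIP: pretend Gaussians 0..99 belong to the distractor
# category and score high against the distractor text prompt.
true_sim = np.where(np.arange(N) < 100, 0.9, 0.1)

for _ in range(50):                        # simulated training iterations
    visible = rng.random(N) < 0.5          # Gaussians visible in this view
    semantic_score[visible] += true_sim[visible]
    hits[visible] += 1

mean_score = semantic_score / np.maximum(hits, 1)

TAU = 0.5                                  # calibrated threshold (assumed)
flagged = mean_score > TAU

# Opacity regularization: repeatedly push flagged Gaussians toward transparency.
LAMBDA = 0.3                               # decay strength (assumed)
for _ in range(10):
    opacity[flagged] *= (1.0 - LAMBDA)

# Periodic pruning: drop flagged Gaussians whose opacity has faded out.
keep = ~(flagged & (opacity < 0.1))
print(f"flagged {flagged.sum()} Gaussians, kept {keep.sum()} of {N}")
```

With this toy attribution, the 100 distractor Gaussians accumulate high mean scores, decay past the opacity floor, and are pruned, while the rest of the scene is untouched.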
Original Abstract
Transient objects in casual multi-view captures cause ghosting artifacts in 3D Gaussian Splatting (3DGS) reconstruction. Existing solutions rely on scene decomposition at significant memory cost or on motion-based heuristics that are vulnerable to parallax ambiguity. We propose a semantic filtering framework for category-aware transient removal using vision-language models. CLIP similarity scores between rendered views and distractor text prompts are accumulated per Gaussian across training iterations; Gaussians exceeding a calibrated threshold undergo opacity regularization and periodic pruning. Unlike motion-based approaches, semantic classification resolves parallax ambiguity by identifying object categories independently of motion patterns. Experiments on the RobustNeRF benchmark demonstrate consistent improvements in reconstruction quality over vanilla 3DGS across four sequences, while maintaining minimal memory overhead and real-time rendering performance. Threshold calibration studies and comparisons with baselines validate semantic guidance as a practical strategy for transient removal in scenarios with predictable distractor categories.