WildDepth: A Multimodal Dataset for 3D Wildlife Perception and Depth Estimation
AI Summary
WildDepth is a multimodal dataset for 3D animal perception and depth estimation, containing synchronized RGB and LiDAR data.
Key Contributions
- Built WildDepth, a large-scale synchronized RGB-LiDAR dataset of animals
- Proposed depth estimation and 3D reconstruction methods based on multimodal data fusion
- Validated that multimodal data improves the accuracy of depth estimation and 3D reconstruction
Methodology
RGB and LiDAR data are synchronized to build the dataset, and multimodal data fusion is then used for depth estimation and 3D reconstruction.
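The core of building such a dataset is aligning each LiDAR sweep with the RGB frame so that every projected point provides a metric depth label for a pixel. A minimal sketch of this projection, assuming a standard pinhole camera model and points already transformed into the camera frame by extrinsic calibration (the function name and parameters are illustrative, not from the paper):

```python
def project_lidar_to_depth(points_cam, fx, fy, cx, cy, width, height):
    """Project LiDAR points (already in the camera frame, meters) into a
    sparse depth map aligned with the RGB image.

    Keeps the nearest point per pixel, since closer surfaces occlude
    farther ones. Returns {(u, v): depth_in_meters}.
    """
    depth = {}
    for x, y, z in points_cam:
        if z <= 0:  # point is behind the camera plane
            continue
        # Pinhole projection: pixel = focal * (X/Z, Y/Z) + principal point
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in depth or z < depth[(u, v)]:
                depth[(u, v)] = z
    return depth
```

The resulting sparse metric depth can then serve as ground truth or as a fusion input alongside the RGB image.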
Original Abstract
Depth estimation and 3D reconstruction have been extensively studied as core topics in computer vision. Starting from rigid objects with relatively simple geometric shapes, such as vehicles, research has expanded to general objects, including challenging deformable subjects such as humans and animals. For animals in particular, however, the majority of existing models are trained on datasets without metric scale, even though metric-scale data is needed to validate image-only models. To address this limitation, we present WildDepth, a multimodal dataset and benchmark suite for depth estimation, behavior detection, and 3D reconstruction, covering diverse animal categories from domestic to wild environments with synchronized RGB and LiDAR. Experimental results show that the use of multimodal data improves depth reliability by up to 10% in RMSE, while RGB-LiDAR fusion enhances 3D reconstruction fidelity by 12% in Chamfer distance. By releasing WildDepth and its benchmarks, we aim to foster robust multimodal perception systems that generalize across domains.
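The two benchmark metrics cited above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code; note that Chamfer distance conventions vary (here: the symmetric sum of mean squared nearest-neighbor distances in both directions):

```python
import math

def rmse(pred, gt):
    """Root-mean-square error between predicted and ground-truth depths."""
    assert len(pred) == len(gt) and pred
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets.

    For each point, find the squared distance to its nearest neighbor in
    the other set; average per set and sum the two directions.
    """
    def sq(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    d_ab = sum(min(sq(p, q) for q in b) for p in a) / len(a)
    d_ba = sum(min(sq(q, p) for p in a) for q in b) / len(b)
    return d_ab + d_ba
```

Lower is better for both: RMSE measures per-pixel depth error, while Chamfer distance measures how closely a reconstructed point cloud matches the reference surface.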