Multimodal Learning Relevance: 7/10

EKF-Based Depth Camera and Deep Learning Fusion for UAV-Person Distance Estimation and Following in SAR Operations

Luka Šiktar, Branimir Ćaran, Bojan Šekoranja, Marko Švaco
arXiv: 2602.20958v1 Published: 2026-02-24 Updated: 2026-02-24

AI Summary

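The paper presents a UAV system for person distance estimation and following that uses an EKF to fuse depth camera measurements with deep learning-based distance estimates.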

Main Contributions

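  • Fuses depth camera and monocular camera information to estimate the distance to a person
  • Uses YOLO-pose for deep learning-based filtering of depth data and for camera-to-body distance estimation
  • Fuses depth information in real time with the EKF algorithm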

Methodology

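YOLO-pose detects human body keypoints, and an EKF fuses the depth camera measurements with the monocular distance estimate to enable accurate tracking and following.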
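
The fusion step can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give their state vector, motion model, or measurement models. The sketch assumes a two-element state (distance and radial velocity), a constant-velocity motion model, a direct depth reading, and a monocular measurement modeled as the pixel length of the torso under a pinhole camera. The class name `DistanceEKF` and the parameters `focal_px`, `torso_m`, and all noise values are hypothetical.

```python
import numpy as np


class DistanceEKF:
    """Toy EKF over the state [distance (m), radial velocity (m/s)].

    All models and constants here are illustrative assumptions,
    not values taken from the paper.
    """

    def __init__(self, d0, focal_px=600.0, torso_m=0.5, accel_std=0.5):
        self.x = np.array([d0, 0.0])   # initial state guess
        self.P = np.eye(2)             # initial state covariance
        self.focal_px = focal_px       # hypothetical focal length in pixels
        self.torso_m = torso_m         # hypothetical real torso length in metres
        self.accel_std = accel_std     # hypothetical process noise (m/s^2)

    def predict(self, dt):
        # Constant-velocity motion model with acceleration noise.
        F = np.array([[1.0, dt], [0.0, 1.0]])
        G = np.array([0.5 * dt**2, dt])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.accel_std**2 * np.outer(G, G)

    def _update(self, z, z_pred, H, r):
        # Scalar-measurement Kalman update; H is the measurement Jacobian.
        H = np.atleast_2d(H)
        S = H @ self.P @ H.T + r       # innovation covariance (1x1)
        K = self.P @ H.T / S           # Kalman gain (2x1)
        self.x = self.x + (K * (z - z_pred)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P

    def update_depth(self, z_m, std=0.05):
        # Depth camera: linear measurement of the distance itself.
        self._update(z_m, self.x[0], [1.0, 0.0], std**2)

    def update_mono(self, torso_px, std=4.0):
        # Monocular cue: pixel torso length h(d) = f * L / d (nonlinear),
        # linearized around the current distance estimate.
        d = self.x[0]
        k = self.focal_px * self.torso_m
        self._update(torso_px, k / d, [-k / d**2, 0.0], std**2)


# Usage: a stationary person 3 m away, observed by both sensors at 10 Hz.
ekf = DistanceEKF(d0=2.0)
for _ in range(50):
    ekf.predict(dt=0.1)
    ekf.update_depth(3.0)
    ekf.update_mono(600.0 * 0.5 / 3.0)
print(f"fused distance estimate: {ekf.x[0]:.3f} m")
```

Because the monocular measurement is nonlinear in distance, its update uses the Jacobian of the measurement function, which is what makes this an extended rather than a linear Kalman filter. Either update can be skipped on frames where one sensor yields no valid reading.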

Original Abstract

Search and rescue (SAR) operations require rapid responses to save lives or property. Unmanned Aerial Vehicles (UAVs) equipped with vision-based systems support these missions through prior terrain investigation or real-time assistance during the mission itself. Vision-based UAV frameworks aid human search tasks by detecting and recognizing specific individuals, then tracking and following them while maintaining a safe distance. A key safety requirement for UAV following is the accurate estimation of the distance between camera and target object under real-world conditions, achieved by fusing multiple image modalities. UAVs with deep learning-based vision systems offer a new approach to the planning and execution of SAR operations. As part of the system for automatic people detection and face recognition using deep learning, in this paper we present the fusion of depth camera measurements and monocular camera-to-body distance estimation for robust tracking and following. Deep learning-based filtering of depth camera data and estimation of camera-to-body distance from a monocular camera are achieved with YOLO-pose, enabling real-time fusion of depth information using the Extended Kalman Filter (EKF) algorithm. The proposed subsystem, designed for use in drones, estimates and measures the distance between the depth camera and the human body keypoints, to maintain the safe distance between the drone and the human target. Our system provides an accurate estimated distance, which has been validated against motion capture ground truth data. The system has been tested in real time indoors, where it reduces the average errors, root mean square error (RMSE) and standard deviations of distance estimation up to 15.3% in three tested scenarios.
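
The error metrics named in the abstract can be computed against ground truth along these lines. This is a hedged sketch: the abstract does not define "average error", so mean absolute error is assumed here, and the function name `distance_error_stats` is hypothetical.

```python
import numpy as np


def distance_error_stats(estimated_m, ground_truth_m):
    """Return mean absolute error, RMSE, and error standard deviation (metres)."""
    err = np.asarray(estimated_m, dtype=float) - np.asarray(ground_truth_m, dtype=float)
    return {
        "mean_abs_error": float(np.mean(np.abs(err))),  # assumed meaning of "average error"
        "rmse": float(np.sqrt(np.mean(err**2))),
        "std": float(np.std(err)),                      # population standard deviation
    }


# Usage with made-up numbers (not data from the paper).
stats = distance_error_stats([2.9, 3.1, 3.4], [3.0, 3.0, 3.2])
print(stats)
```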

Tags

UAV, Deep Learning, Extended Kalman Filter, Object Tracking, Object Detection

arXiv Categories

cs.RO cs.AI