Multimodal Learning (Relevance: 8/10)

LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for Autonomous Driving Applications

Mayank Mayank, Bharanidhar Duraisamy, Florian Geiss
arXiv: 2604.02206v1 Published: 2026-04-02 Updated: 2026-04-02

AI Summary

LEO uses a Graph Attention Network to fuse multi-sensor data for shape and trajectory estimation of dynamic objects.

Key Contributions

  • Proposes LEO, a spatio-temporal model based on a Graph Attention Network for extended-object perception.
  • Fuses multi-modal sensor data and learns adaptive fusion weights to improve perception accuracy.
  • Models complex geometries with a parallelogram formulation and validates effectiveness on real-world datasets.

Methodology

A spatio-temporal graph attention network fuses multi-sensor data, learns adaptive fusion weights, and represents object shapes as parallelograms.
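The two core ideas in the methodology (GAT-style adaptive fusion weights over per-sensor tracks, and a parallelogram shape model) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature layout, the anchor-node scoring, and the `parallelogram_corners` parameterization (center plus two edge half-vectors `u`, `v`) are all assumptions for the sketch.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_fusion(track_feats, W, a):
    """Fuse per-sensor track features with GAT-style adaptive weights.

    track_feats: (num_sensors, feat_dim) -- one row per sensor track
                 for a single object (hypothetical layout).
    W:           (feat_dim, hidden) shared linear projection.
    a:           (2 * hidden,) attention vector.
    Returns the fused feature and per-sensor attention weights.
    """
    h = track_feats @ W                          # project each sensor track
    n = h.shape[0]
    scores = np.empty(n)
    for i in range(n):
        # Score sensor i against an anchor node (sensor 0 here),
        # with LeakyReLU as in standard GAT formulations.
        raw = a @ np.concatenate([h[0], h[i]])
        scores[i] = np.maximum(0.2 * raw, raw)
    weights = softmax(scores)                    # adaptive fusion weights
    fused = weights @ h                          # weighted aggregation
    return fused, weights

def parallelogram_corners(center, u, v):
    """Corners of a parallelogram shape model: center plus two edge
    half-vectors u and v (a simplified stand-in for the paper's
    task-specific ground-truth formulation)."""
    c = np.asarray(center, dtype=float)
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.stack([c + u + v, c + u - v, c - u - v, c - u + v])
```

In the paper the attention weights are learned end-to-end over a spatio-temporal graph; the sketch only shows how softmax-normalized scores yield per-sensor fusion weights that sum to one, and how a parallelogram (unlike an axis-aligned box) can represent sheared shapes such as articulated trailers.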

Original Abstract

Accurate shape and trajectory estimation of dynamic objects is essential for reliable automated driving. Classical Bayesian extended-object models offer theoretical robustness and efficiency but depend on completeness of a-priori and update-likelihood functions, while deep learning methods bring adaptability at the cost of dense annotations and high compute. We bridge these strengths with LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network that fuses multi-modal production-grade sensor tracks to learn adaptive fusion weights, ensure temporal consistency, and represent multi-scale shapes. Using a task-specific parallelogram ground-truth formulation, LEO models complex geometries (e.g. articulated trucks and trailers) and generalizes across sensor types, configurations, object classes, and regions, remaining robust for challenging and long-range targets. Evaluations on the Mercedes-Benz DRIVE PILOT SAE L3 dataset demonstrate real-time computational efficiency suitable for production systems; additional validation on public datasets such as View of Delft (VoD) further confirms cross-dataset generalization.

Tags

Autonomous Driving, Multi-Sensor Fusion, Graph Attention Network, Object Tracking

arXiv Categories

cs.LG cs.AI