Multimodal Learning (Relevance: 8/10)

LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for Autonomous Driving Applications

Mayank Mayank, Bharanidhar Duraisamy, Florian Geiss
arXiv: 2604.02206v1 Published: 2026-04-02 Updated: 2026-04-02

AI Summary

LEO uses a Graph Attention Network to fuse multi-sensor data for shape and trajectory estimation of dynamic objects.

Key Contributions

  • Proposes LEO, a spatio-temporal model based on a Graph Attention Network for extended-object perception.
  • Fuses multi-modal sensor data and learns adaptive fusion weights to improve perception accuracy.
  • Models complex geometries with a parallelogram formulation and validates effectiveness on real-world datasets.

Methodology

A spatio-temporal graph attention network fuses multi-sensor data, learns adaptive fusion weights, and represents object shapes as parallelograms.
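The two core ideas in the methodology (GAT-style adaptive fusion weights over per-sensor tracks, and a parallelogram shape model) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature layout, the anchor-node scoring, and the `parallelogram_corners` parameterization (center plus two edge half-vectors `u`, `v`) are all assumptions for the sketch.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_fusion(track_feats, W, a):
    """Fuse per-sensor track features with GAT-style adaptive weights.

    track_feats: (num_sensors, feat_dim) -- one row per sensor track
                 for a single object (hypothetical layout).
    W:           (feat_dim, hidden) shared linear projection.
    a:           (2 * hidden,) attention vector.
    Returns the fused feature and per-sensor attention weights.
    """
    h = track_feats @ W                          # project each sensor track
    n = h.shape[0]
    scores = np.empty(n)
    for i in range(n):
        # Score sensor i against an anchor node (sensor 0 here),
        # with LeakyReLU as in standard GAT formulations.
        raw = a @ np.concatenate([h[0], h[i]])
        scores[i] = np.maximum(0.2 * raw, raw)
    weights = softmax(scores)                    # adaptive fusion weights
    fused = weights @ h                          # weighted aggregation
    return fused, weights

def parallelogram_corners(center, u, v):
    """Corners of a parallelogram shape model: center plus two edge
    half-vectors u and v (a simplified stand-in for the paper's
    task-specific ground-truth formulation)."""
    c = np.asarray(center, dtype=float)
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.stack([c + u + v, c + u - v, c - u - v, c - u + v])
```

In the paper the attention weights are learned end-to-end over a spatio-temporal graph; the sketch only shows how softmax-normalized scores yield per-sensor fusion weights that sum to one, and how a parallelogram (unlike an axis-aligned box) can represent sheared shapes such as articulated trailers.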

Original Abstract

Accurate shape and trajectory estimation of dynamic objects is essential for reliable automated driving. Classical Bayesian extended-object models offer theoretical robustness and efficiency but depend on completeness of a-priori and update-likelihood functions, while deep learning methods bring adaptability at the cost of dense annotations and high compute. We bridge these strengths with LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network that fuses multi-modal production-grade sensor tracks to learn adaptive fusion weights, ensure temporal consistency, and represent multi-scale shapes. Using a task-specific parallelogram ground-truth formulation, LEO models complex geometries (e.g. articulated trucks and trailers) and generalizes across sensor types, configurations, object classes, and regions, remaining robust for challenging and long-range targets. Evaluations on the Mercedes-Benz DRIVE PILOT SAE L3 dataset demonstrate real-time computational efficiency suitable for production systems; additional validation on public datasets such as View of Delft (VoD) further confirms cross-dataset generalization.

Tags

Autonomous Driving, Multi-Sensor Fusion, Graph Attention Network, Object Tracking

arXiv Categories

cs.LG cs.AI