From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning?
AI Summary
The paper proposes the CyclingVQA benchmark to evaluate how well VLMs generalize to cyclist-assistive spatial perception and planning.
Key Contributions
- Introduces the CyclingVQA benchmark for evaluating VLMs in cyclist-assistive scenarios
- Evaluates 31+ VLMs on CyclingVQA, revealing shortcomings of existing models
- Analyzes model failure modes, offering guidance for developing more effective cyclist-assistive systems
Methodology
The authors design the CyclingVQA benchmark, covering tasks in perception, spatio-temporal understanding, and traffic-rule reasoning, and comparatively evaluate a range of VLMs on it.
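Benchmarks like this are typically scored as per-task accuracy over multiple-choice answers. A minimal sketch of such scoring is below; the task names, record format, and `score_by_task` helper are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: per-task accuracy for a multiple-choice VQA benchmark.
# Task names and records are invented for illustration only.
from collections import defaultdict

def score_by_task(records):
    """records: list of dicts with 'task', 'pred', and 'gold' keys.
    Returns a dict mapping each task to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["task"]] += 1
        # Normalize case/whitespace so "a" matches "A".
        if r["pred"].strip().upper() == r["gold"].strip().upper():
            correct[r["task"]] += 1
    return {t: correct[t] / total[t] for t in total}

records = [
    {"task": "perception", "pred": "A", "gold": "A"},
    {"task": "perception", "pred": "b", "gold": "C"},
    {"task": "rule-to-lane", "pred": "D", "gold": "D"},
]
print(score_by_task(records))  # {'perception': 0.5, 'rule-to-lane': 1.0}
```

Reporting accuracy per task category (rather than a single aggregate) is what lets an analysis like the paper's isolate weak areas such as sign-to-lane association.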
Original Abstract
Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models demonstrate encouraging capabilities, while also revealing clear areas for improvement in cyclist-centric perception and reasoning, particularly in interpreting cyclist-specific traffic cues and associating signs with the correct navigational lanes. Notably, several driving-specialized models underperform strong generalist VLMs, indicating limited transfer from vehicle-centric training to cyclist-assistive scenarios. Finally, through systematic error analysis, we identify recurring failure modes to guide the development of more effective cyclist-assistive intelligent systems.