MindDriver: Introducing Progressive Multimodal Reasoning for Autonomous Driving
AI Summary
MindDriver improves the planning capability of VLMs for autonomous driving through progressive multimodal reasoning, and proposes a data annotation pipeline and a reinforcement fine-tuning method.
Main Contributions
- Proposes MindDriver, a progressive multimodal reasoning framework
- Develops a feedback-guided automatic data annotation pipeline
- Designs a progressive reinforcement fine-tuning method
Methodology
MindDriver imitates human thinking by performing semantic understanding, semantic-to-physical space imagination, and physical-space trajectory planning, and optimizes the alignment of these stages through reinforcement learning.
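The three-stage pipeline above can be sketched as a chain where each stage consumes the previous stage's output. This is a minimal illustrative sketch, not the paper's implementation: all type names (`SceneSemantics`, `PhysicalImagination`) and function bodies are hypothetical stubs invented here to show the progressive structure.

```python
from dataclasses import dataclass

# Hypothetical sketch of MindDriver's three progressive reasoning stages.
# Stage names follow the paper; all bodies below are illustrative stubs.

@dataclass
class SceneSemantics:
    description: str  # textual understanding of the driving scene

@dataclass
class PhysicalImagination:
    future_layout: list  # imagined waypoints in metric space, e.g. (x, y) metres

def understand_semantics(camera_caption: str) -> SceneSemantics:
    """Stage 1: semantic understanding of the current scene (stub)."""
    return SceneSemantics(description=f"scene: {camera_caption}")

def imagine_physical_space(sem: SceneSemantics) -> PhysicalImagination:
    """Stage 2: semantic-to-physical space imagination (stub).

    A real system would ground the semantic description into an
    imagined metric evolution of the scene.
    """
    return PhysicalImagination(future_layout=[(0.0, 5.0), (0.0, 10.0)])

def plan_trajectory(img: PhysicalImagination) -> list:
    """Stage 3: physical-space trajectory planning (stub)."""
    # Follow the imagined free-space waypoints.
    return [(x, y) for x, y in img.future_layout]

def minddriver_reasoning(camera_caption: str) -> list:
    """Chain the stages: each later stage is conditioned on the earlier one."""
    sem = understand_semantics(camera_caption)
    img = imagine_physical_space(sem)
    return plan_trajectory(img)

trajectory = minddriver_reasoning("vehicle ahead, clear left lane")
```

In the actual framework, reinforcement fine-tuning would reward the consistency between the imagined physical space and the planned trajectory, rather than hard-coding the hand-off as done here.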
Original Abstract
Vision-Language Models (VLMs) exhibit strong reasoning capabilities, showing promise for end-to-end autonomous driving systems. Chain-of-Thought (CoT), the most widely used reasoning strategy for VLMs, faces critical challenges. Existing textual CoT leaves a large gap between the text semantic space and the trajectory physical space. Although a recent approach uses future images instead of text as the CoT process, it lacks clear planning-oriented objective guidance to generate images with accurate scene evolution. To address these issues, we propose MindDriver, a progressive multimodal reasoning framework that enables a VLM to imitate human-like progressive thinking for autonomous driving. MindDriver performs semantic understanding, semantic-to-physical space imagination, and physical-space trajectory planning. To achieve aligned reasoning processes in MindDriver, we develop a feedback-guided automatic data annotation pipeline that generates aligned multimodal reasoning training data. Furthermore, we develop a progressive reinforcement fine-tuning method that optimizes the alignment through progressive high-level reward-based learning. MindDriver demonstrates superior performance in both the nuScenes open-loop and Bench2Drive closed-loop evaluations. Codes are available at https://github.com/hotdogcheesewhite/MindDriver.