Multimodal Learning Relevance: 8/10

EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation

Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Wang Zijian, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang, Xianming Liu, Shuchang Zhou Liu, Yang Wang, Shanghang Zhang
arXiv: 2603.09465v1 Published: 2026-03-10 Updated: 2026-03-10

AI Summary

EvoDriveVLA improves the performance and stability of vision-language-action models for autonomous driving through collaborative perception-planning distillation.

Key Contributions

  • Proposes EvoDriveVLA, a collaborative perception-planning distillation framework
  • Uses self-anchored visual distillation to regularize student representations via trajectory-guided key-region awareness
  • Employs oracle-guided trajectory distillation, in which a future-aware oracle teacher generates high-quality trajectory candidates

Methodology

Self-anchored visual distillation and oracle-guided trajectory distillation jointly optimize the model at both the perception and planning levels, improving overall performance.
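The paper does not give the exact loss formulation here, but a two-term objective of this shape is a natural reading of "joint optimization at the perception and planning levels": a feature-alignment term against the self-anchor teacher, weighted by a trajectory-guided key-region mask, plus a trajectory-imitation term against the oracle teacher's selected trajectory. The function name, mask shape, and loss weights below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def collaborative_distillation_loss(student_feats, teacher_feats, key_region_mask,
                                    student_traj, oracle_traj,
                                    w_percep=1.0, w_plan=1.0):
    """Sketch of a two-term collaborative distillation objective (assumed form).

    Perception term: align student visual features with a frozen self-anchor
    teacher, weighted by a trajectory-guided key-region mask so that regions
    relevant to the planned trajectory dominate the constraint.
    Planning term: regress the student trajectory toward the trajectory
    selected from the oracle teacher's candidates.
    """
    # Masked MSE over feature maps; the mask broadcasts over channels.
    diff = (student_feats - teacher_feats) ** 2
    percep = (diff * key_region_mask).sum() / key_region_mask.sum().clamp(min=1e-6)
    # L1 imitation over trajectory waypoints.
    plan = F.l1_loss(student_traj, oracle_traj)
    return w_percep * percep + w_plan * plan
```

The mask makes the perceptual constraint selective rather than uniform, which matches the stated goal of counteracting perception degradation after the visual encoder is unfrozen.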

Original Abstract

Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, self-anchored visual distillation leverages a self-anchor teacher to deliver visual anchoring constraints, regularizing student representations via trajectory-guided key-region awareness. In parallel, oracle-guided trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to produce high-quality trajectory candidates, thereby selecting the optimal trajectory to guide the student's prediction. EvoDriveVLA achieves state-of-the-art performance in open-loop evaluation and significantly enhances performance in closed-loop evaluation. Our code is available at: https://github.com/hey-cjj/EvoDriveVLA.
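The Monte Carlo dropout sampling mentioned in the abstract can be sketched as follows: run the planner head several times with dropout kept active to obtain stochastic trajectory candidates, then pick the candidate closest to some selection criterion. The toy head, the sample count, and the "closest to a reference" selection rule are assumptions for illustration; the paper's actual oracle teacher and selection criterion may differ.

```python
import torch
import torch.nn as nn

class TrajectoryHead(nn.Module):
    """Toy planner head; dropout stays active at sampling time (MC dropout)."""
    def __init__(self, d_in=32, n_waypoints=6):
        super().__init__()
        self.n_waypoints = n_waypoints
        self.net = nn.Sequential(
            nn.Linear(d_in, 64), nn.ReLU(), nn.Dropout(p=0.2),
            nn.Linear(64, n_waypoints * 2),
        )

    def forward(self, x):
        # Predict 2D waypoints: (batch, n_waypoints, 2).
        return self.net(x).view(-1, self.n_waypoints, 2)

def mc_dropout_candidates(head, feats, n_samples=8):
    """Draw stochastic trajectory candidates via multiple dropout-on passes."""
    head.train()  # train mode keeps dropout active during inference
    with torch.no_grad():
        return torch.stack([head(feats) for _ in range(n_samples)])  # (S, B, W, 2)

def select_best(candidates, reference):
    """Pick, per batch element, the candidate closest to a reference trajectory."""
    dists = (candidates - reference.unsqueeze(0)).abs().mean(dim=(2, 3))  # (S, B)
    idx = dists.argmin(dim=0)                                            # (B,)
    return candidates[idx, torch.arange(candidates.shape[1])]            # (B, W, 2)
```

Keeping the module in train mode during sampling is the standard way to realize MC dropout in PyTorch; each forward pass then draws a different dropout mask, yielding diverse candidates to select from.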

Tags

autonomous driving vision-language-action model knowledge distillation

arXiv Categories

cs.CV cs.AI