EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation
AI Summary
EvoDriveVLA improves the performance and stability of autonomous driving Vision-Language-Action models through collaborative perception-planning distillation.
Key Contributions
- Proposes EvoDriveVLA, a collaborative perception-planning distillation framework
- Uses self-anchored visual distillation to regularize student representations through trajectory-guided key-region awareness
- Adopts oracle-guided trajectory distillation, in which a future-aware oracle teacher generates high-quality trajectory candidates
Methodology
Self-anchored visual distillation and oracle-guided trajectory distillation jointly optimize the model at both the perception and planning levels, improving overall performance.
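To make the perception side concrete, below is a minimal sketch of what self-anchored visual distillation could look like, assuming the self-anchor teacher is a frozen copy of the pretrained visual encoder and that trajectory-guided key-region awareness reduces to a per-patch weight mask over regions near the planned trajectory. The function and tensor names are illustrative, not from the paper's code.

```python
import torch
import torch.nn.functional as F

def self_anchored_distill_loss(student_feats, teacher_feats, key_region_mask):
    """Pull student patch features toward the frozen self-anchor teacher's
    features, weighted by a trajectory-guided key-region mask.

    student_feats, teacher_feats: (B, N, D) patch-token features
    key_region_mask: (B, N) weights in [0, 1], higher near the trajectory
    """
    # Per-patch MSE between student and (detached) teacher features: (B, N)
    per_token = F.mse_loss(
        student_feats, teacher_feats.detach(), reduction="none"
    ).mean(dim=-1)
    # Weight by key regions and normalize by the total mask weight
    return (per_token * key_region_mask).sum() / key_region_mask.sum().clamp(min=1.0)

if __name__ == "__main__":
    B, N, D = 2, 196, 768  # batch, patch tokens, feature dim (illustrative)
    student = torch.randn(B, N, D)
    teacher = torch.randn(B, N, D)
    mask = (torch.rand(B, N) > 0.7).float()  # stand-in for a trajectory mask
    print(self_anchored_distill_loss(student, teacher, mask))
```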
Original Abstract
Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, self-anchored visual distillation leverages a self-anchor teacher to deliver visual anchoring constraints, regularizing student representations via trajectory-guided key-region awareness. In parallel, oracle-guided trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to produce high-quality trajectory candidates, thereby selecting the optimal trajectory to guide the student's prediction. EvoDriveVLA achieves SOTA performance in open-loop evaluation and significantly enhances performance in closed-loop evaluation. Our code is available at: https://github.com/hey-cjj/EvoDriveVLA.
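As a companion sketch for the planning side, the snippet below shows one plausible reading of the Monte Carlo dropout sampling step described in the abstract: the future-aware oracle teacher keeps its dropout layers active to draw several trajectory candidates, and the candidate closest to the ground-truth future trajectory is selected to supervise the student. The oracle_teacher interface, waypoint shapes, and selection criterion are assumptions; the paper's coarse-to-fine refinement stage is omitted.

```python
import torch

@torch.no_grad()
def select_oracle_trajectory(oracle_teacher, obs, gt_future, num_samples=8):
    """Sample trajectory candidates via Monte Carlo dropout and return the
    one closest to the ground-truth future trajectory (available to the
    oracle teacher during training, hence "future-aware").

    obs: model inputs; gt_future: (T, 2) ground-truth future waypoints.
    """
    oracle_teacher.train()  # keep dropout stochastic at sampling time
    # Each forward pass yields a different candidate: (num_samples, T, 2)
    candidates = torch.stack([oracle_teacher(obs) for _ in range(num_samples)])
    # Mean L2 waypoint error per candidate: (num_samples,)
    errors = (candidates - gt_future).norm(dim=-1).mean(dim=-1)
    return candidates[errors.argmin()]  # best candidate guides the student
```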