DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving
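AI Summary
DreamerAD uses a latent world model to accelerate reinforcement learning for autonomous driving, substantially improving training efficiency while preserving visual interpretability.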
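Main Contributions
- Proposes the DreamerAD framework, which compresses diffusion sampling from 100 steps to 1 for a reported 80x speedup
- Introduces shortcut forcing, which uses recursive multi-resolution step compression
- Designs an autoregressive dense reward model that operates on latent representations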
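Methodology
The method builds on denoised latent features from video generation models and combines shortcut forcing, a dense reward model, and Gaussian vocabulary sampling for GRPO to improve RL efficiency.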
Original Abstract
We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1 - achieving 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. While existing pixel-level diffusion world models enable safe imagination-based training, they suffer from multi-step diffusion inference latency (2s/frame) that prevents high-frequency RL interaction. Our approach leverages denoised latent features from video generation models through three key mechanisms: (1) shortcut forcing that reduces sampling complexity via recursive multi-resolution step compression, (2) an autoregressive dense reward model operating directly on latent representations for fine-grained credit assignment, and (3) Gaussian vocabulary sampling for GRPO that constrains exploration to physically plausible trajectories. DreamerAD achieves 87.7 EPDMS on NavSim v2, establishing state-of-the-art performance and demonstrating that latent-space RL is effective for autonomous driving.
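The abstract names Gaussian vocabulary sampling for GRPO but does not give its details, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes a fixed vocabulary of candidate trajectories (each flattened to a list of floats), weights each candidate with a Gaussian kernel centred on a hypothetical policy mean trajectory, samples a group from those weights, and then computes the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation). The function names, the `sigma` parameter, and the toy vocabulary are all illustrative.

```python
import math
import random


def gaussian_vocab_probs(vocab, mean_traj, sigma):
    """Gaussian-kernel weights over a fixed trajectory vocabulary.

    Assumption: each trajectory is a flat list of floats and the policy
    outputs a mean trajectory of the same length.
    """
    logits = []
    for traj in vocab:
        sq_dist = sum((a - b) ** 2 for a, b in zip(traj, mean_traj))
        logits.append(-sq_dist / (2.0 * sigma ** 2))
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_group(vocab, probs, group_size, rng):
    """Draw a group of vocabulary indices; exploration stays inside the vocabulary."""
    return rng.choices(range(len(vocab)), weights=probs, k=group_size)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward by the group's mean and std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy vocabulary of three 1-waypoint trajectories (x, y); purely illustrative.
vocab = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
probs = gaussian_vocab_probs(vocab, mean_traj=[0.9, 0.1], sigma=1.0)
group = sample_group(vocab, probs, group_size=8, rng=random.Random(0))
```

Because samples are drawn only from the vocabulary, every explored trajectory is one of the predefined candidates, which is one plausible reading of "constrains exploration to physically plausible trajectories". In a full pipeline the rewards passed to `group_relative_advantages` would come from the latent dense reward model rather than from hand-written values.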