VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory
AI Summary
VPWEM leverages working and episodic memory to improve the performance of visuomotor policies on non-Markovian tasks.
Key Contributions
- Proposes VPWEM, a non-Markovian visuomotor policy equipped with working and episodic memory
- Introduces a Transformer-based contextual memory compressor that recursively converts observations into episodic memory
- Shows that VPWEM outperforms existing methods on memory-intensive tasks in MIKASA and MoMaRT
Methodology
VPWEM retains short-term working memory via a sliding window of recent observations, uses a Transformer-based compressor to convert out-of-window historical observations into episodic memory, and trains the compressor jointly with the policy.
Original Abstract
Imitation learning from human demonstrations has achieved significant success in robotic control, yet most visuomotor policies still condition on single-step observations or short-context histories, making them struggle with non-Markovian tasks that require long-term memory. Simply enlarging the context window incurs substantial computational and memory costs and encourages overfitting to spurious correlations, leading to catastrophic failures under distribution shift and violating real-time constraints in robotic systems. By contrast, humans can compress important past experiences into long-term memories and exploit them to solve tasks throughout their lifetime. In this paper, we propose VPWEM, a non-Markovian visuomotor policy equipped with working and episodic memories. VPWEM retains a sliding window of recent observation tokens as short-term working memory, and introduces a Transformer-based contextual memory compressor that recursively converts out-of-window observations into a fixed number of episodic memory tokens. The compressor uses self-attention over a cache of past summary tokens and cross-attention over a cache of historical observations, and is trained jointly with the policy. We instantiate VPWEM on diffusion policies to exploit both short-term and episode-wide information for action generation with nearly constant memory and computation per step. Experiments demonstrate that VPWEM outperforms state-of-the-art baselines including diffusion policies and vision-language-action (VLA) models by more than 20% on the memory-intensive manipulation tasks in MIKASA and achieves an average 5% improvement on the mobile manipulation benchmark MoMaRT. Code is available at https://github.com/HarryLui98/code_vpwem.
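The sliding-window/compression loop described above can be sketched in a few lines. The following is a minimal NumPy illustration under assumed details, not the paper's implementation: random projections stand in for learned Transformer weights, single-head attention replaces full multi-head blocks, and the class names (`MemoryCompressor`, `VPWEMContext`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over 2-D (tokens x dim) matrices
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

class MemoryCompressor:
    """Folds evicted observation tokens into a fixed set of episodic
    memory tokens: self-attention over the memory tokens, then
    cross-attention over the evicted observations. Random projections
    stand in for learned weights (an assumption for this sketch)."""
    def __init__(self, num_mem_tokens, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.mem = 0.02 * rng.normal(size=(num_mem_tokens, dim))
        # one (query, key, value) projection triple per attention stage
        self.self_w = [0.02 * rng.normal(size=(dim, dim)) for _ in range(3)]
        self.cross_w = [0.02 * rng.normal(size=(dim, dim)) for _ in range(3)]

    def compress(self, evicted):
        wq, wk, wv = self.self_w
        m = self.mem + attention(self.mem @ wq, self.mem @ wk, self.mem @ wv)
        wq, wk, wv = self.cross_w
        self.mem = m + attention(m @ wq, evicted @ wk, evicted @ wv)

class VPWEMContext:
    """Sliding-window working memory plus recursively compressed
    episodic memory; per-step token count stays constant."""
    def __init__(self, window, num_mem_tokens, dim):
        self.window, self.buf = window, []
        self.compressor = MemoryCompressor(num_mem_tokens, dim)

    def step(self, obs_tokens):
        self.buf.append(obs_tokens)
        if len(self.buf) > self.window:
            # oldest observation leaves the window: fold it into episodic memory
            self.compressor.compress(self.buf.pop(0))
        working = np.concatenate(self.buf, axis=0)
        # the policy would condition on episodic + working memory tokens
        return np.concatenate([self.compressor.mem, working], axis=0)
```

Because each compression step consumes only the single evicted observation and the memory has a fixed token budget, the context handed to the policy never grows with episode length, matching the abstract's claim of nearly constant memory and computation per step.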