Contextual Latent World Models for Offline Meta Reinforcement Learning
AI Summary
Proposes contextual latent world models, which learn more expressive task representations through task-conditioned temporal consistency.
Key Contributions
- Proposes Contextual Latent World Models
- Jointly trains the task representation encoder and the latent world model
- Validates the effectiveness of the method on multiple offline meta-reinforcement learning benchmarks
Methodology
Combines latent world models with task representations: the task representation is learned via task-conditioned temporal consistency, with the encoder and world model trained jointly on offline data, as sketched below.
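To make the training signal concrete, here is a minimal PyTorch sketch of task-conditioned temporal consistency. The module architectures, dimensions, mean-pooling context aggregation, and stop-gradient target are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of task-conditioned temporal consistency (illustrative only).
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Infers a task representation z from K context transitions (s, a, r, s')."""
    def __init__(self, obs_dim, act_dim, task_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, task_dim),
        )

    def forward(self, s, a, r, s_next):
        x = torch.cat([s, a, r, s_next], dim=-1)  # (B, K, 2*obs + act + 1)
        return self.net(x).mean(dim=1)            # permutation-invariant pooling over K

class LatentWorldModel(nn.Module):
    """Latent state encoder plus a dynamics head conditioned on the task representation z."""
    def __init__(self, obs_dim, act_dim, task_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + act_dim + task_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, s, a, z):
        h = self.encoder(s)                                  # current latent state
        return self.dynamics(torch.cat([h, a, z], dim=-1))   # predicted next latent

def consistency_loss(context_encoder, world_model, batch):
    """Task-conditioned temporal consistency: the latent predicted from (s, a, z)
    should match a stop-gradient encoding of the observed next state s'."""
    s, a, s_next, ctx = batch         # ctx: K transitions drawn from the same task
    z = context_encoder(*ctx)         # inferred task representation
    pred = world_model(s, a, z)       # task-conditioned next-latent prediction
    with torch.no_grad():             # stop-gradient target (an EMA copy of the
        target = world_model.encoder(s_next)  # encoder is a common alternative)
    return ((pred - target) ** 2).mean()  # gradients reach both model and encoder
```

Because the dynamics prediction depends on z, minimizing this loss pushes z to encode whatever distinguishes each task's dynamics, rather than merely discriminating between tasks.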
Original Abstract
Offline meta-reinforcement learning seeks to learn policies that generalize across related tasks from fixed datasets. Context-based methods infer a task representation from transition histories, but learning effective task representations without supervision remains a challenge. In parallel, latent world models have demonstrated strong self-supervised representation learning through temporal consistency. We introduce contextual latent world models, which condition latent world models on inferred task representations and train them jointly with the context encoder. This enforces task-conditioned temporal consistency, yielding task representations that capture task-dependent dynamics rather than merely discriminating between tasks. Our method learns more expressive task representations and significantly improves generalization to unseen tasks across MuJoCo, Contextual-DeepMind Control, and Meta-World benchmarks.
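For completeness, a hypothetical joint optimization step over an offline batch might look as follows, continuing the names from the sketch above; the batch format, dimensions, and optimizer settings are assumptions.

```python
# Hypothetical joint training step on an offline batch (all dimensions arbitrary).
obs_dim, act_dim, task_dim, latent_dim = 17, 6, 8, 32
B, K = 64, 16  # batch size, context transitions per task

ctx_enc = ContextEncoder(obs_dim, act_dim, task_dim)
wm = LatentWorldModel(obs_dim, act_dim, task_dim, latent_dim)
opt = torch.optim.Adam(list(ctx_enc.parameters()) + list(wm.parameters()), lr=3e-4)

# Dummy offline batch: prediction transitions plus a context set from the same task.
batch = (
    torch.randn(B, obs_dim), torch.randn(B, act_dim), torch.randn(B, obs_dim),
    (torch.randn(B, K, obs_dim), torch.randn(B, K, act_dim),
     torch.randn(B, K, 1), torch.randn(B, K, obs_dim)),
)
loss = consistency_loss(ctx_enc, wm, batch)
opt.zero_grad(); loss.backward(); opt.step()  # one joint update of both modules
```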