A Controlled Study of Double DQN and Dueling DQN Under Cross-Environment Transfer
AI Summary
This study examines how DDQN and Dueling DQN differ under cross-environment transfer learning and finds that DDQN transfers more stably.
Main Contributions
- Compared the transfer performance of DDQN and Dueling DQN across environments
- Found that DDQN transfers more stably and avoids negative transfer
- Linked architectural inductive bias to robustness under cross-environment transfer
Methodology
CartPole serves as the source task and LunarLander as the structurally distinct target task. A fixed layer-wise representation transfer protocol is applied, and the transfer behavior of DDQN and Dueling DQN is compared under identical hyperparameters and training conditions; a minimal sketch of such a protocol follows.
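The sketch below illustrates one plausible reading of "fixed layer-wise transfer": hidden layers with matching shapes are copied from the source network, while the input and output layers stay freshly initialized, since CartPole (4-dim observations, 2 actions) and LunarLander (8-dim observations, 4 actions) differ in both dimensions. This assumes PyTorch; the network shapes, layer names, and choice of transferred layers are illustrative, not the paper's exact protocol.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Simple MLP Q-network; hidden sizes are shared between source and target."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.input = nn.Linear(obs_dim, hidden)
        self.hidden = nn.Linear(hidden, hidden)
        self.output = nn.Linear(hidden, n_actions)

    def forward(self, x):
        x = torch.relu(self.input(x))
        x = torch.relu(self.hidden(x))
        return self.output(x)

def transfer_hidden_layers(source: QNetwork, target: QNetwork) -> None:
    """Copy only the shape-compatible hidden layer from source to target.

    Input and output layers keep their fresh initialization because the
    observation and action dimensionalities differ between the two tasks.
    """
    target.hidden.load_state_dict(source.hidden.state_dict())

source_net = QNetwork(obs_dim=4, n_actions=2)   # CartPole-v1
# ... train source_net on the source task ...
target_net = QNetwork(obs_dim=8, n_actions=4)   # LunarLander-v2
transfer_hidden_layers(source_net, target_net)  # fixed layer-wise transfer
```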
Original Abstract
Transfer learning in deep reinforcement learning is often motivated by improved stability and reduced training cost, but it can also fail under substantial domain shift. This paper presents a controlled empirical study examining how architectural differences between Double Deep Q-Networks (DDQN) and Dueling DQN influence transfer behavior across environments. Using CartPole as a source task and LunarLander as a structurally distinct target task, we evaluate a fixed layer-wise representation transfer protocol under identical hyperparameters and training conditions, with baseline agents trained from scratch used to contextualize transfer effects. Empirical results show that DDQN consistently avoids negative transfer under the examined setup and maintains learning dynamics comparable to baseline performance in the target environment. In contrast, Dueling DQN consistently exhibits negative transfer under identical conditions, characterized by degraded rewards and unstable optimization behavior. Statistical analysis across multiple random seeds confirms a significant performance gap under transfer. These findings suggest that architectural inductive bias is strongly associated with robustness to cross-environment transfer in value-based deep reinforcement learning under the examined transfer protocol.
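Since the abstract's central claim concerns architectural inductive bias, a hedged sketch of the two variants being compared may help: the Double DQN target decouples action selection (online network) from action evaluation (target network), while the dueling head decomposes Q-values into a state-value stream and an advantage stream. Shapes and hyperparameters below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    def __init__(self, hidden: int, n_actions: int):
        super().__init__()
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, features):
        v = self.value(features)                      # (batch, 1)
        a = self.advantage(features)                  # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)    # broadcast over actions

def double_dqn_target(online, target, reward, next_obs, done, gamma=0.99):
    """Double DQN target: the online net selects the next action,
    the target net evaluates it, mitigating maximization bias."""
    with torch.no_grad():
        next_actions = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = target(next_obs).gather(1, next_actions).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

The dueling decomposition biases the network toward a shared state-value estimate across actions; under the paper's hypothesis, it is this kind of architectural inductive bias, rather than the training algorithm alone, that is associated with negative transfer in the examined setup.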