On Computation and Reinforcement Learning
AI Summary
Studies how the amount of compute available to a reinforcement learning policy affects its learning, proposes a minimal architecture that can use a variable amount of compute, and validates its effectiveness.
Key Contributions
- Formalizes compute-bounded policies
- Proves that policies with more compute can solve harder problems and generalize to longer-horizon tasks
- Proposes a minimal architecture that can use a variable amount of compute
Methodology
Theoretical analysis combined with empirical validation: policies with different amounts of compute are compared on 31 tasks in terms of performance and generalization.
Original Abstract
How does the amount of compute available to a reinforcement learning (RL) policy affect its learning? Can policies using a fixed number of parameters still benefit from additional compute? The standard RL framework does not provide a language to answer these questions formally. Empirically, deep RL policies are often parameterized as neural networks with static architectures, conflating the amount of compute and the number of parameters. In this paper, we formalize compute-bounded policies and prove that policies which use more compute can solve problems and generalize to longer-horizon tasks that are outside the scope of policies with less compute. Building on prior work in algorithmic learning and model-free planning, we propose a minimal architecture that can use a variable amount of compute. Our experiments complement our theory. On a set of 31 different tasks spanning online and offline RL, we show that $(1)$ this architecture achieves stronger performance simply by using more compute, and $(2)$ it achieves stronger generalization on longer-horizon test tasks compared to standard feedforward networks or deep residual networks using up to 5 times more parameters.
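The paper's exact architecture is not reproduced here, but the core mechanism it builds on, a fixed set of parameters applied for a variable number of steps, can be sketched as a weight-tied residual block. All names, dimensions, and initializations below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight-tied residual block: the same parameters (W, b)
# are reused at every step, so extra compute adds no extra parameters.
W = rng.standard_normal((8, 8)) * 0.1
b = np.zeros(8)

def block(h):
    # One residual update with the shared weights (illustrative).
    return h + np.tanh(h @ W + b)

def policy_features(obs, n_steps):
    # n_steps controls the amount of compute at inference time while
    # the parameter count stays fixed -- the idea behind variable-compute
    # policies in the abstract above.
    h = obs
    for _ in range(n_steps):
        h = block(h)
    return h

obs = rng.standard_normal(8)
shallow = policy_features(obs, n_steps=2)   # less compute
deep = policy_features(obs, n_steps=16)     # more compute, same parameters
print(shallow.shape, deep.shape)
```

Because the block is weight-tied, increasing `n_steps` at test time is what lets a single trained policy spend more compute on harder or longer-horizon inputs, rather than requiring a larger network.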