Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning
AI Summary
Stochastic resetting effectively accelerates policy convergence in reinforcement learning, particularly in environments where exploration is difficult and rewards are sparse.
Main Contributions
- Demonstrated that stochastic resetting accelerates policy convergence in reinforcement learning
- Identified the mechanism by which stochastic resetting accelerates convergence in reinforcement learning
- Proposed a simple method that uses stochastic resetting to improve reinforcement learning performance
Methodology
Reinforcement learning algorithms are run in tabular grid environments and in a continuous control task, with a stochastic resetting mechanism introduced for comparative experiments.
Original Abstract
Stochastic resetting, where a dynamical process is intermittently returned to a fixed reference state, has emerged as a powerful mechanism for optimizing first-passage properties. Existing theory largely treats static, non-learning processes. Here we ask how stochastic resetting interacts with reinforcement learning, where the underlying dynamics adapt through experience. In tabular grid environments, we find that resetting accelerates policy convergence even when it does not reduce the search time of a purely diffusive agent, indicating a novel mechanism beyond classical first-passage optimization. In a continuous control task with neural-network-based value approximation, we show that random resetting improves deep reinforcement learning when exploration is difficult and rewards are sparse. Unlike temporal discounting, resetting preserves the optimal policy while accelerating convergence by truncating long, uninformative trajectories to enhance value propagation. Our results establish stochastic resetting as a simple, tunable mechanism for accelerating learning, translating a canonical phenomenon of statistical mechanics into an optimization principle for reinforcement learning.
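The contrast with discounting drawn above can be made precise in a short sketch (the notation here is ours, not the paper's): discounting alters the objective being maximized, whereas resetting during training perturbs only the transition dynamics and leaves the evaluated objective untouched.

```latex
% Discounting changes the objective itself, so the maximizer can depend on \gamma:
\[
  J_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right].
\]
% Stochastic resetting at rate r instead modifies only the training-time
% transition kernel, mixing in a return to the reference state s_0:
\[
  P_r(s' \mid s, a) = (1 - r)\, P(s' \mid s, a) + r\, \delta_{s',\, s_0},
\]
% while the objective used to evaluate policies is unchanged, so the
% optimal policy is preserved.
```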