Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL
AI Summary
The paper proposes PULSE, a method that exploits the sparsity of weight updates to dramatically reduce communication overhead in distributed RL.
Main Contributions
- A systematic empirical study of weight-update sparsity in RL
- PULSE, an efficient lossless weight synchronization method
- Experiments showing that PULSE greatly reduces communication overhead and improves distributed RL performance
Methodology
The authors analyze weight-update sparsity empirically and, based on this structure, design the PULSE algorithm, which transmits only the indices and values of modified parameters.
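The encode/transmit/apply cycle described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names `encode_patch` and `apply_patch` are hypothetical. The key design choice it reflects is that changed parameters are overwritten with their exact new values rather than accumulated as additive deltas, which is what keeps reconstruction bit-identical and free of floating-point drift.

```python
import numpy as np

def encode_patch(old: np.ndarray, new: np.ndarray):
    """Encode a sparse weight patch: indices and new values of changed entries.

    Comparing raw bit patterns (via a uint32 view of float32 weights) rather
    than float equality also catches -0.0 vs 0.0 and NaN payload changes,
    keeping the patch truly lossless.
    """
    changed = old.view(np.uint32) != new.view(np.uint32)
    idx = np.flatnonzero(changed).astype(np.uint32)
    return idx, new.flat[idx]

def apply_patch(old: np.ndarray, idx: np.ndarray, values: np.ndarray):
    """Apply a patch by direct assignment (no additive delta)."""
    out = old.copy()
    out.flat[idx] = values
    return out
```

With update sparsity above 99%, the transmitted payload (one index plus one value per changed entry) is a small fraction of the full weight tensor, which is the source of the communication savings reported in the abstract.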
Original Abstract
Reinforcement learning (RL) is a critical component for post-training large language models (LLMs). However, in bandwidth-constrained distributed RL, scalability is often bottlenecked by the synchronization of policy weights from trainers to inference workers, particularly over commodity networks or in decentralized settings. While recent studies suggest that RL updates modify only a small fraction of model parameters, these observations are typically based on coarse checkpoint differences. We present a systematic empirical study of weight-update sparsity at both step-level and multi-step granularities, examining its evolution across training dynamics, off-policy delay, and model scale. We find that update sparsity is consistently high, frequently exceeding 99% across practically relevant settings. Leveraging this structure, we propose PULSE (Patch Updates via Lossless Sparse Encoding), a simple yet highly efficient lossless weight synchronization method that transmits only the indices and values of modified parameters. PULSE is robust to transmission errors and avoids floating-point drift inherent in additive delta schemes. In bandwidth-constrained decentralized environments, our approach achieves over 100x (14 GB to ~108 MB) communication reduction while maintaining bit-identical training dynamics and performance compared to full weight synchronization. By exploiting this structure, PULSE enables decentralized RL training to approach centralized throughput, reducing the bandwidth required for weight synchronization from 20 Gbit/s to 0.2 Gbit/s to maintain high GPU utilization.