Federated Distributional Reinforcement Learning with Distributional Critic Regularization
AI Summary
Proposes a federated distributional reinforcement learning framework that uses Wasserstein-barycenter regularization to avoid the risks introduced when distributional information is averaged out.
Main Contributions
- Proposes federated distributional reinforcement learning (FedDistRL)
- Proposes a Wasserstein-barycenter-based trust region method (TR-FedDistRL)
- Experiments show the method reduces mean-smearing and improves safety
Methodology
Clients learn quantile value-function critics and federate only these networks; a Wasserstein barycenter constrains the critic so that distributional information is not averaged out during federation.
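A minimal sketch of the core mechanism, under two assumptions: (1) each critic output is a sorted vector of quantiles at uniform levels, for which the Wasserstein-2 barycenter of a buffer of distributions is the element-wise mean of their quantile vectors; (2) the shrink-squash step acts as a radial pull-back of the parameter-averaged critic's quantiles toward the local barycenter. Function names (`w2_barycenter`, `shrink_squash`) and the `radius` parameter are illustrative, not the paper's API.

```python
import numpy as np

def w2_barycenter(quantile_buffer):
    """W2 barycenter over a temporal buffer of 1-D distributions.

    For 1-D distributions represented as quantile vectors at uniform
    levels, the Wasserstein-2 barycenter is the element-wise mean of
    the sorted quantile vectors. Shape: (buffer_len, n_quantiles).
    """
    return np.mean(np.sort(quantile_buffer, axis=-1), axis=0)

def shrink_squash(avg_quantiles, reference, radius):
    """Hypothetical shrink-squash trust-region step.

    If the parameter-averaged critic's quantiles drift beyond `radius`
    (in W2, which for quantile vectors at uniform levels is an RMS
    distance) from the local barycenter, shrink them back onto the
    trust-region boundary; otherwise leave them unchanged.
    """
    diff = avg_quantiles - reference
    dist = np.sqrt(np.mean(diff ** 2))  # W2 between quantile vectors
    if dist <= radius:
        return avg_quantiles
    return reference + diff * (radius / dist)
```

The barycenter serves as the per-client reference region; only critics that the federation has smeared too far from it get pulled back, so local tail structure is preserved without blocking aggregation entirely.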
原文摘要
Federated reinforcement learning typically aggregates value functions or policies by parameter averaging, which emphasizes expected return and can obscure statistical multimodality and tail behavior that matter in safety-critical settings. We formalize federated distributional reinforcement learning (FedDistRL), where clients parametrize quantile value function critics and federate these networks only. We also propose TR-FedDistRL, which builds a per-client, risk-aware Wasserstein barycenter over a temporal buffer. This local barycenter provides a reference region to constrain the parameter-averaged critic, ensuring necessary distributional information is not averaged out during the federation process. The distributional trust region is implemented as a shrink-squash step around this reference. Under fixed-policy evaluation, the feasibility map is nonexpansive and the update is contractive in a probe-set Wasserstein metric. Experiments on a bandit, multi-agent gridworld, and continuous highway environment show reduced mean-smearing, improved safety proxies (catastrophe/accident rate), and lower critic/policy drift versus mean-oriented and non-federated baselines.
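The abstract's nonexpansiveness claim is consistent with a standard fact, which can be sketched as follows (notation mine, not necessarily the paper's): in the quantile-vector representation, the 1-D W2 metric coincides with an L2 distance between quantile functions, so the shrink-squash step is a projection onto the closed ball $B(b,\epsilon)$ around the barycenter $b$, and projection onto a convex set is nonexpansive.

```latex
% Shrink-squash as projection onto the W2 ball of radius \epsilon around b:
\Pi_{B(b,\epsilon)}(Z) \;=\; b + \min\!\left(1,\; \frac{\epsilon}{W_2(Z, b)}\right)(Z - b),
\qquad
W_2\!\left(\Pi_{B(b,\epsilon)}(Z),\, \Pi_{B(b,\epsilon)}(Z')\right) \;\le\; W_2(Z, Z').
```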