AI Agents relevance: 8/10

Federated Distributional Reinforcement Learning with Distributional Critic Regularization

David Millard, Cecilia Alm, Rashid Ali, Pengcheng Shi, Ali Baheri
arXiv: 2603.17820v1 Published: 2026-03-18 Updated: 2026-03-18

AI Summary

Proposes a federated distributional reinforcement learning framework that uses Wasserstein-barycenter regularization to keep distributional information from being averaged away during federation.

Main Contributions

  • Formalizes federated distributional reinforcement learning (FedDistRL)
  • Proposes a Wasserstein-barycenter-based trust-region method (TR-FedDistRL)
  • Shows experimentally that the method reduces mean-smearing and improves safety proxies

Methodology

Clients federate quantile value-function critics, and a Wasserstein-barycenter constraint on the critic prevents distributional information from being averaged away during the federation process.
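The constraint step above can be sketched numerically. The following is a minimal NumPy illustration, not the paper's implementation: it assumes equal-weight quantile representations (where the 1-Wasserstein distance is the mean absolute difference of sorted quantiles, and the W2 barycenter of 1D distributions is the mean of their quantile functions), and it interprets the shrink-squash step as a linear interpolation of the averaged quantiles toward the barycenter reference. The function names and the choice of interpolation rule are illustrative assumptions.

```python
import numpy as np

def w1(q_a, q_b):
    """1-Wasserstein distance between two equal-weight quantile
    representations: mean absolute difference of sorted quantiles."""
    return float(np.mean(np.abs(np.sort(q_a) - np.sort(q_b))))

def barycenter(buffer):
    """Wasserstein barycenter of 1D distributions given as quantile
    vectors: the elementwise mean of the sorted quantile functions."""
    return np.mean([np.sort(q) for q in buffer], axis=0)

def shrink_squash(q_avg, q_ref, radius):
    """Pull the parameter-averaged critic's predicted quantiles back
    toward the local barycenter reference whenever they leave the
    trust region of the given W1 radius (illustrative rule)."""
    q_avg, q_ref = np.sort(q_avg), np.sort(q_ref)
    d = w1(q_avg, q_ref)
    if d <= radius:
        return q_avg
    t = radius / d  # along this path, W1 to q_ref grows linearly in t
    return (1.0 - t) * q_ref + t * q_avg
```

After the projection, the distance from the result back to the reference equals the trust-region radius exactly, so the averaged critic can never drift further from the local distributional reference than the radius allows.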

Original Abstract

Federated reinforcement learning typically aggregates value functions or policies by parameter averaging, which emphasizes expected return and can obscure statistical multimodality and tail behavior that matter in safety-critical settings. We formalize federated distributional reinforcement learning (FedDistRL), where clients parametrize quantile value function critics and federate these networks only. We also propose TR-FedDistRL, which builds a per-client, risk-aware Wasserstein barycenter over a temporal buffer. This local barycenter provides a reference region to constrain the parameter-averaged critic, ensuring necessary distributional information is not averaged out during the federation process. The distributional trust region is implemented as a shrink-squash step around this reference. Under fixed-policy evaluation, the feasibility map is nonexpansive and the update is contractive in a probe-set Wasserstein metric. Experiments on a bandit, multi-agent gridworld, and continuous highway environment show reduced mean-smearing, improved safety proxies (catastrophe/accident rate), and lower critic/policy drift versus mean-oriented and non-federated baselines.

Tags

Federated Learning Distributional Reinforcement Learning Risk-Awareness Wasserstein Distance

arXiv Categories

cs.LG