Federated Distributional Reinforcement Learning with Distributional Critic Regularization
AI Summary
Proposes a federated distributional reinforcement learning framework that uses Wasserstein-barycenter regularization to avoid the risks introduced when distributional information is averaged out.
Main Contributions
- Proposes federated distributional reinforcement learning (FedDistRL)
- Proposes a Wasserstein-barycenter-based trust region method (TR-FedDistRL)
- Experiments show the method reduces mean-smearing and improves safety
Methodology
Clients learn quantile value-function critics and federate only these networks; a Wasserstein barycenter constrains the critic so that distributional information is not averaged out during federation.
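A minimal sketch of the core mechanism, under two assumptions: (1) each critic output is a sorted vector of quantiles at uniform levels, for which the Wasserstein-2 barycenter of a buffer of distributions is the element-wise mean of their quantile vectors; (2) the shrink-squash step acts as a radial pull-back of the parameter-averaged critic's quantiles toward the local barycenter. Function names (`w2_barycenter`, `shrink_squash`) and the `radius` parameter are illustrative, not the paper's API.

```python
import numpy as np

def w2_barycenter(quantile_buffer):
    """W2 barycenter over a temporal buffer of 1-D distributions.

    For 1-D distributions represented as quantile vectors at uniform
    levels, the Wasserstein-2 barycenter is the element-wise mean of
    the sorted quantile vectors. Shape: (buffer_len, n_quantiles).
    """
    return np.mean(np.sort(quantile_buffer, axis=-1), axis=0)

def shrink_squash(avg_quantiles, reference, radius):
    """Hypothetical shrink-squash trust-region step.

    If the parameter-averaged critic's quantiles drift beyond `radius`
    (in W2, which for quantile vectors at uniform levels is an RMS
    distance) from the local barycenter, shrink them back onto the
    trust-region boundary; otherwise leave them unchanged.
    """
    diff = avg_quantiles - reference
    dist = np.sqrt(np.mean(diff ** 2))  # W2 between quantile vectors
    if dist <= radius:
        return avg_quantiles
    return reference + diff * (radius / dist)
```

The barycenter serves as the per-client reference region; only critics that the federation has smeared too far from it get pulled back, so local tail structure is preserved without blocking aggregation entirely.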
原文摘要
Federated reinforcement learning typically aggregates value functions or policies by parameter averaging, which emphasizes expected return and can obscure statistical multimodality and tail behavior that matter in safety-critical settings. We formalize federated distributional reinforcement learning (FedDistRL), where clients parametrize quantile value function critics and federate these networks only. We also propose TR-FedDistRL, which builds a per-client, risk-aware Wasserstein barycenter over a temporal buffer. This local barycenter provides a reference region to constrain the parameter-averaged critic, ensuring necessary distributional information is not averaged out during the federation process. The distributional trust region is implemented as a shrink-squash step around this reference. Under fixed-policy evaluation, the feasibility map is nonexpansive and the update is contractive in a probe-set Wasserstein metric. Experiments on a bandit, multi-agent gridworld, and continuous highway environment show reduced mean-smearing, improved safety proxies (catastrophe/accident rate), and lower critic/policy drift versus mean-oriented and non-federated baselines.
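The abstract's nonexpansiveness claim is consistent with a standard fact, which can be sketched as follows (notation mine, not necessarily the paper's): in the quantile-vector representation, the 1-D W2 metric coincides with an L2 distance between quantile functions, so the shrink-squash step is a projection onto the closed ball $B(b,\epsilon)$ around the barycenter $b$, and projection onto a convex set is nonexpansive.

```latex
% Shrink-squash as projection onto the W2 ball of radius \epsilon around b:
\Pi_{B(b,\epsilon)}(Z) \;=\; b + \min\!\left(1,\; \frac{\epsilon}{W_2(Z, b)}\right)(Z - b),
\qquad
W_2\!\left(\Pi_{B(b,\epsilon)}(Z),\, \Pi_{B(b,\epsilon)}(Z')\right) \;\le\; W_2(Z, Z').
```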