AI Agents relevance: 7/10

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

Taeho Lee, Donghwan Lee
arXiv: 2603.12110v1 Published: 2026-03-12 Updated: 2026-03-12

AI Summary

This paper proposes MMDDPG, an algorithm built on a fractional objective, for learning reinforcement learning policies that are resilient to external disturbances.

Key Contributions

  • Proposes the MMDDPG framework
  • Introduces a fractional objective function to balance task performance against disturbance magnitude
  • Experimentally validates the algorithm's robustness in MuJoCo environments

Methodology

The training process is formulated as a minimax optimization problem between a user policy and an adversarial disturbance policy: the user minimizes the objective while the adversary maximizes it. A fractional objective is used to stabilize this adversarial training.
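The abstract does not give the exact form of the fractional objective, but its stated purpose (preventing excessively aggressive disturbances) can be illustrated with a toy model. The sketch below is a hypothetical instance, not the paper's actual objective: `task_cost` and the quadratic disturbance penalty in `fractional_objective` are illustrative assumptions. With a plain additive objective the adversary's best disturbance magnitude is unbounded; dividing by a disturbance-magnitude term gives the adversary a finite optimum.

```python
import numpy as np

# Toy model (an assumption, not from the paper): task cost grows
# linearly with the disturbance magnitude d chosen by the adversary.
def task_cost(d):
    return 1.0 + d

def additive_objective(d):
    # Plain minimax: the adversary maximizes raw cost,
    # so it benefits from making d as large as possible.
    return task_cost(d)

def fractional_objective(d, eps=1.0):
    # Assumed fractional form: cost divided by a
    # disturbance-magnitude penalty term.
    return task_cost(d) / (eps + d ** 2)

# Search over a grid of disturbance magnitudes.
ds = np.linspace(0.0, 10.0, 1001)
best_additive = ds[np.argmax(additive_objective(ds))]
best_fractional = ds[np.argmax(fractional_objective(ds))]

print(best_additive)    # hits the grid boundary: unbounded incentive
print(best_fractional)  # finite interior maximum: bounded disturbance
```

Under this toy model, the additive adversary pushes the disturbance to the edge of the search range, while the fractional adversary settles at a moderate magnitude, matching the paper's claim that the fractional objective discourages excessively aggressive disturbances.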

Original Abstract

Reinforcement learning (RL) has achieved remarkable success in a wide range of control and decision-making tasks. However, RL agents often exhibit unstable or degraded performance when deployed in environments subject to unexpected external disturbances and model uncertainties. Consequently, ensuring reliable performance under such conditions remains a critical challenge. In this paper, we propose minimax deep deterministic policy gradient (MMDDPG), a framework for learning disturbance-resilient policies in continuous control tasks. The training process is formulated as a minimax optimization problem between a user policy and an adversarial disturbance policy. In this problem, the user learns a robust policy that minimizes the objective function, while the adversary generates disturbances that maximize it. To stabilize this interaction, we introduce a fractional objective that balances task performance and disturbance magnitude. This objective prevents excessively aggressive disturbances and promotes robust learning. Experimental evaluations in MuJoCo environments demonstrate that the proposed MMDDPG achieves significantly improved robustness against both external force perturbations and model parameter variations.

Tags

reinforcement learning · robust control · adversarial training · deep deterministic policy gradient · minimax optimization

arXiv Categories

cs.LG cs.AI