AI Agents relevance: 7/10

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

Taeho Lee, Donghwan Lee
arXiv: 2603.12110v1 Published: 2026-03-12 Updated: 2026-03-12

AI Summary

This paper proposes MMDDPG, an algorithm built on a fractional objective, for learning reinforcement learning policies that are resilient to external disturbances.

Key Contributions

  • Proposes the MMDDPG framework
  • Introduces a fractional objective function to balance task performance against disturbance magnitude
  • Experimentally validates the algorithm's robustness in MuJoCo environments

Methodology

The training process is formulated as a minimax optimization problem between a user policy and an adversarial disturbance policy: the user minimizes the objective while the adversary maximizes it. A fractional objective is used to stabilize this adversarial training.
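The abstract does not give the exact form of the fractional objective, but its stated purpose (preventing excessively aggressive disturbances) can be illustrated with a toy model. The sketch below is a hypothetical instance, not the paper's actual objective: `task_cost` and the quadratic disturbance penalty in `fractional_objective` are illustrative assumptions. With a plain additive objective the adversary's best disturbance magnitude is unbounded; dividing by a disturbance-magnitude term gives the adversary a finite optimum.

```python
import numpy as np

# Toy model (an assumption, not from the paper): task cost grows
# linearly with the disturbance magnitude d chosen by the adversary.
def task_cost(d):
    return 1.0 + d

def additive_objective(d):
    # Plain minimax: the adversary maximizes raw cost,
    # so it benefits from making d as large as possible.
    return task_cost(d)

def fractional_objective(d, eps=1.0):
    # Assumed fractional form: cost divided by a
    # disturbance-magnitude penalty term.
    return task_cost(d) / (eps + d ** 2)

# Search over a grid of disturbance magnitudes.
ds = np.linspace(0.0, 10.0, 1001)
best_additive = ds[np.argmax(additive_objective(ds))]
best_fractional = ds[np.argmax(fractional_objective(ds))]

print(best_additive)    # hits the grid boundary: unbounded incentive
print(best_fractional)  # finite interior maximum: bounded disturbance
```

Under this toy model, the additive adversary pushes the disturbance to the edge of the search range, while the fractional adversary settles at a moderate magnitude, matching the paper's claim that the fractional objective discourages excessively aggressive disturbances.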

Original Abstract

Reinforcement learning (RL) has achieved remarkable success in a wide range of control and decision-making tasks. However, RL agents often exhibit unstable or degraded performance when deployed in environments subject to unexpected external disturbances and model uncertainties. Consequently, ensuring reliable performance under such conditions remains a critical challenge. In this paper, we propose minimax deep deterministic policy gradient (MMDDPG), a framework for learning disturbance-resilient policies in continuous control tasks. The training process is formulated as a minimax optimization problem between a user policy and an adversarial disturbance policy. In this problem, the user learns a robust policy that minimizes the objective function, while the adversary generates disturbances that maximize it. To stabilize this interaction, we introduce a fractional objective that balances task performance and disturbance magnitude. This objective prevents excessively aggressive disturbances and promotes robust learning. Experimental evaluations in MuJoCo environments demonstrate that the proposed MMDDPG achieves significantly improved robustness against both external force perturbations and model parameter variations.

Tags

reinforcement learning · robust control · adversarial training · deep deterministic policy gradient · minimax optimization

arXiv Categories

cs.LG cs.AI