Agent Tuning & Optimization — Relevance: 8/10

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

Furkan Mumcu, Yasin Yilmaz
arXiv: 2603.04378v1 Published: 2026-03-04 Updated: 2026-03-04

AI Summary

The paper proposes AAJR, a method that improves the robustness and stability of agentic AI systems through adversarially-aligned Jacobian regularization.

Key Contributions

  • Proposes Adversarially-Aligned Jacobian Regularization (AAJR)
  • Proves that, under mild conditions, AAJR yields a strictly larger admissible policy class than global constraints
  • Derives step-size conditions under which AAJR controls the smoothness of optimization trajectories and ensures inner-loop stability

Methodology

Regularizes agent policies by controlling the sensitivity of the Jacobian along adversarial gradient directions, and derives the corresponding theoretical guarantees.
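The core idea can be illustrated with a minimal sketch: instead of bounding the full Jacobian norm, penalize only the Jacobian-vector product along a given adversarial direction. This is not the paper's implementation; the toy linear policy, the finite-difference approximation of `J v`, and the fixed direction `v` are all assumptions for illustration.

```python
import numpy as np

def directional_jacobian_penalty(f, x, v, eps=1e-4):
    """Finite-difference estimate of ||J(x) v||^2.

    Penalizes the policy's sensitivity only along direction v,
    leaving sensitivity in other directions unconstrained
    (in contrast to a global Jacobian bound).
    """
    v = v / np.linalg.norm(v)
    jv = (f(x + eps * v) - f(x)) / eps  # J v, approximated
    return float(jv @ jv)

# Toy linear policy f(x) = W x, so J = W exactly.
W = np.array([[2.0, 0.0],
              [0.0, 0.5]])
f = lambda x: W @ x
x = np.array([1.0, 1.0])

# Assumed adversarial ascent direction (here simply e1).
v = np.array([1.0, 0.0])
penalty = directional_jacobian_penalty(f, x, v)
# For this linear f, ||J v||^2 = ||W e1||^2 = 4.0, while sensitivity
# along e2 (0.25) is left unpenalized.
```

In a training loop, `v` would be obtained from the inner adversarial ascent step (e.g. the gradient of the inner objective with respect to the input), and the penalty added to the outer minimization objective.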

Original Abstract

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
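The abstract's step-size conditions for inner-loop stability can be grounded in the classical stability threshold for gradient ascent, which such conditions generalize. The following toy example (my illustration, not from the paper) runs inner ascent on a concave quadratic with curvature L, where the iteration d ← d + η·g′(d) contracts iff |1 − ηL| < 1, i.e. η < 2/L; extreme local curvature (large L) shrinks the admissible step size, which is exactly what controlling curvature along the ascent direction relaxes.

```python
# Inner ascent on g(d) = -0.5 * L * d**2 + b * d, maximized at d* = b / L.
# The update d <- d + eta * g'(d) = (1 - eta * L) * d + eta * b is a linear
# recursion: stable iff |1 - eta * L| < 1, i.e. eta < 2 / L.
L, b = 4.0, 2.0  # curvature and linear term (illustrative values)

def inner_ascent(eta, steps=50):
    d = 0.0
    for _ in range(steps):
        d += eta * (-L * d + b)  # gradient ascent step on g
    return d

stable = inner_ascent(0.4)    # eta < 2/L = 0.5: converges to b/L = 0.5
unstable = inner_ascent(0.6)  # eta > 2/L: iterates diverge
```

Under global Jacobian bounds, L must be controlled everywhere; AAJR's claim is that controlling effective curvature only along the trajectory's ascent directions suffices for this kind of stability.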

Tags

AI Agents · Robustness · Jacobian Regularization · Adversarial Training

arXiv Categories

cs.LG cs.AI cs.CR cs.MA