AI Agents relevance: 9/10

Rationality Measurement and Theory for Reinforcement Learning Agents

Kejiang Qian, Amos Storkey, Fengxiang He
arXiv: 2602.04737v1 发布: 2026-02-04 更新: 2026-02-04

AI Summary

This paper proposes a suite of measures and a theoretical framework for evaluating the rationality of reinforcement learning agents, and analyses the factors that influence rational behaviour.

Key Contributions

  • Defines rational risk and the rational risk gap
  • Decomposes the rational risk gap into two components: environment shift and the algorithm's generalisability
  • Examines, through theoretical analysis and experiments, how regularisation methods and environment shifts affect agent rationality
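The extrinsic component of the gap is bounded by the 1-Wasserstein distance between training and deployment distributions. As a purely illustrative sketch (not the paper's construction, which applies W1 to transition kernels and initial-state distributions), the distance between two equal-size one-dimensional empirical samples has a simple closed form via order statistics:

```python
import numpy as np

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    For sorted samples this reduces to the mean absolute difference of the
    order statistics. Illustration only: the paper bounds the extrinsic
    rational-risk-gap term with W1 between transition kernels and
    initial-state distributions, not between 1-D samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert len(x) == len(y), "equal sample sizes assumed for this closed form"
    return float(np.mean(np.abs(x - y)))
```

A larger shift between the two samples (a bigger "environment shift") directly inflates this distance, and hence the theoretical bound on the extrinsic gap.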

Methodology

Defines rationality measures, derives theoretical upper bounds on the risks, and validates the theoretical hypotheses experimentally, analysing how different factors influence agent rationality.

Original Abstract

This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property increasingly critical yet rarely explored. We define an action in deployment to be perfectly rational if it maximises the hidden true value function in the steepest direction. The expected value discrepancy of a policy's actions against their rational counterparts, culminating over the trajectory in deployment, is defined to be expected rational risk; an empirical average version in training is also defined. Their difference, termed as rational risk gap, is decomposed into (1) an extrinsic component caused by environment shifts between training and deployment, and (2) an intrinsic one due to the algorithm's generalisability in a dynamic environment. They are upper bounded by, respectively, (1) the $1$-Wasserstein distance between transition kernels and initial state distributions in training and deployment, and (2) the empirical Rademacher complexity of the value function class. Our theory suggests hypotheses on the benefits from regularisers (including layer normalisation, $\ell_2$ regularisation, and weight normalisation) and domain randomisation, as well as the harm from environment shifts. Experiments are in full agreement with these hypotheses. The code is available at https://github.com/EVIEHub/Rationality.
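The abstract defines rational risk as the expected value discrepancy between the policy's actions and their rational (value-maximising) counterparts, averaged over a trajectory. A minimal sketch of the empirical, in-training version, assuming access to an estimated action-value table (the paper's definition uses the hidden true value function, which is not observable; the function name and array layout here are hypothetical):

```python
import numpy as np

def empirical_rational_risk(q_values, actions):
    """Average value gap between the greedy (rational) action and the
    policy's chosen action at each step of a trajectory.

    q_values: (T, A) array of estimated action values, one row per step.
    actions:  length-T sequence of the actions the policy actually took.
    Returns a non-negative scalar; 0 iff every action was greedy-rational.
    """
    q_values = np.asarray(q_values, dtype=float)
    greedy = q_values.max(axis=1)                        # value of the rational action
    taken = q_values[np.arange(len(actions)), actions]   # value of the chosen action
    return float(np.mean(greedy - taken))
```

A policy that always picks the argmax action has zero empirical rational risk; any deviation adds its per-step value gap to the average.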

Tags

Reinforcement Learning · Rationality · Generalisability · Environment Shift · Regularisation

arXiv Categories

cs.LG