A Model-Free Universal AI
arXiv: 2602.23242v1
Published: 2026-02-26
Updated: 2026-02-26
AI Summary
Proposes AIQI, a model-free universal AI agent, and proves that it is asymptotically ε-optimal in general reinforcement learning.
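Key Contributions
- Introduces AIQI, the first model-free agent proven to be asymptotically ε-optimal in general reinforcement learning
- AIQI performs universal induction over distributional action-value functions, rather than over policies or environments as in previous work
- Proves that, under a grain of truth condition, AIQI is strong asymptotically ε-optimal and asymptotically ε-Bayes-optimal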
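Methodology
Q-Induction: universal induction over distributional action-value functions, together with proofs of AIQI's optimality under a grain of truth condition.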
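The idea of induction over distributional action-value functions can be illustrated with a toy sketch. This is not AIQI or the paper's construction: it assumes a single state, a small hand-picked class of hypotheses (universal induction mixes over a universal class with a simplicity prior), returns restricted to a few discrete values, and plain greedy action selection, which by itself carries no optimality guarantee. The names `Hypothesis`, `MixtureQ`, `update`, `q_value`, and `act` are all hypothetical. Each hypothesis predicts a distribution over returns for each action; observed returns update the posterior by Bayes' rule; the agent acts on the posterior mixture's expected return.

```python
from typing import Dict, List

# Hypothetical toy setting: one state, finitely many actions, discrete returns.
# A hypothesis maps each action to a distribution over returns
# (a "distributional Q-function"): action -> {return_value: probability}.
Hypothesis = Dict[str, Dict[float, float]]


class MixtureQ:
    """Finite Bayesian mixture over distributional Q-function hypotheses.

    Illustrative stand-in for universal induction: the hypothesis class
    is finite and the prior is supplied by hand (both are assumptions).
    """

    def __init__(self, hypotheses: List[Hypothesis], prior: List[float]) -> None:
        total = sum(prior)
        self.hypotheses = hypotheses
        self.weights = [p / total for p in prior]  # normalized prior weights

    def update(self, action: str, observed_return: float) -> None:
        """Bayes' rule: reweight each hypothesis by the probability it gave the return."""
        likelihoods = [h[action].get(observed_return, 0.0) for h in self.hypotheses]
        unnormalized = [w * lik for w, lik in zip(self.weights, likelihoods)]
        z = sum(unnormalized)
        if z == 0.0:
            raise ValueError("observed return has zero probability under every hypothesis")
        self.weights = [u / z for u in unnormalized]

    def q_value(self, action: str) -> float:
        """Expected return of an action under the posterior mixture."""
        return sum(
            w * sum(r * p for r, p in h[action].items())
            for w, h in zip(self.weights, self.hypotheses)
        )

    def act(self, actions: List[str]) -> str:
        """Greedy action with respect to the mixture's expected Q-values."""
        return max(actions, key=self.q_value)


if __name__ == "__main__":
    # h_good: action "a" usually returns 1.0; h_bad: it usually returns 0.0.
    h_good: Hypothesis = {"a": {1.0: 0.8, 0.0: 0.2}, "b": {0.5: 1.0}}
    h_bad: Hypothesis = {"a": {1.0: 0.1, 0.0: 0.9}, "b": {0.5: 1.0}}
    agent = MixtureQ([h_good, h_bad], prior=[0.5, 0.5])
    print(agent.act(["a", "b"]))  # prior Q(a) = 0.45 < Q(b) = 0.5, so "b"
    agent.update("a", 1.0)
    agent.update("a", 1.0)
    print(agent.act(["a", "b"]))  # posterior favors h_good, Q(a) is about 0.79, so "a"
```

If the true return distribution is one of the hypotheses with positive prior weight (a finite analogue of a grain of truth), every return it generates has positive likelihood under it, so Bayes' rule never drives its weight to zero.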
Original Abstract
In general reinforcement learning, all established optimal agents, including AIXI, are model-based, explicitly maintaining and using environment models. This paper introduces Universal AI with Q-Induction (AIQI), the first model-free agent proven to be asymptotically $\varepsilon$-optimal in general RL. AIQI performs universal induction over distributional action-value functions, instead of policies or environments like previous works. Under a grain of truth condition, we prove that AIQI is strong asymptotically $\varepsilon$-optimal and asymptotically $\varepsilon$-Bayes-optimal. Our results significantly expand the diversity of known universal agents.