AI Agents · Relevance: 9/10

Nonstandard Errors in AI Agents

Ruijiang Gao, Steven Chong Xiao
arXiv: 2603.16744v1 · Published: 2026-03-17 · Updated: 2026-03-17

AI Summary

AI coding agents given the same task produce markedly different results, driven by divergent analytical choices; where results converge, they do so mainly through imitation.

Key Contributions

  • Documents sizable "nonstandard errors" (NSEs) among AI agents
  • Shows that different model families exhibit stable "empirical styles"
  • Finds that AI peer review has little effect on result dispersion, while learning from exemplar papers substantially improves result consistency

Methodology

Deploys 150 Claude Code agents to independently test hypotheses about market quality, then uses a three-stage feedback protocol to study how different interventions affect the consistency of results (see the dispersion sketch below).
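
The paper's dispersion measure, per the abstract, is the interquartile range (IQR) of point estimates across agents within a measure family. A minimal sketch of that bookkeeping, with hypothetical numbers standing in for the agents' actual estimates:

```python
import numpy as np

def iqr(estimates: np.ndarray) -> float:
    """Interquartile range of point estimates across agents."""
    q75, q25 = np.percentile(estimates, [75, 25])
    return q75 - q25

# Hypothetical per-agent trend estimates for one hypothesis, keyed by
# the measure family each agent chose (values are illustrative only).
baseline = {
    "autocorrelation": np.array([-0.020, -0.012, -0.004, -0.001, 0.003]),
    "variance_ratio":  np.array([0.88, 0.95, 1.02, 1.10]),
}
after_exemplars = {
    "autocorrelation": np.array([-0.011, -0.010, -0.010, -0.009, -0.009]),
    "variance_ratio":  np.array([0.98, 0.99, 1.00, 1.00]),
}

for family in baseline:
    before, after = iqr(baseline[family]), iqr(after_exemplars[family])
    print(f"{family}: IQR {before:.4f} -> {after:.4f} "
          f"({100 * (1 - after / before):.0f}% reduction)")
```

With numbers in this range the sketch reproduces IQR reductions of the magnitude the abstract reports for converging measure families; the real protocol additionally tracks agents that switch families between stages.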

Original Abstract

We study whether state-of-the-art AI coding agents, given the same data and research question, produce the same empirical results. Deploying 150 autonomous Claude Code agents to independently test six hypotheses about market quality trends in NYSE TAQ data for SPY (2015–2024), we find that AI agents exhibit sizable nonstandard errors (NSEs), that is, uncertainty from agent-to-agent variation in analytical choices, analogous to those documented among human researchers. AI agents diverge substantially on measure choice (e.g., autocorrelation vs. variance ratio, dollar vs. share volume). Different model families (Sonnet 4.6 vs. Opus 4.6) exhibit stable "empirical styles," reflecting systematic differences in methodological preferences. In a three-stage feedback protocol, AI peer review (written critiques) has minimal effect on dispersion, whereas exposure to top-rated exemplar papers reduces the interquartile range of estimates by 80–99% within converging measure families. Convergence occurs both through within-family estimation tightening and through agents switching measure families entirely, but convergence reflects imitation rather than understanding. These findings have implications for the growing use of AI in automated policy evaluation and empirical research.
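
The measure-choice divergence the abstract cites, autocorrelation versus variance ratio, involves two standard market-efficiency statistics that different agents selected for the same hypothesis. A hedged sketch of both on simulated returns; the non-overlapping variance ratio below is a simplified stand-in, since the abstract does not specify the agents' exact estimators:

```python
import numpy as np

def lag1_autocorrelation(returns: np.ndarray) -> float:
    """Lag-1 return autocorrelation; near zero in an efficient market."""
    r = returns - returns.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

def variance_ratio(returns: np.ndarray, q: int = 5) -> float:
    """Variance of non-overlapping q-period returns over q times the
    variance of 1-period returns; equals 1 under a random walk."""
    n = len(returns) - len(returns) % q      # trim to a multiple of q
    r = returns[:n]
    r_q = r.reshape(-1, q).sum(axis=1)       # non-overlapping q-period returns
    return float(r_q.var(ddof=1) / (q * r.var(ddof=1)))

rng = np.random.default_rng(0)
rets = rng.normal(0.0, 1e-4, 10_000)         # simulated minute returns
print(lag1_autocorrelation(rets))            # ~0: one agent's choice
print(variance_ratio(rets, q=5))             # ~1: another agent's choice
```

Both statistics test the same random-walk null but live on different scales (near 0 versus near 1), one reason the paper compares estimates within, not across, measure families.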

Tags

AI Agents · Reproducibility · Empirical Analysis · Nonstandard Errors

arXiv Categories

cs.AI cs.SI