AI Agents relevance: 7/10

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

Patricia Paskov, Kevin Wei, Shen Zhou Hong, Dan Bateyko, Xavier Roberts-Gaal, Carson Ezell, Gailius Praninskas, Valerie Chen, Umang Bhatt, Ella Guest
arXiv: 2603.11001v1 Published: 2026-03-11 Updated: 2026-03-11

AI Summary

Analyzes the methodological challenges of applying RCTs to human uplift studies of frontier AI, and proposes practical solutions.

Key Contributions

  • Identifies the limitations of RCT methodology in human uplift studies of frontier AI
  • Summarizes how factors such as rapidly evolving AI systems and shifting baselines affect study validity
  • Proposes practical solutions for addressing these challenges

Methodology

Interviews with 16 expert practitioners, analyzing their experience conducting human uplift studies in domains including biosecurity and cybersecurity.

Original Abstract

Human uplift studies - or studies that measure AI effects on human performance relative to a status quo, typically using randomized controlled trial (RCT) methodology - are increasingly used to inform deployment, governance, and safety decisions for frontier AI systems. While the methods underlying these studies are well-established, their interaction with the distinctive properties of frontier AI systems remains underexamined, particularly when results are used to inform high-stakes decisions. We present findings from interviews with 16 expert practitioners with experience conducting human uplift studies in domains including biosecurity, cybersecurity, education, and labor. Across interviews, experts described a recurring tension between standard causal inference assumptions and the object of study itself. Rapidly evolving AI systems, shifting baselines, heterogeneous and changing user proficiency, and porous real-world settings strain assumptions underlying internal, external, and construct validity, complicating the interpretation and appropriate use of uplift evidence. We synthesize these challenges across key stages of the human uplift research lifecycle and map them to practitioner-reported solutions, clarifying both the limits and the appropriate uses of evidence from human uplift studies in high-stakes decision-making.

Tags

AI Evaluation  RCT  Human Uplift  Frontier AI

arXiv Categories

cs.CY cs.AI