Evaluating Stochasticity in Deep Research Agents
AI Summary
This paper studies stochasticity in Deep Research Agents (DRAs) and proposes strategies for mitigating it.
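Main Contributions
- Formalizes the study of stochasticity in DRAs by modeling them as information acquisition Markov Decision Processes (MDPs)
- Proposes an evaluation framework that quantifies stochasticity in DRA systems
- Identifies and analyzes three sources of stochasticity: information acquisition, information compression, and inference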
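To make the idea of quantifying run-to-run variability concrete, the sketch below scores how much the citation sets of repeated runs on the same query disagree. It is only an illustration: this summary does not state the paper's actual metric, so the use of mean pairwise Jaccard distance, the function names `jaccard_distance` and `mean_pairwise_distance`, and the sample data are all assumptions.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(runs):
    """Average Jaccard distance over all unordered pairs of runs.

    0.0 means every run cited the same sources; values near 1.0 mean
    the runs share almost no citations. (Hypothetical stand-in metric,
    not the one defined in the paper.)
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical citation sets from three executions of one query.
runs = [
    {"doc_a", "doc_b", "doc_c"},
    {"doc_a", "doc_b", "doc_d"},
    {"doc_a", "doc_b", "doc_c"},
]
score = mean_pairwise_distance(runs)
print(f"mean pairwise citation distance: {score:.3f}")
```

The same pairwise scheme could be applied to findings or final answers by swapping in a different distance function, which is one reason a pairwise formulation is convenient for this kind of illustration.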
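Methodology
Through controlled experiments, the paper investigates how stochasticity in each module affects the variance of DRA outputs, and it proposes mitigation strategies that include structured output and ensemble-based query generation.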
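One plausible reading of ensemble-based query generation is sketched below: sample several candidate query lists from a stochastic generator, then keep only the queries that a majority of samples agree on. This summary does not describe the paper's actual procedure, so the majority-vote rule, the whitespace and case normalization, the function names, and the example queries are all assumptions made for illustration.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(query.lower().split())


def ensemble_queries(samples, min_votes=None):
    """Keep queries proposed by at least `min_votes` of the samples.

    `samples` is a list of query lists, one per independent generation.
    By default a strict majority is required. (Hypothetical aggregation
    rule, not taken from the paper.)
    """
    if min_votes is None:
        min_votes = len(samples) // 2 + 1
    votes = Counter()
    for sample in samples:
        # A set ensures each query is counted at most once per sample.
        votes.update({normalize(q) for q in sample})
    kept = [q for q, n in votes.items() if n >= min_votes]
    # Most-agreed-upon queries first, ties broken alphabetically.
    return sorted(kept, key=lambda q: (-votes[q], q))


# Hypothetical outputs of three independent query-generation calls.
samples = [
    ["Acme Corp 2023 revenue", "Acme Corp annual report"],
    ["acme corp 2023 revenue", "Acme Corp competitors"],
    ["Acme Corp 2023  revenue", "acme corp annual report"],
]
selected = ensemble_queries(samples)
print(selected)
```

Queries that appear in only one sample are dropped, so a single unlucky generation is less likely to steer the early search steps, which the abstract identifies as a major contributor to output variance.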
Original Abstract
Deep Research Agents (DRAs) are promising agentic systems that gather and synthesize information to support research across domains such as financial decision-making, medical analysis, and scientific discovery. Despite recent improvements in research quality (e.g., outcome accuracy when ground truth is available), DRA system design often overlooks a critical barrier to real-world deployment: stochasticity. Under identical queries, repeated executions of DRAs can exhibit substantial variability in terms of research outcome, findings, and citations. In this paper, we formalize the study of stochasticity in DRAs by modeling them as information acquisition Markov Decision Processes. We introduce an evaluation framework that quantifies variance in the system and identify three sources of it: information acquisition, information compression, and inference. Through controlled experiments, we investigate how stochasticity from these modules across different decision steps influences the variance of DRA outputs. Our results show that reducing stochasticity can improve research output quality, with inference and early-stage stochasticity contributing the most to DRA output variance. Based on these findings, we propose strategies for mitigating stochasticity while maintaining output quality via structured output and ensemble-based query generation. Our experiments on DeepSearchQA show that our proposed mitigation methods reduce average stochasticity by 22% while maintaining high research quality.