AI Agents relevance: 9/10

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Deepak Nathani, Cheng Zhang, Chang Huan, Jiaming Shan, Yinfei Yang, Alkesh Patel, Zhe Gan, William Yang Wang, Michael Saxon, Xin Eric Wang
arXiv: 2604.00842v1 Published: 2026-04-01 Updated: 2026-04-01

AI Summary

Proposes the Pare framework, which simulates user interaction with agents to evaluate their planning, reasoning, and multi-app orchestration capabilities.

Key Contributions

  • Built the Proactive Agent Research Environment (Pare) framework
  • Proposed a finite-state-machine-based user simulator that models user interaction in digital environments
  • Established Pare-Bench, a benchmark suite of 143 tasks for evaluating proactive agents

Methodology

Builds a user simulator based on finite state machines and designs a benchmark of diverse tasks for evaluation.
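The core modeling idea (per the abstract: apps as finite state machines with stateful navigation and a state-dependent action space) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the `AppFSM` class, the state names, and the toy email app are all assumptions introduced here.

```python
class AppFSM:
    """Hypothetical sketch: an app as a finite state machine.

    The set of actions the user simulator can take depends on the
    current state, capturing stateful navigation rather than a flat
    tool-calling API.
    """

    def __init__(self, initial, transitions):
        # transitions: {state: {action: next_state}}
        self.state = initial
        self.transitions = transitions

    def actions(self):
        """State-dependent action space visible to the simulator."""
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action):
        """Apply an action; actions invalid in this state raise an error."""
        nxt = self.transitions[self.state].get(action)
        if nxt is None:
            raise ValueError(f"{action!r} unavailable in state {self.state!r}")
        self.state = nxt
        return self.state


# Toy email app: "send" only becomes available after navigating to the
# composer, so the simulator must traverse states in a realistic order.
email = AppFSM("inbox", {
    "inbox":   {"open_compose": "compose", "open_message": "reading"},
    "reading": {"back": "inbox"},
    "compose": {"type_body": "compose", "send": "inbox"},
})

print(email.actions())   # action space in "inbox"
email.step("open_compose")
print(email.actions())   # a different action space in "compose"
email.step("send")
print(email.state)       # navigation returns to "inbox"
```

A user simulator driving such an FSM only ever chooses from `actions()`, so its behavior is sequential and state-consistent by construction.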

Original Abstract

Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.

Tags

AI Agent · User Simulation · Benchmark · Proactive Agent

arXiv Categories

cs.AI cs.LG cs.MA