Modeling Distinct Human Interaction in Web Agents
AI Summary
This paper studies human-agent collaboration in web agents, modeling human intervention to make agents more useful.
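Main Contributions
- Builds CowCorpus, a web navigation dataset that includes human interventions
- Identifies four patterns of user interaction with agents
- Trains language models to predict human intervention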
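Methodology
The authors collect the dataset, analyze its interaction patterns, train language models to predict when users will intervene, and validate the models in a user study.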
Original Abstract
Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical decision points or requesting unnecessary confirmation. In this work, we introduce the task of modeling human intervention to support collaborative web task execution. We collect CowCorpus, a dataset of 400 real-user web navigation trajectories containing over 4,200 interleaved human and agent actions. We identify four distinct patterns of user interaction with agents -- hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. Leveraging these insights, we train language models (LMs) to anticipate when users are likely to intervene based on their interaction styles, yielding a 61.4-63.4% improvement in intervention prediction accuracy over base LMs. Finally, we deploy these intervention-aware models in live web navigation agents and evaluate them in a user study, finding a 26.5% increase in user-rated agent usefulness. Together, our results show structured modeling of human intervention leads to more adaptive, collaborative agents.
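
To make the intervention-prediction task concrete, the sketch below shows one way a trajectory of interleaved human and agent actions could be turned into per-step labels. The abstract does not describe the CowCorpus schema or the paper's labeling rule, so everything here is an assumption for illustration: the `Step` class, the `actor` values, the example actions, and the rule that an agent step counts as "followed by an intervention" when the very next step is taken by the human.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One action in a trajectory (hypothetical schema, not the CowCorpus format)."""
    actor: str   # "agent" or "human"
    action: str  # free-form description of the action


def intervention_labels(trajectory: List[Step]) -> List[Tuple[Step, bool]]:
    """Pair each agent step with True if the next step is a human action.

    This labeling rule is an assumption chosen for illustration.
    """
    labeled = []
    for i, step in enumerate(trajectory):
        if step.actor != "agent":
            continue
        has_next = i + 1 < len(trajectory)
        labeled.append((step, has_next and trajectory[i + 1].actor == "human"))
    return labeled


def intervention_rate(trajectory: List[Step]) -> float:
    """Fraction of agent steps that are immediately followed by a human action."""
    labels = [flag for _, flag in intervention_labels(trajectory)]
    return sum(labels) / len(labels) if labels else 0.0


# A made-up four-step trajectory: the human steps in once, then hands control back.
demo = [
    Step("agent", "open search page"),
    Step("agent", "type query"),
    Step("human", "correct the query"),
    Step("agent", "click first result"),
]
demo_flags = [flag for _, flag in intervention_labels(demo)]
```

Under these assumptions, a model that anticipates intervention would be trained to predict the boolean label from the steps seen so far. A per-trajectory statistic such as `intervention_rate` is one simple signal that might help separate interaction styles like hands-off supervision from full user takeover.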