EmbeWebAgent: Embedding Web Agents into Any Customized UI
AI Summary
EmbeWebAgent embeds agents into web UIs through lightweight frontend hooks and a reusable backend workflow.
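Key Contributions
- Proposes EmbeWebAgent, a framework for embedding agents into existing UIs
- Uses lightweight frontend hooks (curated ARIA and URL-based observations, plus a per-page function registry)
- Supports mixed-granularity actions, from GUI primitives to higher-level composite actions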
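To make the idea of a per-page function registry with mixed-granularity actions concrete, here is a minimal sketch. It is not the authors' implementation: the class `PageFunctionRegistry`, the action names `click` and `applyDateFilter`, and the `granularity` field are all hypothetical, and the "UI" is simulated with an in-memory log rather than a real DOM.

```typescript
// Hypothetical sketch: none of these names come from the paper.
type ActionArgs = Record<string, unknown>;

interface RegisteredAction {
  granularity: "primitive" | "composite"; // GUI primitive vs. higher-level composite
  description: string;                    // text an agent could read to choose an action
  run: (args: ActionArgs) => string;
}

class PageFunctionRegistry {
  private actions = new Map<string, RegisteredAction>();

  register(name: string, action: RegisteredAction): void {
    this.actions.set(name, action);
  }

  // Summary of the action space this page exposes to the agent.
  describe(): { name: string; granularity: string; description: string }[] {
    const out: { name: string; granularity: string; description: string }[] = [];
    this.actions.forEach((a, name) => {
      out.push({ name, granularity: a.granularity, description: a.description });
    });
    return out;
  }

  invoke(name: string, args: ActionArgs = {}): string {
    const action = this.actions.get(name);
    if (!action) throw new Error(`Unknown action: ${name}`);
    return action.run(args);
  }
}

// Simulated UI side effects (stand-in for real DOM operations).
const uiLog: string[] = [];
const registry = new PageFunctionRegistry();

// A GUI primitive: click an element identified by its ARIA label.
registry.register("click", {
  granularity: "primitive",
  description: "Click the element with the given ARIA label",
  run: (args) => {
    uiLog.push(`click:${String(args.label)}`);
    return "ok";
  },
});

// A composite action built from primitives plus application-level logic.
registry.register("applyDateFilter", {
  granularity: "composite",
  description: "Open the filter panel, set a date range, and apply it",
  run: (args) => {
    registry.invoke("click", { label: "Open filters" });
    uiLog.push(`setRange:${String(args.from)}..${String(args.to)}`);
    registry.invoke("click", { label: "Apply" });
    return "filter applied";
  },
});

const filterResult = registry.invoke("applyDateFilter", {
  from: "2024-01-01",
  to: "2024-01-31",
});
```

In this sketch the agent can either issue three low-level steps itself or call the single composite, which is one way to read "mixed granularity"; how the actual framework exposes the two levels is not specified in this summary.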
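Methodology
Frontend hooks and the backend workflow are connected over a WebSocket, which lets the backend orchestrate navigation, manipulation, and domain-specific analytics.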
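The exchange between the frontend hooks and the backend could look roughly like the sketch below. The paper summary does not specify a wire format, so the message shapes (`observation`, `action`, `result`), the helper names, and the `Transport` interface are assumptions made for illustration. `Transport` only requires a `send(string)` method, which a browser `WebSocket` also provides; an in-memory stand-in is used here so the example runs without a server.

```typescript
// Hypothetical message shapes; the actual protocol is not described in the source.
interface Observation {
  type: "observation";
  url: string;
  aria: { role: string; label: string }[];
}
interface ActionRequest {
  type: "action";
  id: string;
  name: string;
  args: Record<string, unknown>;
}
interface ActionResult {
  type: "result";
  id: string;
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Anything with send(string) works, e.g. a browser WebSocket instance.
interface Transport {
  send(data: string): void;
}

type PageFunctions = Record<string, (args: Record<string, unknown>) => unknown>;

// Frontend -> backend: report the current URL and a curated ARIA snapshot.
function sendObservation(t: Transport, url: string, aria: Observation["aria"]): void {
  const msg: Observation = { type: "observation", url, aria };
  t.send(JSON.stringify(msg));
}

// Backend -> frontend: run a registered page function and reply with the outcome.
function handleActionRequest(raw: string, fns: PageFunctions, t: Transport): void {
  const req = JSON.parse(raw) as ActionRequest;
  let result: ActionResult;
  if (!Object.prototype.hasOwnProperty.call(fns, req.name)) {
    result = { type: "result", id: req.id, ok: false, error: `Unknown function: ${req.name}` };
  } else {
    try {
      result = { type: "result", id: req.id, ok: true, value: fns[req.name](req.args) };
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      result = { type: "result", id: req.id, ok: false, error: message };
    }
  }
  t.send(JSON.stringify(result));
}

// In-memory stand-in for the socket so the sketch runs without a server.
const sentMessages: string[] = [];
const fakeSocket: Transport = { send: (data) => { sentMessages.push(data); } };

const pageFns: PageFunctions = {
  navigate: (args) => `navigated to ${String(args.path)}`,
};

sendObservation(fakeSocket, "/reports", [{ role: "button", label: "Export" }]);
handleActionRequest(
  JSON.stringify({ type: "action", id: "1", name: "navigate", args: { path: "/dashboard" } }),
  pageFns,
  fakeSocket,
);
handleActionRequest(
  JSON.stringify({ type: "action", id: "2", name: "missing", args: {} }),
  pageFns,
  fakeSocket,
);
```

With a real connection, `handleActionRequest` would be called from the socket's `message` event handler, and the backend workflow would decide which action to request after reading each observation.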
Original Abstract
Most web agents operate at the human interface level, observing screenshots or raw DOM trees without application-level access, which limits robustness and action expressiveness. In enterprise settings, however, explicit control of both the frontend and backend is available. We present EmbeWebAgent, a framework for embedding agents directly into existing UIs using lightweight frontend hooks (curated ARIA and URL-based observations, and a per-page function registry exposed via a WebSocket) and a reusable backend workflow that performs reasoning and takes actions. EmbeWebAgent is stack-agnostic (e.g., React or Angular), supports mixed-granularity actions ranging from GUI primitives to higher-level composites, and orchestrates navigation, manipulation, and domain-specific analytics via MCP tools. Our demo shows minimal retrofitting effort and robust multi-step behaviors grounded in a live UI setting. Live Demo: https://youtu.be/Cy06Ljee1JQ