Evaluating Privilege Usage of Agents on Real-World Tools
AI 摘要
提出了GrantBox沙箱,用于评估LLM Agent在真实工具环境下的权限使用安全,发现存在高攻击成功率。
主要贡献
- 提出了GrantBox安全评估沙箱
- 评估了LLM Agent在真实工具环境下的权限使用
- 揭示了LLM Agent在面对复杂攻击时的安全漏洞
方法论
构建GrantBox,自动集成真实工具,允许LLM Agent调用真实权限,通过prompt injection攻击评估权限使用。
原文摘要
Equipping LLM agents with real-world tools can substantially improve productivity. However, granting agents autonomy over tool use also transfers the associated privileges to both the agent and the underlying LLM. Improper privilege usage may lead to serious consequences, including information leakage and infrastructure damage. While several benchmarks have been built to study agents' security, they often rely on pre-coded tools and restricted interaction patterns. Such crafted environments differ substantially from the real-world, making it hard to assess agents' security capabilities in critical privilege control and usage. Therefore, we propose GrantBox, a security evaluation sandbox for analyzing agent privilege usage. GrantBox automatically integrates real-world tools and allows LLM agents to invoke genuine privileges, enabling the evaluation of privilege usage under prompt injection attacks. Our results indicate that while LLMs exhibit basic security awareness and can block some direct attacks, they remain vulnerable to more sophisticated attacks, resulting in an average attack success rate of 84.80% in carefully crafted scenarios.