Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks
AI Summary
This paper explores system-level defense strategies that protect AI agents against indirect prompt injection attacks.
Main Contributions
- Argues that dynamic replanning and security policy updates are often necessary
- Emphasizes that LLMs should make security decisions only under strict constraints
- Highlights personalization and human interaction as core tools for resolving inherently ambiguous cases
Methodology
As a position paper, it articulates the authors' views and vision for system-level defenses against indirect prompt injection attacks, and discusses the limitations of existing benchmarks.
Original Abstract
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a false sense of utility and security. We also highlight the value of system-level defenses, which serve as the skeleton of agentic systems by structuring and controlling agent behaviors, integrating rule-based and model-based security checks, and enabling more targeted research on model robustness and human interaction.
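Position (2) above can be made concrete with a small sketch: a system-level gate that first applies deterministic rules and, only for ambiguous cases, consults a model-based judge that sees a sanitized view of the action and returns a binary decision. This is an illustrative pattern, not the paper's implementation; all names (`ToolCall`, `POLICY_RULES`, `judge_allows`) are hypothetical.

```python
# Illustrative sketch of a constrained model-based security check.
# Assumption: the judge never sees raw untrusted content and can only
# answer allow/deny -- it cannot rewrite or generate agent actions.
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str   # e.g. "send_email"
    args: dict  # full arguments, possibly containing untrusted data

# Rule-based layer: deterministic verdicts for clear-cut cases.
POLICY_RULES = {
    "read_calendar": "allow",
    "delete_account": "deny",
}

def judge_allows(view: str) -> bool:
    """Stand-in for a constrained LLM judge. It observes only a short,
    sanitized summary of the pending action and returns True/False."""
    # Toy heuristic in place of a real model call:
    return "external recipient" not in view

def check(call: ToolCall, external: bool) -> bool:
    """System-level gate: rules first, constrained judge as fallback."""
    verdict = POLICY_RULES.get(call.tool)
    if verdict == "allow":
        return True
    if verdict == "deny":
        return False
    # Ambiguous case: build a restricted view (no raw args) for the judge.
    view = f"tool={call.tool}" + (", external recipient" if external else "")
    return judge_allows(view)

print(check(ToolCall("read_calendar", {}), external=False))      # True
print(check(ToolCall("send_email", {"to": "x"}), external=True)) # False
```

The key design choice is that the learned component is boxed in by the system skeleton: it decides only within a narrow, pre-structured question, while the rule layer handles everything unambiguous.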