Contextualized Privacy Defense for LLM Agents
AI Summary
Proposes CDI, a context-aware privacy defense framework that uses reinforcement learning to optimize an instructor model, improving the privacy safety of LLM agents.
Key Contributions
- Proposes the Contextualized Defense Instructing (CDI) framework
- Casts privacy defense as a reinforcement learning optimization problem
- Builds a unified simulation framework to evaluate the privacy-helpfulness trade-off
Methodology
An instructor model is trained with reinforcement learning to generate context-aware privacy guidance at each step of the agent's execution, proactively shaping the agent's actions rather than merely constraining them.
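The per-step intervention can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: `instructor` and `agent_step` are simple stubs standing in for LLM calls, and the SSN-redaction logic is an invented example of context-aware guidance.

```python
# Hypothetical sketch of CDI's intervention point in a canonical agent loop.
# The instructor inspects the evolving context each step and injects guidance
# that the agent conditions its next action on.

def instructor(task, history):
    """Generate step-specific privacy guidance from the current context (stub)."""
    if any("ssn" in str(obs).lower() for obs in history):
        return "Do not include the user's SSN in outgoing messages."
    return "No sensitive data detected; proceed normally."

def agent_step(task, history, guidance):
    """Choose the next action, conditioned on the injected guidance (stub)."""
    if "SSN" in guidance:
        return ("send_email", "Details attached (sensitive fields redacted).")
    return ("send_email", f"Handling task: {task}")

def run_episode(task, observations, max_steps=3):
    """Canonical agent loop with the instructor intervening before each action."""
    history, actions = [], []
    for obs in observations[:max_steps]:
        history.append(obs)
        guidance = instructor(task, history)  # CDI: proactive, per-step guidance
        actions.append(agent_step(task, history, guidance))
    return actions

actions = run_episode(
    "reply to the landlord",
    ["user note: my SSN is 123-45-6789", "landlord asked for application info"],
)
```

In the paper's RL framing, failure trajectories with privacy violations would become training environments for the `instructor`; the stub above only shows where its output enters the loop.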
Original Abstract
LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, such as prompting and guarding. These paradigms are insufficient for supporting contextual, proactive privacy decisions in multi-step agent execution. We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm in which an instructor model generates step-specific, context-aware privacy guidance during execution, proactively shaping actions rather than merely constraining or vetoing them. Crucially, CDI is paired with an experience-driven optimization framework that trains the instructor via reinforcement learning (RL), where we convert failure trajectories with privacy violations into learning environments. We formalize baseline defenses and CDI as distinct intervention points in a canonical agent loop, and compare their privacy-helpfulness trade-offs within a unified simulation framework. Results show that our CDI consistently achieves a better balance between privacy preservation (94.2%) and helpfulness (80.6%) than baselines, with superior robustness to adversarial conditions and generalization.