Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention
AI Summary
The paper proposes SDAG, a sparse-attention-based method for defending against knowledge poisoning attacks on RAG.
Main Contributions
- Proposes SDAG, a method for defending against knowledge poisoning attacks on RAG
- SDAG uses a block-sparse attention mechanism that restricts cross-attention between retrieved documents
- Experiments show that SDAG substantially lowers the attack success rate and can be combined with existing defense methods
Methodology
Design a block-sparse attention mechanism that blocks cross-attention between retrieved documents. Validate SDAG's effectiveness experimentally under a variety of attack strategies, both on its own and in combination with existing defense methods.
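The core mechanism can be sketched as a modified causal attention mask. The following is a minimal illustration, not the paper's implementation: it assumes retrieved documents occupy known contiguous token spans in the prompt, and that tokens outside those spans (e.g. the instruction and the question) keep standard causal attention. The function name `sdag_mask` and the span-based interface are hypothetical.

```python
import numpy as np

def sdag_mask(num_tokens, doc_spans):
    """Causal attention mask with cross-document attention disabled.

    doc_spans: list of (start, end) half-open token ranges, one per
    retrieved document. Tokens inside a document may attend to earlier
    tokens except those belonging to a *different* document; tokens
    outside every span behave as in standard causal attention.
    """
    # Map each token to its document id, or -1 for shared tokens
    # (instruction prefix, question, etc.).
    doc_id = np.full(num_tokens, -1)
    for d, (start, end) in enumerate(doc_spans):
        doc_id[start:end] = d

    i = np.arange(num_tokens)[:, None]   # query positions
    j = np.arange(num_tokens)[None, :]   # key positions
    causal = j <= i

    # A (query, key) pair is "cross-document" when both tokens lie in
    # retrieved documents but not the same one.
    q_doc, k_doc = doc_id[:, None], doc_id[None, :]
    cross_doc = (q_doc >= 0) & (k_doc >= 0) & (q_doc != k_doc)

    return causal & ~cross_doc
```

Because the change is confined to the attention mask, it matches the paper's claim of a minimal inference-time modification with no fine-tuning: the same mask could be passed to an existing model's attention computation instead of the default causal mask.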
Original Abstract
Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to be quite vulnerable to corpus knowledge poisoning: an attacker injects misleading documents into the corpus to steer an LLM's output toward an undesired response. We argue that the standard causal attention mechanism in LLMs enables harmful cross-document interactions, specifically in cases of attacks. Accordingly, we introduce a novel defense approach for RAG: Sparse Document Attention RAG (SDAG). This is a block-sparse attention mechanism that disallows cross-attention between retrieved documents. SDAG requires a minimal inference-time change to the attention mask; furthermore, no fine-tuning or additional architectural changes are needed. We present an empirical evaluation of LLM-based question answering (QA) with a variety of attack strategies on RAG. We show that our SDAG method substantially outperforms the standard causal attention mechanism in terms of attack success rate. We further demonstrate the clear merits of integrating SDAG with state-of-the-art RAG defense methods. Specifically, the integration results in performance that is statistically significantly better than the state-of-the-art.