LLM Memory & RAG relevance: 9/10

ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning

Xiangyu Yin, Yi Qi, Chih-hong Cheng
arXiv: 2603.22934v1 Published: 2026-03-24 Updated: 2026-03-24

AI Summary

ProGRank reranks retrieved passages via perturbation-gradient analysis, effectively defending RAG systems against corpus poisoning attacks.

Key Contributions

  • Proposes ProGRank, a training-free, post hoc defence method
  • Uses gradient information to identify malicious passages and demote their rank
  • Validates ProGRank's effectiveness and robustness across multiple datasets and models

Methodology

ProGRank applies mild perturbations to each query-passage pair, extracts probe gradients, computes stability and risk signals from them, and reranks the retrieved passages accordingly.
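The pipeline above can be sketched in miniature. Everything in this example is illustrative, not the paper's implementation: the retriever is a toy linear projection, probe gradients are approximated by finite differences on a small fixed subset of parameter rows instead of autograd, and the noise scale, probe count, and signal definitions are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a dense retriever: W projects inputs into an
# embedding space. W, probe_rows, sigma, n_probes are all assumptions.
DIM_IN, DIM_OUT = 16, 8
W = rng.normal(size=(DIM_OUT, DIM_IN))
probe_rows = [0, 1]  # small fixed parameter subset to probe

def embed(x, W):
    v = W @ x
    return v / np.linalg.norm(v)

def score(q, d, W):
    return float(embed(q, W) @ embed(d, W))

def probe_signals(q, d, n_probes=8, sigma=0.05, eps=1e-4):
    """Stress-test (q, d) under mild random input noise and collect
    finite-difference probe gradients on the chosen rows of W."""
    scores, grad_norms = [], []
    for _ in range(n_probes):
        q_p = q + sigma * rng.normal(size=q.shape)  # mild perturbation
        scores.append(score(q_p, d, W))
        g = []
        for r in probe_rows:
            for c in range(DIM_IN):
                Wp = W.copy()
                Wp[r, c] += eps
                g.append((score(q_p, d, Wp) - scores[-1]) / eps)
        grad_norms.append(np.linalg.norm(g))
    consistency = 1.0 - np.std(scores)  # representational consistency
    risk = np.std(grad_norms)           # dispersion risk
    return consistency, risk

# Rerank candidates: stable, low-risk passages float to the top.
q = rng.normal(size=DIM_IN)
docs = [rng.normal(size=DIM_IN) for _ in range(4)]
sigs = [probe_signals(q, d) for d in docs]
order = sorted(range(len(docs)),
               key=lambda i: sigs[i][0] - sigs[i][1], reverse=True)
```

The intuition is that poisoned passages, being optimized against the retriever, tend to sit in unstable regions: their scores and gradients react more erratically to small perturbations than those of benign passages.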

Original Abstract

Retrieval-Augmented Generation (RAG) improves the reliability of large language model applications by grounding generation in retrieved evidence, but it also introduces a new attack surface: corpus poisoning. In this setting, an adversary injects or edits passages so that they are ranked into the Top-K results for target queries and then affect downstream generation. Existing defences against corpus poisoning often rely on content filtering, auxiliary models, or generator-side reasoning, which can make deployment more difficult. We propose ProGRank, a post hoc, training-free retriever-side defence for dense-retriever RAG. ProGRank stress-tests each query–passage pair under mild randomized perturbations and extracts probe gradients from a small fixed parameter subset of the retriever. From these signals, it derives two instability signals, representational consistency and dispersion risk, and combines them with a score gate in a reranking step. ProGRank preserves the original passage content, requires no retraining, and also supports a surrogate-based variant when the deployed retriever is unavailable. Extensive experiments across three datasets, three dense retriever backbones, representative corpus poisoning attacks, and both retrieval-stage and end-to-end settings show that ProGRank provides stronger defence performance and a favorable robustness–utility trade-off. It also remains competitive under adaptive evasive attacks.
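The "score gate" mentioned in the abstract can be illustrated with a minimal sketch. The gate shape, the threshold `tau`, and the penalty weight `lam` are assumptions for illustration, not values from the paper: the idea shown is simply that stable passages keep their original retrieval score, while only those whose instability exceeds a threshold are penalized before reranking.

```python
def gated_rerank(candidates, tau=0.5, lam=1.0):
    """candidates: list of (doc_id, retrieval_score, consistency, risk).
    A hypothetical score gate: confident, stable passages are left
    untouched; passages whose combined instability exceeds tau are
    penalized by lam * instability before sorting."""
    def adjusted(c):
        doc_id, s, cons, risk = c
        instability = (1.0 - cons) + risk
        return s - lam * instability if instability > tau else s
    return sorted(candidates, key=adjusted, reverse=True)

# Illustrative numbers: the poisoned passage scores higher initially,
# but its instability triggers the gate and it is demoted.
cands = [("benign",   0.82, 0.95, 0.05),
         ("poisoned", 0.90, 0.40, 0.70)]
reranked = gated_rerank(cands)
```

Because the gate is conditional, benign passages with high retrieval scores pass through unchanged, which is one way to preserve utility on clean queries while still demoting adversarial passages.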

Tags

RAG Security Corpus Poisoning Gradient Analysis Reranking

arXiv Categories

cs.AI