InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context
AI Summary
The paper proposes an information-flow-based KV cache recomputation method to optimize long-context retrieval-augmented generation.
Main Contributions
- Proposes an information-flow-based KV cache selection method
- Uses attention norms to reliably identify critical tokens
- Proposes an information-flow-guided chunk reordering strategy
Methodology
Casts selective KV recomputation as an information flow problem: the query's attention norms identify which cached tokens to recompute, and the method reconstructs global positional encodings and chunk ordering so that selection happens under an inference-consistent RoPE geometry.
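The core scoring step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a single attention head, uses softmax attention weights from the query as the "attention norm" signal, and applies RoPE at reconstructed global positions before scoring (all function and variable names are hypothetical).

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply rotary position embedding (RoPE) to vectors x at integer positions pos.
    x: (n, d) with even d; pos: (n,)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)       # per-pair rotation frequencies
    angles = pos[:, None] * freqs[None, :]          # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def select_tokens_by_attention_norm(q, k_cache, chunk_global_pos, query_pos, top_k):
    """Score cached tokens by the attention weight the query assigns them
    under globally consistent RoPE positions; return indices of the top_k
    tokens to recompute. Illustrative sketch, single head."""
    d = q.shape[-1]
    # Rotate query and cached keys at their *global* positions, so scores
    # match the geometry the model will actually see at inference time.
    q_rot = rope_rotate(q[None, :], np.array([query_pos]))[0]
    k_rot = rope_rotate(k_cache, chunk_global_pos)
    logits = k_rot @ q_rot / np.sqrt(d)
    attn = np.exp(logits - logits.max())            # stable softmax
    attn /= attn.sum()
    return np.argsort(attn)[::-1][:top_k]
```

Chunk reordering would then rank retrieved chunks by aggregating these per-token scores within each chunk before assigning their global positions.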
Original Abstract
Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV) caches for individual documents and selectively recompute a small subset of tokens to restore global causal dependencies, but existing methods rely on heuristics or representation discrepancies without modeling whether selected tokens can effectively influence generation. We cast selective KV recomputation as an information flow problem and show that a simple attention-norm signal from the query reliably identifies tokens that are both semantically relevant and structurally positioned to propagate information, when computed under an inference-consistent RoPE geometry. We therefore reconstruct global positional assignments for retrieved chunks and introduce an information-flow-guided chunk reordering strategy. Experiments on LLM and VLM benchmarks demonstrate consistent gains over prior methods under comparable efficiency budgets.