IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time
AI Summary
IndexRAG improves retrieval-augmented generation for cross-document reasoning by constructing bridging facts offline, with no additional training required.
Key Contributions
- Proposes IndexRAG, a novel retrieval-augmented generation approach for cross-document reasoning
- Shifts cross-document reasoning from online inference to offline indexing
- Generates bridging facts from bridge entities and indexes them as independently retrievable units
Methodology
IndexRAG identifies bridge entities shared across documents, generates bridging facts from them, and indexes each fact as an independent retrieval unit that can be matched directly at retrieval time.
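A minimal sketch of the offline indexing idea described above, assuming entities have already been extracted per document (e.g., by an NER step, which is not shown). The function name `build_bridging_facts` and the toy corpus are hypothetical; in IndexRAG the bridging fact itself would be generated by an LLM, whereas here the linked passages are simply concatenated as a stand-in.

```python
from collections import defaultdict

# Toy corpus: two documents linked by the shared entity "Inception".
docs = {
    "d1": "Christopher Nolan directed Inception.",
    "d2": "Inception stars Leonardo DiCaprio.",
}
# Assumed output of an upstream entity-extraction step.
entities = {
    "d1": {"Christopher Nolan", "Inception"},
    "d2": {"Inception", "Leonardo DiCaprio"},
}

def build_bridging_facts(docs, entities):
    """Offline indexing: find bridge entities mentioned in >= 2 documents
    and emit one bridging fact per such entity, joining the linked
    passages. Each fact becomes an independently retrievable unit."""
    entity_to_docs = defaultdict(list)
    for doc_id, ents in entities.items():
        for e in ents:
            entity_to_docs[e].append(doc_id)

    bridging_facts = []
    for entity, doc_ids in entity_to_docs.items():
        if len(doc_ids) >= 2:  # a bridge entity links multiple documents
            text = " ".join(docs[d] for d in sorted(doc_ids))
            bridging_facts.append(
                {"entity": entity, "docs": sorted(doc_ids), "text": text}
            )
    return bridging_facts

facts = build_bridging_facts(docs, entities)
```

At retrieval time these facts sit in the same flat index as the original passages, so a single-pass retriever can surface the cross-document connection without any online graph traversal.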
Original Abstract
Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods that require additional online processing or through iterative multi-step reasoning. We present IndexRAG, a novel approach that shifts cross-document reasoning from online inference to offline indexing. IndexRAG identifies bridge entities shared across documents and generates bridging facts as independently retrievable units, requiring no additional training or fine-tuning. Experiments on three widely-used multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue) show that IndexRAG improves F1 over Naive RAG by 4.6 points on average, while requiring only single-pass retrieval and a single LLM call at inference time. When combined with IRCoT, IndexRAG outperforms all graph-based baselines on average, including HippoRAG and FastGraphRAG, while relying solely on flat retrieval. Our code will be released upon acceptance.