From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents
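AI Summary
The paper systematically evaluates a range of retrieval strategies for RAG systems on financial documents that mix text and tables, and offers recommendations for optimizing such systems.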
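Key Contributions
- Systematic benchmark of ten retrieval strategies (sparse, dense, hybrid fusion, reranking, query expansion, and more)
- Finds that BM25 outperforms state-of-the-art dense (semantic) retrieval on financial documents
- Offers cost-accuracy recommendations for RAG systems over mixed text-and-table documents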
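To make the BM25 finding concrete, here is a minimal sketch of Okapi BM25 scoring. It is not the authors' implementation: the parameters `k1=1.5` and `b=0.75`, the Lucene-style IDF, the lowercase whitespace tokenizer, the helper name `bm25_scores`, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Tokenization is plain lowercase whitespace splitting (an assumption);
    real financial text with numbers and table cells needs more care.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a query term absent from the doc contributes nothing
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [
    "revenue grew 12% in 2021",
    "net income fell in 2020",
    "revenue table 2021 2020",
]
scores = bm25_scores("revenue 2021", docs)
# The third doc scores highest and the second scores 0.0.
```

The third document wins because it contains both query terms and is the shortest, which BM25's length normalization rewards. Note that an exact token such as `2021` either matches or contributes nothing; this lexical exactness is one plausible reason sparse retrieval holds up on numeric queries, though the source does not state the cause.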
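Methodology
Using a financial QA benchmark (23,088 queries over 7,318 documents), the authors compare sparse, dense, hybrid, and other retrieval strategies, measuring both retrieval quality (Recall@k, MRR, nDCG) and end-to-end generation quality (Number Match).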
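The summary does not say how hybrid retrieval merges sparse and dense results (the abstract mentions ablations over several fusion methods), so the sketch below assumes Reciprocal Rank Fusion (RRF), one widely used way to combine a BM25 ranking and a dense ranking into a single list. The function name, the constant `k=60`, and the toy document IDs are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is a commonly used default.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


bm25_ranking = ["d3", "d1", "d7"]
dense_ranking = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# fused == ["d1", "d3", "d9", "d7"]
```

RRF uses only rank positions, so BM25 and dense similarity scores never need to be calibrated against each other. In the two-stage pipeline the abstract describes, a fused candidate list like this would then be rescored by a neural reranker.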
Original Abstract
Retrieval-Augmented Generation (RAG) systems critically depend on retrieval quality, yet no systematic comparison of modern retrieval methods exists for heterogeneous documents containing both text and tabular data. We benchmark ten retrieval strategies spanning sparse, dense, hybrid fusion, cross-encoder reranking, query expansion, index augmentation, and adaptive retrieval on a challenging financial QA benchmark of 23,088 queries over 7,318 documents with mixed text-and-table content. We evaluate retrieval quality via Recall@k, MRR, and nDCG, and end-to-end generation quality via Number Match, with paired bootstrap significance testing. Our results show that (1) a two-stage pipeline combining hybrid retrieval with neural reranking achieves Recall@5 of 0.816 and MRR@3 of 0.605, outperforming all single-stage methods by a large margin; (2) BM25 outperforms state-of-the-art dense retrieval on financial documents, challenging the common assumption that semantic search universally dominates; and (3) query expansion methods (HyDE, multi-query) and adaptive retrieval provide limited benefit for precise numerical queries, while contextual retrieval yields consistent gains. We provide ablation studies on fusion methods and reranker depth, actionable cost-accuracy recommendations, and release our full benchmark code.
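The retrieval metrics and significance test named in the abstract can be sketched as follows. This assumes binary relevance (a passage is either gold evidence or not), ranks starting at 1, and a one-sided paired bootstrap over queries; the paper may define Recall@k with multiple gold passages, MRR cutoffs, or its bootstrap test differently, and all function names here are hypothetical.

```python
import math
import random


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant doc IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant result within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary relevance (gain 1 for relevant docs, 0 otherwise)."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1)
              if doc_id in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def paired_bootstrap_p(scores_a, scores_b, n_resamples=10_000, seed=0):
    """One-sided paired bootstrap: how often system A fails to beat system B.

    Both lists hold per-query scores for the same queries in the same order.
    Queries are resampled with replacement and the mean difference recomputed.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return not_better / n_resamples


ranked = ["d2", "d5", "d1"]
relevant = {"d5"}
print(recall_at_k(ranked, relevant, 1), recall_at_k(ranked, relevant, 3))  # 0.0 1.0
print(mrr_at_k(ranked, relevant, 3))  # 0.5
```

Calling `mrr_at_k(..., k=3)` corresponds to the MRR@3 figure quoted in the abstract. The pure-Python bootstrap loop is fine for illustration but slow at the benchmark's 23,088 queries; a vectorized version would be the practical choice.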