LLM Memory & RAG relevance: 9/10

Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment

Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang
arXiv: 2603.25737v1 Published: 2026-03-26 Updated: 2026-03-26

AI Summary

WriteBack-RAG treats the knowledge base of a RAG system as a trainable component, improving retrieval-augmented generation through evidence distillation and write-back enrichment.

Key Contributions

  • Proposes WriteBack-RAG, a framework that makes the knowledge base trainable
  • Uses labeled data to distill knowledge units and enrich the index
  • Demonstrates that the improvement transfers across RAG methods

Methodology

Labeled data is used to identify the relevant documents for each query, distill them into compact knowledge units, and index those units alongside the original corpus.
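The offline write-back step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy word-overlap retriever, the sentence-filtering distiller, and the answer-containment check are all hypothetical stand-ins for the components the paper would use.

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))[:k]

def distill(query, docs):
    """Toy distiller: keep only sentences that share words with the query,
    compressing the retrieved documents into one compact knowledge unit."""
    q = set(query.lower().split())
    sents = [s.strip() for d in docs for s in d.split(".") if s.strip()]
    keep = [s for s in sents if q & set(s.lower().split())]
    return ". ".join(keep) + "." if keep else ""

def write_back(labeled_examples, corpus):
    """Offline enrichment: for each labeled (query, answer) pair, distill
    the retrieved evidence and append it to the corpus as a new unit."""
    units = []
    for query, answer in labeled_examples:
        docs = retrieve(query, corpus)
        unit = distill(query, docs)
        # Use the label to keep only distillations that retain the answer,
        # i.e. cases where retrieval succeeded.
        if answer.lower() in unit.lower():
            units.append(unit)
    # The enriched corpus is then indexed once, as a preprocessing step,
    # and can feed any downstream RAG pipeline unchanged.
    return corpus + units

corpus = [
    "Paris is the capital of France. It lies on the Seine.",
    "France borders Spain. Its capital city hosts the Louvre.",
]
enriched = write_back([("capital of France", "Paris")], corpus)
```

Because only the corpus changes, the distilled units are pipeline-agnostic, which is what the paper's cross-method transfer experiments exploit.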

Original Abstract

The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed alongside the original corpus. Because the method modifies only the corpus, it can be applied once as an offline preprocessing step and combined with any RAG pipeline. Across four RAG methods, six benchmarks, and two LLM backbones, WriteBack-RAG improves every evaluated setting, with gains averaging +2.14%. Cross-method transfer experiments further show that the distilled knowledge benefits RAG pipelines other than the one used to produce it, confirming that the improvement resides in the corpus itself.

Tags

RAG Knowledge Base Evidence Distillation Write-Back Information Retrieval

arXiv Categories

cs.AI cs.CL cs.IR