LLM Memory & RAG Relevance: 9/10

Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG

Yicheng Zhang, Zhen Qin, Zhaomin Wu, Wenqi Zhang, Shuiguang Deng
arXiv: 2602.03645v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

Proposes a reinforcement-learning-based fine-tuning method for a history-aware dense retriever, optimizing retrieval performance in the RAG pipeline.

Key Contributions

  • Proposes a reinforcement-learning-based method for retriever optimization.
  • Replaces deterministic retrieval with stochastic sampling, making the retriever optimizable by RL.
  • Incorporates retrieval history into the state to mitigate state aliasing in multi-hop reasoning.
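The second contribution can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the embeddings, temperature-free softmax, and scalar reward are all placeholder assumptions. It shows how replacing argmax retrieval with softmax sampling turns the retriever into a stochastic policy whose log-probability admits a REINFORCE-style gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical toy setup: 4 candidate documents, 8-dim embeddings.
doc_emb = rng.standard_normal((4, 8))
query_emb = rng.standard_normal(8)

# Deterministic retrieval would take argmax over similarity scores;
# sampling from a softmax instead makes retrieval a stochastic policy
# pi(d | q) that RL can optimize.
scores = doc_emb @ query_emb               # inner-product similarities
probs = softmax(scores)                    # retrieval policy pi(d | q)
action = rng.choice(len(probs), p=probs)   # sampled document index

# REINFORCE-style gradient of log pi(action | q) w.r.t. the scores,
# scaled by a downstream RAG reward (e.g. answer correctness).
reward = 1.0  # placeholder reward signal
grad_scores = (np.eye(len(probs))[action] - probs) * reward
```

With argmax retrieval the sampling step has no probability attached, so no policy gradient exists; the softmax relaxation is what makes the RL formulation well-posed.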

Methodology

Formulates RAG as a Markov decision process, optimizes the retriever with reinforcement learning, and incorporates retrieval-history information into the state at each step.
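A sketch of the history-aware state in a multi-hop retrieval loop, under assumed toy embeddings (the additive query-plus-history state and the masking of already-retrieved documents are illustrative choices, not the paper's exact design). Without the history term, every hop with the same query would map to the identical state, which is the state-aliasing problem the method targets.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Toy corpus: 6 documents with 8-dim embeddings (illustrative values).
docs = rng.standard_normal((6, 8))
query = rng.standard_normal(8)

history = []          # indices of documents retrieved so far
state = query.copy()  # step-0 state is the query alone

for step in range(3):  # 3-hop retrieval episode
    # History-aware state: query embedding plus the mean embedding of
    # previously retrieved documents, so different hops of the same
    # query are no longer aliased into one state.
    if history:
        state = query + docs[history].mean(axis=0)
    probs = softmax(docs @ state)
    # Mask out already-retrieved documents, then renormalize.
    mask = np.ones(len(docs))
    mask[history] = 0.0
    probs = probs * mask / (probs * mask).sum()
    action = rng.choice(len(docs), p=probs)
    history.append(int(action))
```

Each step is an MDP transition: the action is a sampled document, and the next state folds that document into the retrieval history.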

Original Abstract

Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on the match between the retriever and LLMs. Retriever optimization has emerged as an efficient alternative to fine-tuning LLMs. However, existing solutions suffer from an objective mismatch between retriever optimization and the goal of the RAG pipeline. Reinforcement learning (RL) provides a promising solution to address this limitation, yet applying RL to retriever optimization introduces two fundamental challenges: 1) deterministic retrieval is incompatible with RL formulations, and 2) state aliasing arises from query-only retrieval in multi-hop reasoning. To address these challenges, we replace deterministic retrieval with stochastic sampling and formulate RAG as a Markov decision process, making the retriever optimizable by RL. Further, we incorporate retrieval history into the state at each retrieval step to mitigate state aliasing. Extensive experiments across diverse RAG pipelines, datasets, and retriever scales demonstrate consistent improvements of our approach in RAG performance.

Tags

RAG · Reinforcement Learning · Retriever Optimization · History-Aware

arXiv Category

cs.LG