Improving Parametric Knowledge Access in Reasoning Language Models
AI Summary
This paper studies how to improve a language model's ability to access its own parametric knowledge during reasoning, and proposes a reinforcement-learning-based training method.
Main Contributions
- Finds that language models reason poorly when accessing their own knowledge by default
- Proposes training models to reason over their parametric knowledge via reinforcement learning
- Validates the method's effectiveness on multiple QA datasets
Methodology
World-knowledge question answering is used as a verifiable reward: the model is trained with reinforcement learning so that it reasons more effectively over, and better accesses, its own parametric knowledge.
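The paper does not publish its reward implementation, but the core idea of a "verifiable reward" for QA can be sketched as a simple exact-match check against gold answers, whose binary outcome is then fed to an RL algorithm (e.g. PPO or GRPO) as the episode return. The function names and normalization choices below are illustrative assumptions, not the authors' code:

```python
import re
import string

def normalize(text: str) -> str:
    """Standard QA answer normalization: lowercase, drop punctuation,
    remove articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def qa_reward(model_answer: str, gold_answers: list[str]) -> float:
    """Verifiable reward: 1.0 if the model's final answer matches any
    gold answer after normalization, else 0.0."""
    pred = normalize(model_answer)
    return 1.0 if any(pred == normalize(g) for g in gold_answers) else 0.0

print(qa_reward("The Canberra", ["Canberra"]))  # 1.0 (article stripped)
print(qa_reward("Sydney", ["Canberra"]))        # 0.0
```

Because the reward is computed from reference answers rather than a learned judge, it is automatically verifiable, which is what makes world-knowledge QA usable as an RL training signal here.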
Original Abstract
We study reasoning for accessing world knowledge stored in a language model's parameters. For example, recalling that Canberra is Australia's capital may benefit from thinking through major cities and the concept of purpose-built capitals. While reasoning language models are trained via reinforcement learning to produce reasoning traces on tasks such as mathematics, they may not reason well for accessing their own world knowledge. We first find that models do not generate their best world knowledge reasoning by default: adding a simple "think step-by-step" cue demonstrates statistically significant improvement in knowledge recall but not math. Motivated by this, we propose training models to reason over their parametric knowledge using world-knowledge question answering as a verifiable reward. After reinforcement learning on TriviaQA (+9.9%), performance also improves on Natural Questions, HotpotQA, SimpleQA, and StrategyQA by 4.2%, 2.1%, 0.6%, and 3.0%, respectively. Reasoning models are under-optimized for parametric knowledge access, but can be easily trained to reason better.