LLM Reasoning relevance: 9/10

Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Shuo Nie, Hexuan Deng, Chao Wang, Ruiyu Fang, Xuebo Liu, Shuangyong Song, Yu Li, Min Zhang, Xuelong Li
arXiv: 2602.05897v1 Published: 2026-02-05 Updated: 2026-02-05

AI Summary

FaithRL improves the reliability of CoT reasoning in small reasoning models by introducing an explicit faithfulness reward together with implicit truncated resampling.

Main Contributions

  • Proposes FaithRL, a faithfulness-aware step-level reinforcement learning method
  • Introduces an explicit faithfulness reward that encourages faithful reasoning steps
  • Adopts an implicit truncated resampling strategy to generate contrastive signals
  • Validates FaithRL's effectiveness on Open-Book QA benchmarks

Methodology

Trains small reasoning models with reinforcement learning, combining an explicit faithfulness reward with implicit truncated resampling to improve the faithfulness of intermediate reasoning steps.
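As a rough illustration of the explicit reward design described above, the sketch below blends an outcome reward with a per-step faithfulness score. This is not the authors' implementation: `prm_score` is a hypothetical stand-in for their process reward model, and `alpha` is an assumed mixing weight.

```python
def prm_score(step: str) -> float:
    """Hypothetical stand-in for a process reward model.

    A real PRM would score each reasoning step against the retrieved
    context; this stub simply rewards steps that mention evidence.
    """
    return 1.0 if "evidence" in step else 0.0


def step_level_reward(steps: list[str], answer_correct: bool,
                      alpha: float = 0.5) -> float:
    """Blend the outcome reward with the mean per-step faithfulness score.

    With alpha > 0, a correct answer reached through unfaithful steps
    earns less reward than one supported by faithful steps, which is the
    failure mode outcome-only rewards cannot distinguish.
    """
    outcome = 1.0 if answer_correct else 0.0
    faith = sum(prm_score(s) for s in steps) / len(steps)
    return (1 - alpha) * outcome + alpha * faith


# Both trajectories reach the correct answer, but only the second
# grounds its steps, so it receives the higher reward.
r_unfaithful = step_level_reward(["guess", "guess"], answer_correct=True)
r_faithful = step_level_reward(["evidence A", "evidence B"], answer_correct=True)
```

Under an outcome-only reward both trajectories would score identically; the step-level term is what separates them.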

Original Abstract

As large language models become smaller and more efficient, small reasoning models (SRMs) are crucial for enabling chain-of-thought (CoT) reasoning in resource-constrained settings. However, they are prone to faithfulness hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained CoT evaluation, which can inadvertently reinforce unfaithful reasoning when the final answer is correct. To address these limitations, we propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model, together with an implicit truncated resampling strategy that generates contrastive signals from faithful prefixes. Experiments across multiple SRMs and Open-Book QA benchmarks demonstrate that FaithRL consistently reduces hallucinations in both the CoT and final answers, leading to more faithful and reliable reasoning. Code is available at https://github.com/Easy195/FaithRL.
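The abstract's "implicit truncated resampling" can be sketched as follows: keep the longest faithful prefix of a trajectory, then resample fresh continuations from it so the original and resampled rollouts form contrastive pairs. This is a minimal illustration under assumed interfaces, not the paper's code: `is_faithful` and `resample_fn` are hypothetical callbacks standing in for the PRM check and the policy's generator.

```python
from typing import Callable


def truncated_resample(steps: list[str],
                       is_faithful: Callable[[str], bool],
                       resample_fn: Callable[[list[str]], list[str]],
                       k: int = 2) -> tuple[list[str], list[list[str]]]:
    """Truncate a trajectory at its first unfaithful step and resample.

    Returns the faithful prefix and k alternative continuations generated
    from it; pairing these with the original (unfaithful) continuation
    yields the contrastive signals the method trains on.
    """
    cut = len(steps)
    for i, step in enumerate(steps):
        if not is_faithful(step):
            cut = i  # truncate just before the first unfaithful step
            break
    prefix = steps[:cut]
    # Generate k fresh rollouts conditioned on the faithful prefix.
    continuations = [resample_fn(prefix) for _ in range(k)]
    return prefix, continuations


# Toy usage: step 2 is unfaithful, so resampling restarts after step 1.
prefix, conts = truncated_resample(
    ["step 1 ok", "step 2 bad"],
    is_faithful=lambda s: "ok" in s,
    resample_fn=lambda p: p + ["new step"],
    k=2,
)
```

In the actual method the continuations would be sampled from the policy being trained; the stub generator here only shows the data flow.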

Tags

Reinforcement Learning  Chain-of-Thought  Faithfulness  Reasoning Models

arXiv Categories

cs.CL