LLM Reasoning relevance: 9/10

Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Shuo Nie, Hexuan Deng, Chao Wang, Ruiyu Fang, Xuebo Liu, Shuangyong Song, Yu Li, Min Zhang, Xuelong Li
arXiv: 2602.05897v1 Published: 2026-02-05 Updated: 2026-02-05

AI Summary

FaithRL improves the reliability of CoT reasoning in small reasoning models by introducing an explicit faithfulness reward together with implicit truncated resampling.

Main Contributions

  • Proposes FaithRL, a faithfulness-aware step-level reinforcement learning method
  • Introduces an explicit faithfulness reward that encourages faithful reasoning steps
  • Adopts an implicit truncated resampling strategy to generate contrastive signals
  • Validates FaithRL's effectiveness on Open-Book QA benchmarks

Methodology

Trains small reasoning models with reinforcement learning, combining an explicit faithfulness reward with implicit truncated resampling to improve the faithfulness of intermediate reasoning steps.
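As a rough illustration of the explicit reward design described above, the sketch below blends an outcome reward with a per-step faithfulness score. This is not the authors' implementation: `prm_score` is a hypothetical stand-in for their process reward model, and `alpha` is an assumed mixing weight.

```python
def prm_score(step: str) -> float:
    """Hypothetical stand-in for a process reward model.

    A real PRM would score each reasoning step against the retrieved
    context; this stub simply rewards steps that mention evidence.
    """
    return 1.0 if "evidence" in step else 0.0


def step_level_reward(steps: list[str], answer_correct: bool,
                      alpha: float = 0.5) -> float:
    """Blend the outcome reward with the mean per-step faithfulness score.

    With alpha > 0, a correct answer reached through unfaithful steps
    earns less reward than one supported by faithful steps, which is the
    failure mode outcome-only rewards cannot distinguish.
    """
    outcome = 1.0 if answer_correct else 0.0
    faith = sum(prm_score(s) for s in steps) / len(steps)
    return (1 - alpha) * outcome + alpha * faith


# Both trajectories reach the correct answer, but only the second
# grounds its steps, so it receives the higher reward.
r_unfaithful = step_level_reward(["guess", "guess"], answer_correct=True)
r_faithful = step_level_reward(["evidence A", "evidence B"], answer_correct=True)
```

Under an outcome-only reward both trajectories would score identically; the step-level term is what separates them.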

Original Abstract

As large language models become smaller and more efficient, small reasoning models (SRMs) are crucial for enabling chain-of-thought (CoT) reasoning in resource-constrained settings. However, they are prone to faithfulness hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained CoT evaluation, which can inadvertently reinforce unfaithful reasoning when the final answer is correct. To address these limitations, we propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model, together with an implicit truncated resampling strategy that generates contrastive signals from faithful prefixes. Experiments across multiple SRMs and Open-Book QA benchmarks demonstrate that FaithRL consistently reduces hallucinations in both the CoT and final answers, leading to more faithful and reliable reasoning. Code is available at https://github.com/Easy195/FaithRL.
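The abstract's "implicit truncated resampling" can be sketched as follows: keep the longest faithful prefix of a trajectory, then resample fresh continuations from it so the original and resampled rollouts form contrastive pairs. This is a minimal illustration under assumed interfaces, not the paper's code: `is_faithful` and `resample_fn` are hypothetical callbacks standing in for the PRM check and the policy's generator.

```python
from typing import Callable


def truncated_resample(steps: list[str],
                       is_faithful: Callable[[str], bool],
                       resample_fn: Callable[[list[str]], list[str]],
                       k: int = 2) -> tuple[list[str], list[list[str]]]:
    """Truncate a trajectory at its first unfaithful step and resample.

    Returns the faithful prefix and k alternative continuations generated
    from it; pairing these with the original (unfaithful) continuation
    yields the contrastive signals the method trains on.
    """
    cut = len(steps)
    for i, step in enumerate(steps):
        if not is_faithful(step):
            cut = i  # truncate just before the first unfaithful step
            break
    prefix = steps[:cut]
    # Generate k fresh rollouts conditioned on the faithful prefix.
    continuations = [resample_fn(prefix) for _ in range(k)]
    return prefix, continuations


# Toy usage: step 2 is unfaithful, so resampling restarts after step 1.
prefix, conts = truncated_resample(
    ["step 1 ok", "step 2 bad"],
    is_faithful=lambda s: "ok" in s,
    resample_fn=lambda p: p + ["new step"],
    k=2,
)
```

In the actual method the continuations would be sampled from the policy being trained; the stub generator here only shows the data flow.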

Tags

Reinforcement Learning  Chain-of-Thought  Faithfulness  Reasoning Models

arXiv Categories

cs.CL