Learning to Reason Faithfully through Step-Level Faithfulness Maximization
AI Summary
FaithRL improves the reliability of multi-step reasoning in LLMs by maximizing step-level faithfulness, reducing hallucination rates.
Key Contributions
- Proposes FaithRL, a framework that directly optimizes reasoning faithfulness
- Designs a geometric reward and a faithfulness-aware advantage modulation mechanism
- Theoretically shows that optimizing the faithfulness objective mitigates over-confidence
Methodology
FaithRL maximizes reasoning faithfulness through a geometric reward that penalizes unsupported steps, combined with a faithfulness-aware advantage modulation mechanism that assigns step-level credit while preserving valid partial derivations.
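The abstract does not give the exact formulas, but the two mechanisms can be illustrated with a minimal sketch. Here we assume a geometric (multiplicative) aggregation of per-step support scores, so that a single unsupported step sharply lowers the trajectory reward, and a modulation that scales each step's advantage by its own faithfulness score; the function names and the specific aggregation are illustrative assumptions, not the paper's implementation.

```python
import math

def geometric_reward(step_scores, outcome_reward):
    """Sketch of a geometric reward (assumption: geometric mean of
    per-step support scores in [0, 1] scales the outcome reward, so
    one unsupported step drags down the whole trajectory's reward)."""
    if not step_scores:
        return outcome_reward
    # Geometric mean via log-space averaging; clamp to avoid log(0).
    geo = math.exp(sum(math.log(max(s, 1e-8)) for s in step_scores) / len(step_scores))
    return outcome_reward * geo

def modulated_advantages(base_advantage, step_scores):
    """Sketch of faithfulness-aware advantage modulation (assumption:
    each step's share of the trajectory advantage is scaled by its
    support score, penalizing unsupported steps while leaving
    well-supported partial derivations largely intact)."""
    return [base_advantage * s for s in step_scores]
```

Under this sketch, a trajectory with one poorly supported step receives a lower reward than a fully supported one, and the modulated advantages concentrate credit on the supported steps.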
Original Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has markedly improved the performance of Large Language Models (LLMs) on tasks requiring multi-step reasoning. However, most RLVR pipelines rely on sparse outcome-based rewards, providing little supervision over intermediate steps and thus encouraging over-confidence and spurious reasoning, which in turn increases hallucinations. To address this, we propose FaithRL, a general reinforcement learning framework that directly optimizes reasoning faithfulness. We formalize a faithfulness-maximization objective and theoretically show that optimizing it mitigates over-confidence. To instantiate this objective, we introduce a geometric reward design and a faithfulness-aware advantage modulation mechanism that assigns step-level credit by penalizing unsupported steps while preserving valid partial derivations. Across diverse backbones and benchmarks, FaithRL consistently reduces hallucination rates while maintaining (and often improving) answer correctness. Further analysis confirms that FaithRL increases step-wise reasoning faithfulness and generalizes robustly. Our code is available at https://github.com/aintdoin/FaithRL.