LLM Reasoning relevance: 8/10

Distill and Align Decomposition for Enhanced Claim Verification

Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Fernando Acero, Arturo Oncevay, Charese H. Smiley, Xiaomo Liu, Manuela Veloso
arXiv: 2602.21857v1 Published: 2026-02-25 Updated: 2026-02-25

AI Summary

Proposes a reinforcement learning method that jointly optimizes sentence decomposition quality and verifier alignment, improving performance on complex claim verification.

Key Contributions

  • Proposes a GRPO-based reinforcement learning method
  • Introduces structured sequential reasoning and knowledge distillation
  • Designs a multi-objective reward function
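As a rough illustration of the distillation step, the sketch below shows what a teacher-distilled SFT exemplar for a decomposer could look like: a claim paired with the teacher's sequential reasoning and the resulting verifiable subclaims. All field names and the example claim are hypothetical, not taken from the paper.

```python
# Hypothetical shape of one teacher-distilled training exemplar:
# a claim, the teacher model's structured sequential reasoning,
# and the resulting verifiable subclaims.
exemplar = {
    "claim": "Company X's revenue grew 20% in 2023 and exceeded $1B.",
    "reasoning": [
        "The sentence makes two independent factual assertions.",
        "Each assertion can be checked against evidence separately.",
    ],
    "subclaims": [
        "Company X's revenue grew 20% in 2023.",
        "Company X's revenue exceeded $1B in 2023.",
    ],
}

def to_sft_pair(ex: dict) -> tuple[str, str]:
    """Flatten an exemplar into an (input, target) pair for
    supervised finetuning of the student decomposer."""
    target = "\n".join(ex["reasoning"]) + "\n" + \
             "\n".join(f"- {s}" for s in ex["subclaims"])
    return ex["claim"], target
```

Finetuning on such pairs gives the smaller student model a warm start before RL refines it against the verifier.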

Methodology

Uses reinforcement learning with knowledge distillation and a multi-objective reward to jointly optimize decomposition quality and verifier alignment, improving verification performance.
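A minimal sketch of the two RL ingredients named here: combining the three reward terms from the paper (format compliance, verifier alignment, decomposition quality) into one scalar, and GRPO's group-relative advantage, which normalizes each sampled decomposition's reward against the other samples for the same claim. The weights and component scores are illustrative placeholders, not values from the paper.

```python
def multi_objective_reward(fmt: float, align: float, quality: float,
                           w_fmt: float = 0.2, w_align: float = 0.5,
                           w_qual: float = 0.3) -> float:
    """Weighted sum of per-sample reward components, each in [0, 1].
    The weights here are hypothetical, not the paper's."""
    return w_fmt * fmt + w_align * align + w_qual * quality

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO advantage: normalize each reward against its sampling
    group, (r - mean) / (std + eps), so no value critic is needed."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    eps = 1e-8
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four decompositions sampled for the same claim, scored on
# (format, alignment, quality); higher-reward samples get positive
# advantages, pushing the policy toward them.
rewards = [multi_objective_reward(*s) for s in
           [(1.0, 0.9, 0.8), (1.0, 0.4, 0.6),
            (0.0, 0.7, 0.7), (1.0, 0.8, 0.5)]]
advs = group_relative_advantages(rewards)
```

Group-relative normalization is what lets GRPO drop the learned value function used by PPO: the group mean itself serves as the baseline.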

Original Abstract

Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinforcement learning (RL) approach that jointly optimizes decomposition quality and verifier alignment using Group Relative Policy Optimization (GRPO). Our method integrates: (i) structured sequential reasoning; (ii) supervised finetuning on teacher-distilled exemplars; and (iii) a multi-objective reward balancing format compliance, verifier alignment, and decomposition quality. Across six evaluation settings, our trained 8B decomposer improves downstream verification performance to 71.75% macro-F1, outperforming prompt-based approaches (+1.99, +6.24) and existing RL methods (+5.84). Human evaluation confirms the high quality of the generated subclaims. Our framework enables smaller language models to achieve state-of-the-art claim verification by jointly optimising for verification accuracy and decomposition quality.

Tags

claim verification · decomposition · reinforcement learning · knowledge distillation

arXiv Categories

cs.AI cs.CL cs.LG