LLM Reasoning Relevance: 9/10

Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs

Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Prasanna Sattigeri
arXiv: 2604.02230v1 Published: 2026-04-02 Updated: 2026-04-02

AI Summary

The paper proposes Trace Inversion, a method that improves LLM abstention by comparing the original query against a query reconstructed from the model's reasoning trace.

Key Contributions

  • Proposes the Query Misalignment Framework
  • Proposes the Trace Inversion method
  • Experiments validate the effectiveness of Trace Inversion in improving abstention performance

Methodology

First, generate the model's reasoning trace; then, based only on the trace, reconstruct the most likely query the model responded to; finally, compare the similarity between the original query and the reconstructed query.
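The three-step pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two LLM calls are passed in as callables (hypothetical stand-ins), the similarity metric is a simple bag-of-words cosine (the paper does not specify its metric here), and the `threshold` value is an assumption.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words cosine similarity: a toy stand-in for whatever
    # embedding-based similarity the paper actually uses (assumption).
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def trace_inversion(query: str, generate_trace, reconstruct_query,
                    threshold: float = 0.5) -> bool:
    """Return True if the model should abstain on `query`.

    generate_trace: callable mapping a query to a reasoning trace.
    reconstruct_query: callable mapping a trace back to the most
        likely query. Both would be LLM calls in practice; here they
        are injected so the sketch stays self-contained.
    """
    trace = generate_trace(query)                    # step 1: reasoning trace
    reconstructed = reconstruct_query(trace)         # step 2: invert the trace
    score = cosine_similarity(query, reconstructed)  # step 3: compare queries
    return score < threshold                         # low similarity -> abstain
```

For example, if the reconstructed query matches the original, the similarity is high and the model answers; if the trace implies an unrelated question, the low similarity flags abstention.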

Original Abstract

For Large Language Models (LLMs) to be reliably deployed, models must effectively know when not to answer: abstain. Reasoning models, in particular, have gained attention for impressive performance on complex tasks. However, reasoning models have been shown to have worse abstention abilities. Taking the vulnerabilities of reasoning models into account, we propose our Query Misalignment Framework. Hallucinations resulting in failed abstention can be reinterpreted as LLMs answering the wrong question (rather than answering a question incorrectly). Based on this framework, we develop a new class of state-of-the-art abstention methods called Trace Inversion. First, we generate the reasoning trace of a model. Based on only the trace, we then reconstruct the most likely query that the model responded to. Finally, we compare the initial query with the reconstructed query. A low similarity score between the initial query and the reconstructed query suggests that the model likely answered the wrong question and is flagged to abstain. Extensive experiments demonstrate that Trace Inversion effectively boosts abstention performance in four frontier LLMs across nine abstention QA datasets, beating competitive baselines in 33 out of 36 settings.

Tags

LLM Abstention Reasoning Trace Inversion

arXiv Categories

cs.AI