LLM Reasoning Relevance: 9/10

Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures

Oleg Somov, Mikhail Chaichuk, Mikhail Seleznyov, Alexander Panchenko, Elena Tutubalina
arXiv: 2603.16475v1 Published: 2026-03-17 Updated: 2026-03-17

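AI Summary

The study finds that in schema-guided reasoning, intermediate structures have only a weak causal effect on an LLM's final output and act mainly as contextual information.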

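Key Contributions

  • Proposes a causal evaluation protocol for measuring LLM faithfulness to intermediate structures.
  • Finds that apparent faithfulness to intermediate structures is fragile: after the intermediate structure is edited, models fail to update their predictions in up to 60% of cases.
  • Shows that delegating the final decision to an external tool improves LLM faithfulness.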

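Methodology

Applies controlled edits to the intermediate structure and observes how the LLM's output changes, thereby measuring the causal influence of the intermediate structure on the final decision.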
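The intervention idea can be illustrated with a minimal Python sketch. This is not the authors' protocol or code; the checklist format, the all-items-must-pass rule in `derive_decision`, the two toy models, and the example cases are all hypothetical stand-ins chosen only to show the mechanics. An edit is scored only when it changes the decision implied by the deterministic function.

```python
from typing import Callable, Dict, List, Tuple

Checklist = Dict[str, bool]
Model = Callable[[str, Checklist], str]


def derive_decision(checklist: Checklist) -> str:
    """Hypothetical deterministic map: accept only if every item passes."""
    return "accept" if all(checklist.values()) else "reject"


def flip_item(checklist: Checklist, key: str) -> Checklist:
    """Controlled edit: return a copy with one checklist item flipped."""
    edited = dict(checklist)
    edited[key] = not edited[key]
    return edited


def intervention_failure_rate(
    model: Model, cases: List[Tuple[str, Checklist, str]]
) -> float:
    """Fraction of decision-changing edits the model fails to follow."""
    failures = 0
    counted = 0
    for text, checklist, key in cases:
        edited = flip_item(checklist, key)
        expected = derive_decision(edited)
        if expected == derive_decision(checklist):
            continue  # the edit does not change the implied decision
        counted += 1
        if model(text, edited) != expected:
            failures += 1
    return failures / counted if counted else 0.0


def faithful_model(text: str, checklist: Checklist) -> str:
    """Toy model that derives its decision from the structure alone."""
    return derive_decision(checklist)


def sticky_model(text: str, checklist: Checklist) -> str:
    """Toy model that ignores the structure and reads only the input."""
    return "accept" if "good" in text else "reject"


# Hypothetical cases: (input text, original checklist, item to flip).
CASES = [
    ("good submission", {"a": True, "b": True}, "a"),
    ("bad submission", {"a": False, "b": True}, "a"),
    ("good submission", {"a": False, "b": False}, "a"),  # decision unchanged
]

print(intervention_failure_rate(faithful_model, CASES))
print(intervention_failure_rate(sticky_model, CASES))
```

In this toy setup `faithful_model` simply calls the deterministic function, which is loosely analogous to handing the final derivation to an external tool, while `sticky_model` treats the structure as ignorable context. A real evaluation would replace both with calls to an actual LLM.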

Original Abstract

Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committing to a final decision. But do these structures causally determine the output, or merely accompany it? We introduce a causal evaluation protocol that makes this directly measurable: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across eight models and three benchmarks, models appear self-consistent with their own intermediate structures but fail to update predictions after intervention in up to 60% of cases -- revealing that apparent faithfulness is fragile once the intermediate structure changes. When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears; however, prompts which ask to prioritize the intermediate structure over the original input do not materially close the gap. Overall, intermediate structures in schema-guided pipelines function as influential context rather than stable causal mediators.

Tags

LLM, Causal Reasoning, Faithfulness, Intermediate Structures, Schema-Guided Reasoning

arXiv Categories

cs.AI