Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures
AI Summary
This work finds that in schema-guided reasoning, the intermediate structures an LLM produces exert only weak causal influence on its final output, functioning more as contextual information than as true mediators.
Main Contributions
- Proposes a causal evaluation protocol for measuring an LLM's faithfulness to its intermediate structures.
- Shows that apparent faithfulness to intermediate structures is fragile: after the structure is edited, models frequently fail to update their predictions.
- Verifies that delegating the final decision to an external tool improves faithfulness.
Methodology
Apply controlled edits to the intermediate structure and observe whether the LLM's output changes accordingly, thereby measuring the structure's causal effect on the final decision.
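The protocol above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the intermediate structure is modeled as a checklist of booleans, the deterministic decision function is "accept iff all checks pass", and the "models" are stubs standing in for an LLM's final-decision step. All names here are hypothetical.

```python
def decision_rule(checklist):
    """Deterministic map from intermediate structure to final decision
    (assumed form: accept iff every checklist item is satisfied)."""
    return "accept" if all(checklist.values()) else "reject"

def intervene(checklist, item):
    """Controlled edit: flip one checklist item, leaving the rest intact."""
    edited = dict(checklist)
    edited[item] = not edited[item]
    return edited

def faithfulness_score(model_decide, checklists):
    """Fraction of interventions after which the model's decision matches
    the unique decision implied by the edited structure."""
    total = hits = 0
    for checklist in checklists:
        for item in checklist:
            edited = intervene(checklist, item)
            implied = decision_rule(edited)    # unique correct output
            predicted = model_decide(edited)   # model sees only the edited structure
            hits += (predicted == implied)
            total += 1
    return hits / total

# A perfectly faithful "model" derives its decision from the structure;
# an unfaithful one ignores the structure entirely (stubs, not real LLMs).
faithful = decision_rule
unfaithful = lambda checklist: "accept"

examples = [{"a": True, "b": True}, {"a": True, "b": False}]
print(faithfulness_score(faithful, examples))    # 1.0
print(faithfulness_score(unfaithful, examples))  # 0.25
```

Because the decision function is deterministic, every intervention has exactly one correct answer, so faithfulness can be scored mechanically rather than judged. Delegating the final step to an external tool corresponds to literally calling `decision_rule` instead of `model_decide`.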
Original Abstract
Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committing to a final decision. But do these structures causally determine the output, or merely accompany it? We introduce a causal evaluation protocol that makes this directly measurable: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across eight models and three benchmarks, models appear self-consistent with their own intermediate structures but fail to update predictions after intervention in up to 60% of cases -- revealing that apparent faithfulness is fragile once the intermediate structure changes. When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears; however, prompts which ask to prioritize the intermediate structure over the original input do not materially close the gap. Overall, intermediate structures in schema-guided pipelines function as influential context rather than stable causal mediators.