Coherent Without Grounding, Grounded Without Success: Observability and Epistemic Failure
AI Summary
Large Language Models exhibit a decoupling of competence and explanation in both low-observability and high-observability domains, challenging conventional assumptions about understanding.
Main Contributions
- Introduces the Bidirectional Coherence Paradox, revealing the dissociation between LLM competence and explanation
- Develops the Epistemic Triangle, a model of how priors, signals, and domain knowledge interact under varying observability
- Argues that neither behavioral success nor explanatory accuracy suffices to judge whether an LLM genuinely understands
Methodology
Experiments in compiler optimization and hyperparameter tuning are used to analyze LLM behavior and explanations under varying degrees of observability.
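The paper's experimental setup is not reproduced here, but a minimal sketch may help illustrate what "varying observability" could mean operationally in the hyperparameter-tuning case: the same agent loop, with feedback restricted to a single scalar outcome (low observability) or expanded to the full training trajectory (high observability), while the agent's proposal and its stated rationale are logged separately so that success and explanation can be compared. Everything below is hypothetical: `query_llm`, `train_and_evaluate`, and the toy objective are invented placeholders, not the authors' code.

```python
# Hypothetical sketch of two observability regimes for a hyperparameter-tuning agent.
# `query_llm` is a stand-in for whatever model interface the experiments used; the
# toy "training" objective is invented purely to make the loop runnable.
import random

def train_and_evaluate(lr: float, batch_size: int) -> dict:
    """Toy stand-in for a training run; returns per-epoch losses and a final score.

    (batch_size is accepted but ignored in this toy objective.)
    """
    losses = []
    loss = 2.0
    for _ in range(5):
        # Smaller learning rates converge more slowly; too-large ones make loss grow.
        step = lr * 10 if lr < 0.1 else -lr * 5
        loss = max(0.1, loss - step + random.uniform(-0.05, 0.05))
        losses.append(round(loss, 3))
    return {"per_epoch_loss": losses, "val_accuracy": round(1.0 - loss / 2.0, 3)}

def query_llm(prompt: str) -> dict:
    """Placeholder for the LLM call: returns a proposed config plus its explanation."""
    return {"lr": 0.05, "batch_size": 32,
            "explanation": "A moderate learning rate should stabilize early training."}

def run_trial(observability: str) -> dict:
    result = train_and_evaluate(lr=0.05, batch_size=32)
    if observability == "low":
        # Low observability: the agent only sees a single scalar outcome.
        feedback = {"val_accuracy": result["val_accuracy"]}
    else:
        # High observability: the agent sees the full trajectory it can reason over.
        feedback = result
    response = query_llm(f"Feedback: {feedback}. Propose the next configuration.")
    # Log the proposed action and the stated rationale separately, so that
    # behavioral success and explanatory accuracy can dissociate in the analysis.
    return {"observability": observability, "feedback": feedback,
            "proposal": {k: response[k] for k in ("lr", "batch_size")},
            "explanation": response["explanation"]}

if __name__ == "__main__":
    for regime in ("low", "high"):
        print(run_trial(regime))
```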
Original Abstract
When an agent can articulate why something works, we typically take this as evidence of genuine understanding. This presupposes that effective action and correct explanation covary, and that coherent explanation reliably signals both. I argue that this assumption fails for contemporary Large Language Models (LLMs). I introduce what I call the Bidirectional Coherence Paradox: competence and grounding not only dissociate but invert across epistemic conditions. In low-observability domains, LLMs often act successfully while misidentifying the mechanisms that produce their success. In high-observability domains, they frequently generate explanations that accurately track observable causal structure yet fail to translate those diagnoses into effective intervention. In both cases, explanatory coherence remains intact, obscuring the underlying dissociation. Drawing on experiments in compiler optimization and hyperparameter tuning, I develop the Epistemic Triangle, a model of how priors, signals, and domain knowledge interact under varying observability. The results suggest that neither behavioral success nor explanatory accuracy alone suffices for attributing understanding. I argue that evaluating artificial epistemic agents requires a tripartite framework -- coherence, grounding, and a proper basing relation linking explanation to action. The systematic separation of knowing-that and knowing-how in LLMs thus challenges assumptions inherited from both epistemology and current AI evaluation practice.