The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models
AI Summary
CoT reasoning degrades the quality of uncertainty estimates in VLMs, making models overconfident; consistency-based methods, however, remain effective.
Key Contributions
- Reveals that CoT reasoning causes overconfidence in VLMs
- Identifies implicit answer conditioning as the primary cause of this overconfidence
- Proposes agreement-based consistency as an effective alternative for uncertainty estimation in reasoning-enabled VLMs
Methodology
Experimentally evaluates the effect of CoT reasoning on uncertainty estimation in VLMs, and analyzes the role of implicit answer conditioning.
Original Abstract
Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently degrades the quality of most uncertainty estimates, even when it improves task accuracy. We identify implicit answer conditioning as the primary mechanism: as reasoning traces converge on a conclusion before the final answer is generated, token probabilities increasingly reflect consistency with the model's own reasoning trace rather than uncertainty about correctness. In effect, the model becomes overconfident in its answer. In contrast, agreement-based consistency remains robust and often improves under reasoning, making it a practical choice for uncertainty estimation in reasoning-enabled VLMs.
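The agreement-based consistency approach the abstract recommends can be sketched as follows: sample several answers to the same question under stochastic decoding, then use the agreement rate with the majority answer as the confidence score. This is a minimal illustration, not the paper's implementation; the function name and the answer-normalization step are assumptions.

```python
from collections import Counter

def agreement_confidence(sampled_answers: list[str]) -> tuple[str, float]:
    """Return the majority-vote answer and its agreement rate.

    `sampled_answers` are the final answers parsed from N stochastic
    generations (e.g. temperature sampling) for the same question.
    Unlike token probabilities, this score does not depend on the
    model's own reasoning trace, so it avoids implicit answer
    conditioning. (Illustrative sketch; normalization is simplified.)
    """
    counts = Counter(a.strip().lower() for a in sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

# Example: five sampled answers to the same VQA question.
answer, conf = agreement_confidence(["cat", "cat", "dog", "cat", "cat"])
# answer == "cat", conf == 0.8
```

In practice the sampled answers would come from repeated calls to the VLM with CoT prompting enabled; only the parsed final answers enter the score, so the confidence reflects cross-sample agreement rather than within-trace consistency.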