Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
AI Summary
The paper proposes a Conditional Information Bottleneck (CIB)-based method for compressing LLM reasoning, improving efficiency while preserving accuracy.
Main Contributions
- Reframes efficient reasoning as a lossy compression problem under the Information Bottleneck principle
- Proposes a CIB-based model of LLM reasoning that resolves the problem of the attention mechanism breaking the Markov property
- Introduces a semantic prior that measures token cost by surprisal under a language-model prior
Methodology
Based on the Conditional Information Bottleneck (CIB) principle: maximize task reward while compressing the completion under a prior over reasoning traces, optimized with reinforcement learning.
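As a sketch, this objective can be written as a reward-minus-compression-cost trade-off (the notation below is an illustrative formalization consistent with the abstract, not the paper's exact equation):

```latex
% CIB-style RL objective (sketch): task reward minus the cost of the
% reasoning trace z under a prior; \beta trades accuracy for compression.
\mathcal{J}(\theta)
  = \mathbb{E}_{z \sim \pi_\theta(\cdot \mid x)}
    \Bigl[ R(x, z) - \beta \bigl( -\log p_{\text{prior}}(z) \bigr) \Bigr]
```

Under a uniform prior, the cost term \(-\log p_{\text{prior}}(z)\) is proportional to the trace length \(|z|\), recovering a plain length penalty as a special case; a language-model prior instead charges each token its surprisal, giving the semantic prior described in the abstract.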
Original Abstract
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reduce cost via fine-tuning with heuristic length penalties, suppressing both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap when applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response. To resolve this issue, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general Reinforcement Learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting-based approaches, we introduce a semantic prior that measures token cost by surprisal under a language model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
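To make the contrast between token-counting and the surprisal-based semantic prior concrete, here is a minimal sketch of the reward shaping the abstract describes. All function names, the toy prior, and the numbers are illustrative assumptions, not the paper's implementation:

```python
import math

def uniform_cost(tokens):
    """Uniform prior: every token costs the same, i.e. a plain length penalty."""
    return float(len(tokens))

def surprisal_cost(tokens, prior):
    """Semantic prior: each token costs its surprisal -log p(token) under a
    language-model prior, so predictable filler is cheap and informative
    tokens are expensive. (Toy table stands in for a real LM prior.)"""
    return sum(-math.log(prior.get(t, 1e-6)) for t in tokens)

def cib_reward(task_reward, cost, beta=0.1):
    """CIB-style RL objective: task reward minus beta * compression cost."""
    return task_reward - beta * cost

# Toy prior: connective filler ("so", "then") is highly predictable.
prior = {"so": 0.5, "then": 0.4, "x=2": 0.01, "answer": 0.05}
trace = ["so", "then", "x=2", "answer"]

r_len = cib_reward(1.0, uniform_cost(trace))           # length-penalty baseline
r_sem = cib_reward(1.0, surprisal_cost(trace, prior))  # semantic-prior variant
```

The point of the toy numbers: under the uniform cost, deleting "so" saves exactly as much as deleting "x=2", so a length penalty pressures the model to drop reasoning and filler alike; under the surprisal cost, filler tokens are nearly free to keep and informative tokens dominate the budget, which matches the abstract's claim that the semantic prior prunes bloat while preserving logic.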