Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
AI Summary
Proposes the Parallel-Probe framework, which uses 2D probing to optimize parallel reasoning and balance efficiency against accuracy.
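Key Contributions
- Proposes 2D probing, a method that exposes the width-depth dynamics of parallel reasoning
- Designs the Parallel-Probe controller, which optimizes parallel reasoning on the fly through consensus-based early stopping and deviation-based branch pruning
- Shows experimentally that Parallel-Probe is more efficient than standard majority voting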
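Methodology
2D probing is used to analyze the width-depth dynamics of parallel reasoning. Consensus information regulates reasoning depth (early stopping), and deviation information drives branch pruning (width adjustment).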
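The control loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`consensus`, `control_step`), the parameters (`patience`, `prune_after`), and the specific stopping and pruning rules are all assumptions chosen for clarity. Each probe round is assumed to yield one intermediate answer per active branch.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# One probe round: branch id -> intermediate answer (None if no answer yet).
ProbeRound = Dict[int, Optional[str]]


def consensus(answers: ProbeRound) -> Tuple[Optional[str], float]:
    """Return the majority answer and the fraction of branches that agree with it."""
    votes = Counter(a for a in answers.values() if a is not None)
    if not votes:
        return None, 0.0
    top, count = votes.most_common(1)[0]
    return top, count / len(answers)


def control_step(
    history: List[ProbeRound],
    patience: int = 2,      # hypothetical: rounds the majority must stay unchanged
    prune_after: int = 2,   # hypothetical: rounds a branch may deviate before pruning
) -> Tuple[bool, List[int]]:
    """Decide whether to stop early and which branches to prune.

    Assumed rules (illustrative only):
    - stop when the majority answer is identical over the last `patience` rounds;
    - prune a branch that disagreed with the majority in each of the last
      `prune_after` rounds.
    """
    majorities = [consensus(r)[0] for r in history]

    # Consensus-based early stopping (regulates depth).
    recent = majorities[-patience:]
    stop = (
        len(recent) == patience
        and recent[0] is not None
        and len(set(recent)) == 1
    )

    # Deviation-based pruning (regulates width).
    prune: List[int] = []
    if len(history) >= prune_after:
        window = list(zip(history[-prune_after:], majorities[-prune_after:]))
        for branch in history[-1]:
            if all(branch in r and r[branch] != m for r, m in window):
                prune.append(branch)
    return stop, sorted(prune)


# Toy usage: three branches probed twice; branch 2 keeps deviating.
history = [
    {0: "4", 1: "4", 2: "7"},
    {0: "4", 1: "4", 2: "9"},
]
stop, prune = control_step(history)
```

In this toy run the majority answer is stable across both rounds, so the sketch signals an early stop and marks the deviating branch for pruning. A real controller would tune both thresholds and would likely combine the two signals differently.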
Original Abstract
Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and lack principled mechanisms to exploit global dynamics across parallel branches. We introduce 2D probing, an interface that exposes the width-depth dynamics of parallel thinking by periodically eliciting intermediate answers from all branches. Our analysis reveals three key insights: non-monotonic scaling across width-depth allocations, heterogeneous reasoning branch lengths, and early stabilization of global consensus. Guided by these insights, we introduce Parallel-Probe, a training-free controller designed to optimize online parallel thinking. Parallel-Probe employs consensus-based early stopping to regulate reasoning depth and deviation-based branch pruning to dynamically adjust width. Extensive experiments across three benchmarks and multiple models demonstrate that Parallel-Probe establishes a superior Pareto frontier for test-time scaling. Compared to standard majority voting, it reduces sequential tokens by up to 35.8% and total token cost by over 25.8% while maintaining competitive accuracy.
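To make the two cost measures in the abstract concrete, the sketch below computes them for a toy set of branch lengths. It rests on assumptions not stated in the abstract: "sequential tokens" is taken to be the length of the longest branch (a latency proxy, since branches run in parallel), "total token cost" is taken to be the sum over all branches, and probing overhead is ignored. The branch lengths are invented for illustration and are not results from the paper.

```python
from typing import List


def sequential_tokens(branch_lengths: List[int]) -> int:
    """Assumed definition: tokens on the longest branch (parallel latency proxy)."""
    return max(branch_lengths, default=0)


def total_tokens(branch_lengths: List[int]) -> int:
    """Assumed definition: tokens summed over every branch (compute proxy)."""
    return sum(branch_lengths)


def reduction_pct(baseline: int, controlled: int) -> float:
    """Relative saving of `controlled` versus `baseline`, in percent."""
    return 100.0 * (baseline - controlled) / baseline


# Made-up branch lengths: full majority voting vs. a run with early
# stopping (branches capped at 900) and one branch pruned at 300 tokens.
baseline = [1000, 800, 1200, 600]
controlled = [900, 800, 900, 300]

seq_saving = reduction_pct(sequential_tokens(baseline), sequential_tokens(controlled))
total_saving = reduction_pct(total_tokens(baseline), total_tokens(controlled))
```

Under these definitions, early stopping mainly lowers the sequential measure by shortening the longest branch, while pruning mainly lowers the total measure by removing tokens from deviating branches.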