LLM Reasoning Relevance: 9/10

Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing

Tong Zheng, Chengsong Huang, Runpeng Dai, Yun He, Rui Liu, Xin Ni, Huiwen Bao, Kaishen Wang, Hongtu Zhu, Jiaxin Huang, Furong Huang, Heng Huang
arXiv: 2602.03845v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

Proposes Parallel-Probe, a framework that uses 2D probing to optimize parallel reasoning and balance efficiency against accuracy.

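Key Contributions

  • Proposes 2D probing, a method that exposes the width-depth dynamics of parallel reasoning
  • Designs the Parallel-Probe controller, which optimizes parallel reasoning online through consensus-based early stopping and deviation-based branch pruning
  • Shows experimentally that Parallel-Probe is more efficient than standard majority voting, cutting sequential tokens by up to 35.8% and total token cost by over 25.8% while keeping accuracy competitive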

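Methodology

2D probing is used to analyze the width-depth dynamics of parallel reasoning. Consensus information across branches regulates reasoning depth (early stopping), and deviation information drives branch pruning (width adjustment).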
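The control loop described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `majority` and `parallel_probe`, the thresholds `patience` and `prune_after`, and the rule of ignoring branches that have not answered yet are all hypothetical choices made for illustration. The input is a pre-recorded grid of intermediate answers, where `probes[t][b]` is the answer elicited from branch `b` at probe step `t` (`None` if the branch has no answer yet).

```python
from collections import Counter


def majority(answers):
    """Most common non-None answer, or None if no branch has answered yet."""
    counts = Counter(a for a in answers if a is not None)
    return counts.most_common(1)[0][0] if counts else None


def parallel_probe(probes, patience=3, prune_after=2):
    """Return (answer, stop_step, surviving_branches) for a probe grid.

    patience and prune_after are illustrative thresholds (assumptions).
    """
    width = len(probes[0])
    active = set(range(width))
    deviation = [0] * width   # consecutive probes a branch disagreed with consensus
    stable, prev = 0, None    # consecutive probes with an unchanged consensus

    for step, row in enumerate(probes):
        consensus = majority(row[b] for b in active)

        # Consensus-based early stopping: regulates depth.
        if consensus is None:
            stable = 0
        elif consensus == prev:
            stable += 1
        else:
            stable = 1
        prev = consensus
        if stable >= patience:
            return consensus, step, sorted(active)

        # Deviation-based pruning: regulates width.
        if consensus is not None:
            for b in list(active):
                if row[b] is None:
                    continue  # assumption: unanswered branches are not penalized
                deviation[b] = deviation[b] + 1 if row[b] != consensus else 0
                if deviation[b] >= prune_after:
                    active.discard(b)

    return prev, len(probes) - 1, sorted(active)


# Four branches probed at four depths; branch 0 keeps deviating.
probes = [
    ["3", "7", "7", None],
    ["3", "7", "7", "7"],
    ["3", "7", "7", "7"],
    ["3", "7", "7", "7"],
]
answer, stop_step, survivors = parallel_probe(probes)
print(answer, stop_step, survivors)  # 7 2 [1, 2, 3]
```

In this toy run, branch 0 is pruned after deviating at two consecutive probes, and generation stops at step 2 once the consensus has held for three probes, so the final probe interval is never spent. A real controller would interleave these decisions with live decoding rather than read a recorded grid.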

Original Abstract

Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and lack principled mechanisms to exploit global dynamics across parallel branches. We introduce 2D probing, an interface that exposes the width-depth dynamics of parallel thinking by periodically eliciting intermediate answers from all branches. Our analysis reveals three key insights: non-monotonic scaling across width-depth allocations, heterogeneous reasoning branch lengths, and early stabilization of global consensus. Guided by these insights, we introduce Parallel-Probe, a training-free controller designed to optimize online parallel thinking. Parallel-Probe employs consensus-based early stopping to regulate reasoning depth and deviation-based branch pruning to dynamically adjust width. Extensive experiments across three benchmarks and multiple models demonstrate that Parallel-Probe establishes a superior Pareto frontier for test-time scaling. Compared to standard majority voting, it reduces sequential tokens by up to 35.8% and total token cost by over 25.8% while maintaining competitive accuracy.

Tags

Parallel reasoning, Efficiency optimization, LLM, Dynamic adjustment, 2D probing

arXiv Categories

cs.CL