LLM Reasoning relevance: 8/10

Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding

Tao Jin, Phuong Minh Nguyen, Naoya Inoue
arXiv: 2604.02047v1 Published: 2026-04-02 Updated: 2026-04-02

AI Summary

GOOSE proposes a training-free, adaptive spine-tree structure that accelerates speculative decoding for LLM inference.

Key Contributions

  • Proposes the Anisotropic Speculation Trees (GOOSE) framework
  • Proves that when a quality gap exists between token sources, an asymmetric tree is optimal
  • Achieves significant speedups across multiple LLMs and benchmarks

Methodology

Candidate token trees are built from two sources, context matching and statistical prediction, and the tree's depth and breadth are adapted to token quality, yielding efficient speculative decoding.
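The spine-plus-branches shape described above can be sketched in a few lines of Python. Everything here is a toy stand-in: `ngram_match_chain`, `statistical_branches`, and `build_spine_tree` are hypothetical helpers that illustrate the anisotropic structure (a deep chain of context-matched tokens, each with wide statistical fallbacks), not GOOSE's actual drafting rules.

```python
from collections import Counter

def ngram_match_chain(context, n=2, max_depth=6):
    """Draft the deep 'spine': find an earlier occurrence of the context's
    last n tokens and copy the continuation (hypothetical matching rule)."""
    suffix = tuple(context[-n:])
    for i in range(len(context) - n):
        if tuple(context[i:i + n]) == suffix:
            return context[i + n:i + n + max_depth]  # copied continuation
    return []

def statistical_branches(history_counts, prefix_token, width=3):
    """Wide, low-acceptance fallbacks: most frequent successors of
    prefix_token observed in prior steps (stand-in statistical predictor)."""
    counts = history_counts.get(prefix_token, Counter())
    return [tok for tok, _ in counts.most_common(width)]

def build_spine_tree(context, history_counts, depth=6, width=3):
    """Anisotropic spine tree: one high-acceptance spine token per level,
    plus `width` branch alternatives hanging off each spine node."""
    spine = ngram_match_chain(context, max_depth=depth)
    return [{"spine": tok,
             "branches": statistical_branches(history_counts, tok, width)}
            for tok in spine]
```

Note the asymmetry this produces: under a fixed verification budget of `depth * (1 + width)` tokens, the reliable source spends its share on depth while the unreliable source spends its share on breadth, which is the paper's core structural claim.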

Original Abstract

Speculative decoding accelerates large language model inference by drafting multiple candidate tokens and verifying them in a single forward pass. Candidates are organized as a tree: deeper trees accept more tokens per step, but adding depth requires sacrificing breadth (fallback options) under a fixed verification budget. Existing training-free methods draft from a single token source and shape their trees without distinguishing candidate quality across origins. We observe that two common training-free token sources - n-gram matches copied from the input context, and statistical predictions from prior forward passes - differ dramatically in acceptance rate (~6x median gap, range 2-18x across five models and five benchmarks). We prove that when such a quality gap exists, the optimal tree is anisotropic (asymmetric): reliable tokens should form a deep chain while unreliable tokens spread as wide branches, breaking through the depth limit of balanced trees. We realize this structure in GOOSE, a training-free framework that builds an adaptive spine tree - a deep chain of high-acceptance context-matched tokens with wide branches of low-acceptance alternatives at each node. We prove that the number of tokens accepted per step is at least as large as that of either source used alone. On five LLMs (7B-33B) and five benchmarks, GOOSE achieves 1.9-4.3x lossless speedup, outperforming balanced-tree baselines by 12-33% under the same budget.
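The abstract's "drafting multiple candidate tokens and verifying them" step can be illustrated with a greedy chain check. This is a deliberately simplified sketch: `target_next_token` is a hypothetical stand-in for one call to the large target model, and real systems verify the entire tree in a single batched forward pass rather than token by token.

```python
def verify_chain(drafted, target_next_token):
    """Lossless verification sketch: walk the drafted chain and accept
    tokens until the target model's own greedy prediction disagrees.
    The output is identical to what the target model would have
    generated alone, which is what makes the speedup 'lossless'."""
    accepted = []
    for tok in drafted:
        if target_next_token(accepted) == tok:
            accepted.append(tok)
        else:
            break  # first mismatch ends acceptance for this step
    return accepted
```

Each verification step accepts a prefix of the draft, so a deep chain of high-acceptance tokens can yield several accepted tokens per forward pass, which is where the reported 1.9-4.3x speedup comes from.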

Tags

Speculative Decoding · Large Language Models · Inference Acceleration · Training-Free

arXiv Categories

cs.CL cs.AI