Multimodal Learning Relevance: 8/10

Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance

Xiandong Zou, Jianshu Li, Jing Huang, Pan Zhou
arXiv: 2602.05774v1 Published: 2026-02-05 Updated: 2026-02-05

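AI Summary

Proposes Variational Speculative Decoding (VSD), which speeds up LLM and MLLM inference by optimizing over draft paths, improving decoding efficiency.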

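Key Contributions

  • Proposes the Variational Speculative Decoding (VSD) framework
  • Uses variational inference to train the draft model, maximizing the probability of target-model acceptance
  • Introduces a path-level utility and an Expectation-Maximization procedure to improve draft quality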

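Methodology

Casts draft training as variational inference over latent variables (the draft paths), using techniques such as MCMC sampling and Adaptive Rejection Weighting.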
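
For orientation, a generic evidence lower bound of the kind this formulation suggests can be written as follows. This is a textbook-style sketch, not the paper's exact objective: the symbols $x$ (context), $z$ (draft path), $A$ (acceptance event), $q_\theta$ (draft model), and $r$ (variational distribution) are assumptions introduced here, and the paper's path-level utility and regularizers are not modeled.

```latex
\begin{align}
\log p_\theta(A = 1 \mid x)
  &= \log \sum_{z} q_\theta(z \mid x)\, p(A = 1 \mid z, x) \\
  &\ge \mathbb{E}_{r(z)}\!\left[\log p(A = 1 \mid z, x)\right]
     - \mathrm{KL}\!\left(r(z) \,\|\, q_\theta(z \mid x)\right)
\end{align}
```

The bound follows from Jensen's inequality and is tight when $r(z) \propto q_\theta(z \mid x)\, p(A = 1 \mid z, x)$, the posterior over paths given acceptance. Under these assumptions, an E-step would sample from that posterior and an M-step would maximize $\mathbb{E}_{r(z)}[\log q_\theta(z \mid x)]$, a weighted likelihood over draft paths.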

Original Abstract

Speculative decoding accelerates inference for (M)LLMs, yet a training-decoding discrepancy persists: while existing methods optimize single greedy trajectories, decoding involves verifying and ranking multiple sampled draft paths. We propose Variational Speculative Decoding (VSD), formulating draft training as variational inference over latent proposals (draft paths). VSD maximizes the marginal probability of target-model acceptance, yielding an ELBO that promotes high-quality latent proposals while minimizing divergence from the target distribution. To enhance quality and reduce variance, we incorporate a path-level utility and optimize via an Expectation-Maximization procedure. The E-step draws MCMC samples from an oracle-filtered posterior, while the M-step maximizes weighted likelihood using Adaptive Rejection Weighting (ARW) and Confidence-Aware Regularization (CAR). Theoretical analysis confirms that VSD increases expected acceptance length and speedup. Extensive experiments across LLMs and MLLMs show that VSD achieves up to a 9.6% speedup over EAGLE-3 and 7.9% over ViSpec, significantly improving decoding efficiency.
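
As background for the "expected acceptance length" mentioned in the abstract, the sketch below illustrates the standard speculative-sampling acceptance rule, in which a drafted token is accepted with probability min(1, p/q). It is not VSD itself. The function names and the toy probability pairs are hypothetical, and the per-token acceptances are treated as independent given a fixed draft path.

```python
import random


def accept_prob(p_target: float, q_draft: float) -> float:
    """Standard speculative-sampling acceptance probability min(1, p/q)."""
    if q_draft <= 0.0:
        raise ValueError("a sampled draft token must have positive draft probability")
    return min(1.0, p_target / q_draft)


def accepted_length(path_probs, rng: random.Random) -> int:
    """Simulate verification of one draft path.

    path_probs holds one (p_target, q_draft) pair per drafted token.
    Returns the number of leading tokens accepted before the first rejection.
    """
    n = 0
    for p, q in path_probs:
        if rng.random() < accept_prob(p, q):
            n += 1
        else:
            break
    return n


def expected_accepted_length(path_probs) -> float:
    """Expected accepted length for a fixed path.

    Uses E[N] = sum_k P(N >= k), where P(N >= k) is the product of the
    first k per-token acceptance probabilities.
    """
    total, prefix = 0.0, 1.0
    for p, q in path_probs:
        prefix *= accept_prob(p, q)
        total += prefix
    return total


if __name__ == "__main__":
    # Hypothetical (p_target, q_draft) pairs for a three-token draft path.
    toy_path = [(0.9, 0.6), (0.3, 0.6), (0.2, 0.8)]
    print(expected_accepted_length(toy_path))
    print(accepted_length(toy_path, random.Random(0)))
```

A draft path whose tokens the target model also rates highly keeps the running product close to 1, so more tokens survive verification per target-model call. That is the quantity a sequence-level training objective would aim to raise.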

Tags

Speculative Decoding, Variational Inference, Language Model Acceleration, MLLM

arXiv Categories

cs.LG cs.AI