LLM Reasoning relevance: 9/10

Reasoning with Latent Tokens in Diffusion Language Models

Andre He, Sean Welleck, Daniel Fried
arXiv: 2602.03769v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

Diffusion language models reason by jointly predicting all unknown tokens. This paper investigates the role these latent tokens play in that mechanism and introduces them into autoregressive models.

Key Contributions

  • Shows that latent tokens in diffusion models are critical to their reasoning ability
  • Proposes a method for modulating the number of latent tokens, trading off inference speed against sample quality
  • Introduces latent tokens into autoregressive models, improving their performance on reasoning tasks

Methodology

Ablation experiments analyze the joint-prediction mechanism; a method for modulating the number of latent tokens is proposed, then validated experimentally on both diffusion and autoregressive models.
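The decoding loop this method modulates can be sketched as a toy: at each step the model jointly predicts a window of undecoded positions but commits only the most confident one; the rest serve as latent tokens. This is a minimal illustration, not the paper's implementation — `toy_model` is a hypothetical stand-in for the trained denoiser, and `num_latent` is the assumed knob controlling how many undecoded positions enter the joint prediction.

```python
import numpy as np

VOCAB = 5   # toy vocabulary size
MASK = -1   # marker for an undecoded position

def toy_model(seq, positions, rng):
    """Hypothetical stand-in for a diffusion denoiser: returns a
    probability distribution over the vocabulary for each queried
    masked position (a real model would condition on seq)."""
    probs = rng.random((len(positions), VOCAB))
    return probs / probs.sum(axis=1, keepdims=True)

def decode(length, num_latent, seed=0):
    """One denoising pass that, at each step, jointly predicts up to
    `num_latent` undecoded positions but commits only the single most
    confident one; the other predictions act as latent tokens."""
    rng = np.random.default_rng(seed)
    seq = np.full(length, MASK)
    while (seq == MASK).any():
        masked = np.flatnonzero(seq == MASK)
        window = masked[:num_latent]          # modulate the latent-token count
        probs = toy_model(seq, window, rng)   # joint prediction over the window
        i = probs.max(axis=1).argmax()        # most confident position in window
        seq[window[i]] = probs[i].argmax()    # commit exactly one token
    return seq
```

Shrinking `num_latent` toward 1 recovers the faster ablated decoder described in the abstract, at the cost of the joint reasoning that the latent positions provide.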

Original Abstract

Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference time. We trace this trade-off to a key mechanism: diffusion models are trained to jointly predict a distribution over all unknown tokens, including those that will not actually be decoded in the current step. Ablating this joint prediction yields faster inference but degrades performance, revealing that accurate prediction at the decoded position relies on joint reasoning about the distribution of undecoded tokens. We interpret these as latent tokens and introduce a method for modulating their number, demonstrating empirically that this enables a smooth tradeoff between inference speed and sample quality. Furthermore, we demonstrate that latent tokens can be introduced into autoregressive models through an auxiliary multi-token prediction objective, yielding substantial improvements on the same reasoning tasks where they have traditionally struggled. Our results suggest that latent tokens, while arising naturally in diffusion, represent a general mechanism for improving performance on tasks requiring global coherence or lookahead.
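The auxiliary multi-token prediction objective mentioned in the abstract can be sketched as follows. This is a hedged toy, not the paper's loss: `mtp_loss` assumes the model emits one logit vector per future offset (shape `(k, vocab)`), and `aux_weight` is a hypothetical hyperparameter down-weighting the auxiliary terms.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax along the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mtp_loss(logits, targets, aux_weight=0.5):
    """Cross-entropy on the next token plus a down-weighted auxiliary
    cross-entropy on the following k-1 tokens, which play the role of
    latent tokens for an autoregressive model.

    logits:  (k, vocab) -- one logit vector per future offset
    targets: (k,)       -- gold token ids at those offsets
    """
    probs = softmax(logits)
    nll = -np.log(probs[np.arange(len(targets)), targets])
    return nll[0] + aux_weight * nll[1:].mean()
```

Only the first term corresponds to the token actually emitted; the auxiliary terms force the model to reason jointly about upcoming tokens, mirroring the joint prediction that diffusion models perform over undecoded positions.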

Tags

Diffusion Models, Autoregressive Models, Reasoning, Latent Tokens

arXiv Categories

cs.LG