Discrete Diffusion Models Exploit Asymmetry to Solve Lookahead Planning Tasks
AI Summary
This work shows that non-autoregressive discrete diffusion models outperform autoregressive models on lookahead planning tasks by exploiting an inherent asymmetry in those tasks.
Main Contributions
- Reveals the distinct mechanisms that autoregressive and non-autoregressive models develop on lookahead tasks
- Identifies an asymmetry between forward generation and reverse generation in planning tasks
- Demonstrates that non-autoregressive models can exploit this asymmetry
Methodology
Mechanistic analysis comparing the training and inference dynamics of autoregressive and non-autoregressive models on lookahead tasks.
Original Abstract
While Autoregressive (AR) Transformer-based Generative Language Models are frequently employed for lookahead tasks, recent research suggests a potential discrepancy in their ability to perform planning tasks that require multi-step lookahead. In this work, we investigate the distinct emergent mechanisms that arise when training AR versus Non-Autoregressive (NAR) models, such as Discrete Diffusion Models (dLLMs), on lookahead tasks. By requiring the models to plan ahead to reach the correct conclusion, we analyze how these two paradigms fundamentally differ in their approach to the problem. We identify a critical asymmetry in planning problems: while forward generation requires complex lookahead at branching junctions, reverse generation is often deterministic. This asymmetry creates an opportunity for NAR models. Through mechanistic analysis of training and inference dynamics, we demonstrate that NAR models learn to solve planning tasks by utilizing future tokens to decode backwards, avoiding the need to learn complex traversal mechanisms entirely. Consequently, we report that both AR and NAR models are able to achieve perfect accuracy on the lookahead task. However, NAR models require exponentially fewer training examples and shallower architectures compared to AR models, which often fail to converge without specific curriculum adjustments.
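The asymmetry described above can be illustrated with a minimal sketch. The graph below is a hypothetical toy example (not taken from the paper): several chains fan out from a start node, each non-start node has exactly one parent, and only one chain reaches the goal. Going forward from the start requires lookahead to pick the correct branch, while going backward from the goal is fully deterministic.

```python
# Hypothetical star-shaped planning graph: node 0 branches into three chains;
# every other node has exactly one parent.
children = {0: [1, 4, 7], 1: [2], 2: [3], 4: [5], 5: [6], 7: [8], 8: [9]}
parent = {c: p for p, cs in children.items() for c in cs}

goal = 6

# Forward generation: at node 0 the correct child is not locally determined;
# the model must look ahead to see which branch reaches the goal.
print(children[0])  # three candidates, only one leads to the goal

# Reverse generation: starting from the goal, every step is forced,
# because each node has a unique parent.
path = [goal]
while path[-1] in parent:
    path.append(parent[path[-1]])
print(path[::-1])  # [0, 4, 5, 6] -- recovered without any search
```

This captures why an NAR model that conditions on the goal token and decodes backwards avoids learning a lookahead mechanism: the reverse direction has branching factor one.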