DOS: Dependency-Oriented Sampler for Masked Diffusion Language Models
AI Summary
Proposes the Dependency-Oriented Sampler (DOS), a decoding strategy that exploits inter-token dependencies to improve generation in Masked Diffusion Language Models.
Key Contributions
- Proposes the Dependency-Oriented Sampler (DOS), a training-free decoding strategy
- Uses attention matrices to approximate inter-token dependencies
- Improves performance on code generation and mathematical reasoning tasks
Methodology
The attention matrices from the transformer blocks are used to capture inter-token dependencies. During generation, updates at masked positions are guided by these dependencies, emphasizing information coming from already-unmasked tokens.
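The idea of weighting masked-position updates by their attention to unmasked context can be sketched as below. This is a minimal illustration, not the paper's actual algorithm: the scoring rule (confidence times attention mass on unmasked tokens), the function name `dos_select`, and the use of a single head-averaged attention matrix are all assumptions for the sketch.

```python
import numpy as np

def dos_select(attn, confidence, masked, k=1):
    """Pick which masked positions to unmask next (hypothetical scoring).

    attn:       (L, L) attention matrix, assumed averaged over heads/layers.
    confidence: (L,) model confidence for the predicted token at each position.
    masked:     (L,) boolean array, True where the token is still masked.
    k:          number of positions to commit this step.
    """
    # Fraction of each position's attention mass that lands on unmasked tokens:
    # a proxy for how well the prediction is grounded in decoded context.
    support = attn[:, ~masked].sum(axis=1)          # shape (L,)
    score = confidence * support
    score[~masked] = -np.inf                        # only masked positions compete
    return np.argsort(score)[-k:][::-1]             # top-k scores, best first

# Toy example: positions 0 and 3 are decoded; position 1 attends mostly to
# decoded context, position 2 mostly to other masked positions.
attn = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.40, 0.10, 0.10, 0.40],
    [0.10, 0.40, 0.40, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])
confidence = np.array([0.9, 0.5, 0.5, 0.9])
masked = np.array([False, True, True, False])
print(dos_select(attn, confidence, masked, k=1))    # position 1 wins
```

Even though positions 1 and 2 have equal confidence, position 1 is selected because more of its attention mass falls on unmasked tokens, matching the intuition that tokens grounded in decoded context should be committed first.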
Original Abstract
Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and enabling efficient parallel decoding. However, existing decoding strategies for pre-trained MDLMs predominantly rely on token-level uncertainty criteria, while largely overlooking sequence-level information and inter-token dependencies. To address this limitation, we propose Dependency-Oriented Sampler (DOS), a training-free decoding strategy that leverages inter-token dependencies to inform token updates during generation. Specifically, DOS exploits attention matrices from transformer blocks to approximate inter-token dependencies, emphasizing information from unmasked tokens when updating masked positions. Empirical results demonstrate that DOS consistently achieves superior performance on both code generation and mathematical reasoning tasks. Moreover, DOS can be seamlessly integrated with existing parallel sampling methods, leading to improved generation efficiency without sacrificing generation quality.