DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs
AI Summary
Proposes Dynamic Sliding Block (DSB) scheduling to improve both the quality and efficiency of parallel decoding in diffusion LLMs, together with DSB Cache for further acceleration.
Key Contributions
- Analyzes the limitations of naive block scheduling
- Proposes Dynamic Sliding Block (DSB), a dynamic block scheduling method
- Proposes DSB Cache, a KV-cache mechanism tailored to DSB
Methodology
Proposes a training-free dynamic block scheduling method that adapts the block size to semantic difficulty, combined with a KV-cache mechanism to improve inference efficiency.
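The core idea can be sketched in toy form: within a sliding block, commit the positions the model is confident about, slide the block forward past the committed prefix, and let the block size adapt to observed difficulty. This is a minimal illustration under my own assumptions (fixed per-position confidence scores standing in for the denoiser's per-token certainty; threshold, block-size bounds, and the growth rule are all hypothetical), not the authors' implementation:

```python
def dynamic_sliding_block_decode(confidences, threshold=0.7,
                                 min_block=2, max_block=8):
    """Return the order in which positions are committed.

    confidences: per-position confidence scores (a stand-in for the
    model's per-token certainty; assumed fixed here for clarity).
    """
    n = len(confidences)
    committed = [False] * n
    order = []
    start = 0
    block = min_block
    while start < n:
        end = min(start + block, n)
        window = [i for i in range(start, end) if not committed[i]]
        # Commit every sufficiently confident position in the block;
        # always commit at least the single most confident one so that
        # decoding is guaranteed to make progress.
        easy = [i for i in window if confidences[i] >= threshold]
        if not easy:
            easy = [max(window, key=lambda i: confidences[i])]
        for i in easy:
            committed[i] = True
            order.append(i)
        # Slide: advance the block start past the committed prefix.
        while start < n and committed[start]:
            start += 1
        # Adapt block size to how easy the last step was (hypothetical rule).
        block = max(min_block, min(max_block, len(easy) * 2))
    return order
```

Unlike a fixed schedule, easy positions near a block boundary are committed early, and hard positions are not forced prematurely; they wait until the block has slid far enough that they must be resolved.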
Original Abstract
Diffusion large language models (dLLMs) have emerged as a promising alternative for text generation, distinguished by their native support for parallel decoding. In practice, block inference is crucial for avoiding order misalignment in global bidirectional decoding and improving output quality. However, the widely-used fixed, predefined block (naive) schedule is agnostic to semantic difficulty, making it a suboptimal strategy for both quality and efficiency: it can force premature commitments to uncertain positions while delaying easy positions near block boundaries. In this work, we analyze the limitations of naive block scheduling and disclose the importance of dynamically adapting the schedule to semantic difficulty for reliable and efficient inference. Motivated by this, we propose Dynamic Sliding Block (DSB), a training-free block scheduling method that uses a sliding block with a dynamic size to overcome the rigidity of the naive block. To further improve efficiency, we introduce DSB Cache, a training-free KV-cache mechanism tailored to DSB. Extensive experiments across multiple models and benchmarks demonstrate that DSB, together with DSB Cache, consistently improves both generation quality and inference efficiency for dLLMs. Code is released at https://github.com/lizhuo-luo/DSB.