LLM Reasoning relevance: 9/10

Lookahead Path Likelihood Optimization for Diffusion LLMs

Xuejie Liu, Yap Vit Chun, Yitao Liang, Anji Liu
arXiv: 2602.03496v1 Published: 2026-02-03 Updated: 2026-02-03

AI Summary

Proposes a decoding method for diffusion LLMs based on path likelihood optimization, improving reasoning accuracy.

Key Contributions

  • Proposes the path log-likelihood (Path LL) objective
  • Designs POKE, an efficient value estimator
  • Proposes POKE-SMC, a Sequential Monte Carlo search framework built on POKE

Methodology

Introduces path likelihood as the optimization objective, uses a value estimator to predict future path likelihood, and applies Sequential Monte Carlo search to find optimal unmasking paths.
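The search loop described above can be sketched as follows. This is a toy illustration, not the authors' implementation: `toy_value_estimate` is a hypothetical stand-in for the POKE value estimator (it simply favors near-left-to-right orders), and the particles here are bare unmasking orders rather than actual dLLM decoding states.

```python
import math
import random

def inversions(order):
    """Count pairwise inversions; used below only as a toy scoring signal."""
    return sum(1 for i in range(len(order))
                 for j in range(i + 1, len(order))
                 if order[i] > order[j])

def toy_value_estimate(trajectory):
    # Hypothetical stand-in for the POKE value estimator: a real system
    # would predict the expected future Path LL of this partial trajectory.
    return -float(inversions(trajectory))

def smc_unmask(seq_len=6, num_particles=4, seed=0):
    """Maintain several candidate unmasking orders (particles); at each step
    extend every particle by one position, weight particles by a lookahead
    value estimate, and resample in proportion to the weights."""
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    for _ in range(seq_len):
        # Propose: each particle unmasks one more (randomly chosen) position.
        for p in particles:
            remaining = [i for i in range(seq_len) if i not in p]
            p.append(rng.choice(remaining))
        # Weight: score each partial trajectory by its estimated future value.
        weights = [math.exp(toy_value_estimate(p)) for p in particles]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Resample: keep promising partial trajectories, drop poor ones.
        particles = [list(rng.choices(particles, weights=probs)[0])
                     for _ in range(num_particles)]
    return particles
```

Each returned particle is a complete unmasking order; the resampling step is what lets the lookahead signal steer the search away from locally greedy but globally inconsistent paths.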

Original Abstract

Diffusion Large Language Models (dLLMs) support arbitrary-order generation, yet their inference performance critically depends on the unmasking order. Existing strategies rely on heuristics that greedily optimize local confidence, offering limited guidance for identifying unmasking paths that are globally consistent and accurate. To bridge this gap, we introduce path log-likelihood (Path LL), a trajectory-conditioned objective that strongly correlates with downstream accuracy and enables principled selection of unmasking paths. To optimize Path LL at inference time, we propose POKE, an efficient value estimator that predicts the expected future Path LL of a partial decoding trajectory. We then integrate this lookahead signal into POKE-SMC, a Sequential Monte Carlo-based search framework for dynamically identifying optimal unmasking paths. Extensive experiments across 6 reasoning tasks show that POKE-SMC consistently improves accuracy, achieving 2%--3% average gains over strong decoding-time scaling baselines at comparable inference overhead on LLaDA models and advancing the accuracy--compute Pareto frontier.

Tags

Diffusion LLM, Inference, Unmasking Order, Lookahead Search

arXiv Categories

cs.LG