Self-Aware Markov Models for Discrete Reasoning
AI Summary
Proposes a self-aware Markov model that improves discrete reasoning through token remasking and an adaptive number of steps.
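Main Contributions
- Introduces self-aware Markov models
- Allows tokens to be remasked so that earlier errors can be corrected
- Uses an adaptive number of steps, adjusting the amount of computation to the difficulty of the problem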
Methodology
Learns a Markov transition kernel that is trained on its own outputs, combined with lightweight prediction heads.
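One way to picture "trained on its own outputs" is a training input that contains the model's own guesses, including wrong ones, next to masked positions. The sketch below is only an illustration under stated assumptions: the function name make_training_example, the mask_rate parameter, and the per-position error labels are hypothetical and are not taken from the paper.

```python
import random

MASK = "_"  # hypothetical mask symbol

def make_training_example(target, model_guess, rng, mask_rate=0.5):
    """Build one (input_state, token_targets, error_labels) triple.

    Assumption: the input mixes masked positions with the model's own
    earlier guesses, so wrong tokens can appear in the input and an
    auxiliary head can be trained to flag them.
    """
    state, error_labels = [], []
    for gold, guess in zip(target, model_guess):
        if rng.random() < mask_rate:
            state.append(MASK)       # position is hidden from the model
            error_labels.append(0)   # a masked position is not counted as an error
        else:
            state.append(guess)      # position shows the model's own output
            error_labels.append(int(guess != gold))
    return state, list(target), error_labels
```

For example, make_training_example("123456", "123956", random.Random(0), mask_rate=0.0) keeps every guess visible and marks only the fourth position as an error.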
Original Abstract
Standard masked discrete diffusion models face limitations in reasoning tasks due to their inability to correct their own mistakes on the masking path. Since they rely on a fixed number of denoising steps, they are also unable to adjust their computation to the complexity of a given problem. To address these limitations, we introduce a method based on learning a Markov transition kernel that is trained on its own outputs. This design enables tokens to be remasked, allowing the model to correct its previous mistakes. Furthermore, we do not need a fixed time schedule and instead use a trained stopping criterion. This allows the number of function evaluations to adapt to the difficulty of the reasoning problem. Our adaptation adds two lightweight prediction heads, enabling reuse and fine-tuning of existing pretrained models. On the Sudoku-Extreme dataset, we clearly outperform other flow-based methods with a validity of 95%. On Countdown-4, we need only 10 steps on average to solve almost 96% of the problems correctly, and many problems can already be solved in 2 steps.
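The control flow described in the abstract (propose tokens, remask suspected mistakes, stop when a trained criterion fires) can be sketched as a toy loop. This is a minimal sketch under loud assumptions, not the paper's implementation: propose, remask_head, and stop_head are hypothetical stand-ins, the "correct" token at position i is defined as i % 3 purely for illustration, and an oracle check replaces the learned heads.

```python
import random

MASK = None  # hypothetical marker for a masked position

def propose(state, rng, error_rate=0.3):
    """Stand-in for the denoiser: fill every masked position with a guess.
    A guess is wrong with probability error_rate."""
    out = list(state)
    for i, tok in enumerate(out):
        if tok is MASK:
            correct = i % 3
            out[i] = correct if rng.random() >= error_rate else (correct + 1) % 3
    return out

def remask_head(state):
    """Stand-in for a learned remasking head: indices judged to be wrong.
    Here an oracle check replaces the learned prediction."""
    return [i for i, tok in enumerate(state) if tok != i % 3]

def stop_head(state):
    """Stand-in for a learned stopping criterion."""
    return MASK not in state and not remask_head(state)

def sample(length, seed=0, max_steps=50):
    """Run the loop until stop_head fires; return (state, steps_used)."""
    rng = random.Random(seed)
    state = [MASK] * length
    for step in range(1, max_steps + 1):
        state = propose(state, rng)
        if stop_head(state):
            return state, step       # number of steps adapts to the instance
        for i in remask_head(state):
            state[i] = MASK          # remask suspected mistakes and retry
    return state, max_steps
```

Calling sample(12) returns the filled sequence together with the number of steps actually used, which varies with how many guesses had to be remasked rather than being fixed in advance.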