Steering diffusion models with quadratic rewards: a fine-grained analysis
AI Summary
Studies the problem of sampling from diffusion models under quadratic reward functions and analyzes the computational complexity of this task.
Main contributions
- Proves that linear-reward tilts are always efficiently sampleable
- Gives an efficient sampling algorithm for low-rank positive-definite quadratic tilts, based on the Hubbard-Stratonovich transform
- Proves that sampling from negative-definite quadratic tilts is intractable even in the rank-1 case
Methodology
Theoretical analysis, using the Hubbard-Stratonovich transform, of sampling algorithms and computational hardness for the different classes of quadratic reward functions.
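For reference, the one-dimensional Hubbard-Stratonovich identity (the textbook scalar form; the paper applies the idea in the low-rank matrix setting) is:

```latex
e^{a x^{2}}
  \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
        e^{-y^{2}/2 \,+\, \sqrt{2a}\, x y} \, dy,
  \qquad a > 0.
```

Read right to left, a quadratic tilt $e^{a x^2}$ becomes a Gaussian average over linear tilts $e^{\sqrt{2a}\, x y}$; since linear tilts are efficiently sampleable, a rank-$O(1)$ positive-definite quadratic tilt reduces to a low-dimensional mixture of tractable sampling problems.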
Original abstract
Inference-time algorithms are an emerging paradigm in which pre-trained models are used as subroutines to solve downstream tasks. Such algorithms have been proposed for tasks ranging from inverse problems and guided image generation to reasoning. However, the methods currently deployed in practice are heuristics with a variety of failure modes -- and we have very little understanding of when these heuristics can be efficiently improved. In this paper, we consider the task of sampling from a reward-tilted diffusion model -- that is, sampling from $p^{\star}(x) \propto p(x) \exp(r(x))$ -- given a reward function $r$ and pre-trained diffusion oracle for $p$. We provide a fine-grained analysis of the computational tractability of this task for quadratic rewards $r(x) = x^\top A x + b^\top x$. We show that linear-reward tilts are always efficiently sampleable -- a simple result that seems to have gone unnoticed in the literature. We use this as a building block, along with a conceptually new ingredient -- the Hubbard-Stratonovich transform -- to provide an efficient algorithm for sampling from low-rank positive-definite quadratic tilts, i.e. $r(x) = x^\top A x$ where $A$ is positive-definite and of rank $O(1)$. For negative-definite tilts, i.e. $r(x) = - x^\top A x$ where $A$ is positive-definite, we prove that the problem is intractable even if $A$ is of rank 1 (albeit with exponentially-large entries).
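As a sanity check of the linear-tilt claim, here is a minimal one-dimensional sketch (not the paper's algorithm, which works with a diffusion oracle rather than an explicit Gaussian) of the underlying fact: linearly tilting a Gaussian yields another Gaussian with a shifted mean.

```python
import numpy as np

# Illustrative sketch: if p = N(mu, sigma^2), then p(x) * exp(b * x)
# is proportional to N(mu + sigma^2 * b, sigma^2).
mu, sigma, b = 0.5, 1.3, 0.8

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

# Unnormalized tilted density, then normalize numerically on the grid.
tilted = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2) + b * x)
tilted /= tilted.sum() * dx

mean_numeric = (x * tilted).sum() * dx
var_numeric = ((x - mean_numeric) ** 2 * tilted).sum() * dx

# Closed form: the tilt shifts the mean by sigma^2 * b; variance unchanged.
print(mean_numeric, mu + sigma ** 2 * b)  # both ~ 1.852
print(var_numeric, sigma ** 2)            # both ~ 1.69
```

Because the tilted density stays in the same Gaussian family, one forward pass of the same sampler with a shifted mean suffices, which is the intuition behind linear tilts being "easy".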