Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps
AI Summary
Diamond Maps achieve efficient reward alignment through stochastic flow maps, improving the adaptability of generative models.
Main Contributions
- Proposes Diamond Maps, a new class of stochastic flow map models
- Diamond Maps align efficiently to arbitrary rewards at inference time
- Diamond Maps outperform existing methods and scale better
Methodology
Diamond Maps are learned by distillation from GLASS Flows; reward alignment is achieved through a single-step sampler combined with value-function estimation.
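The key mechanism here is that a stochastic single-step sampler makes the value function cheap to estimate: each Monte Carlo sample of the final output costs only one model call. Below is a minimal illustrative sketch of this idea, not the paper's implementation; `stochastic_flow_map` is a toy Gaussian stand-in for a trained Diamond Map, and `reward` is an arbitrary example reward.

```python
import numpy as np

def stochastic_flow_map(x_t, rng):
    """Toy stand-in for a learned stochastic flow map.

    A real Diamond Map would map a noisy intermediate state x_t (plus
    fresh noise) to a clean sample in a single network call; here we
    just perturb x_t so the sketch runs end to end.
    """
    return x_t + rng.normal(scale=0.1, size=x_t.shape)

def reward(x):
    # Example reward: prefer samples close to the origin.
    return -np.sum(x**2, axis=-1)

def estimate_value(x_t, n_samples=64, seed=0):
    """Monte Carlo estimate of the value V(x_t) = E[reward(x_1) | x_t].

    Because the sampler is stochastic and single-step, each draw costs
    one call, so the estimate is cheap; such value estimates are what
    make search, SMC, and guidance tractable at inference time.
    """
    rng = np.random.default_rng(seed)
    rewards = [reward(stochastic_flow_map(x_t, rng)) for _ in range(n_samples)]
    return float(np.mean(rewards))

x_t = np.zeros(4)
print(estimate_value(x_t))
```

With a deterministic one-step distillation, by contrast, each state yields only a single outcome, so this kind of consistent value estimation is not available; preserving stochasticity in the flow map is what enables it.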
原文摘要
Flow and diffusion models produce high-quality samples, but adapting them to user preferences or constraints post-training remains costly and brittle, a challenge commonly called reward alignment. We argue that efficient reward alignment should be a property of the generative model itself, not an afterthought, and redesign the model for adaptability. We propose "Diamond Maps", stochastic flow map models that enable efficient and accurate alignment to arbitrary rewards at inference time. Diamond Maps amortize many simulation steps into a single-step sampler, like flow maps, while preserving the stochasticity required for optimal reward alignment. This design makes search, sequential Monte Carlo, and guidance scalable by enabling efficient and consistent estimation of the value function. Our experiments show that Diamond Maps can be learned efficiently via distillation from GLASS Flows, achieve stronger reward alignment performance, and scale better than existing methods. Our results point toward a practical route to generative models that can be rapidly adapted to arbitrary preferences and constraints at inference time.