Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps
AI Summary
Diamond Maps achieve efficient reward alignment through stochastic flow maps, improving the adaptability of generative models.
Main Contributions
- Proposes Diamond Maps, a new class of stochastic flow map models
- Diamond Maps align efficiently to arbitrary rewards at inference time
- Diamond Maps outperform existing methods and scale better
Methodology
Diamond Maps are learned by distillation from GLASS Flows; reward alignment is achieved through a single-step sampler combined with value-function estimation.
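The key mechanism here is that a stochastic single-step sampler makes the value function cheap to estimate: each Monte Carlo sample of the final output costs only one model call. Below is a minimal illustrative sketch of this idea, not the paper's implementation; `stochastic_flow_map` is a toy Gaussian stand-in for a trained Diamond Map, and `reward` is an arbitrary example reward.

```python
import numpy as np

def stochastic_flow_map(x_t, rng):
    """Toy stand-in for a learned stochastic flow map.

    A real Diamond Map would map a noisy intermediate state x_t (plus
    fresh noise) to a clean sample in a single network call; here we
    just perturb x_t so the sketch runs end to end.
    """
    return x_t + rng.normal(scale=0.1, size=x_t.shape)

def reward(x):
    # Example reward: prefer samples close to the origin.
    return -np.sum(x**2, axis=-1)

def estimate_value(x_t, n_samples=64, seed=0):
    """Monte Carlo estimate of the value V(x_t) = E[reward(x_1) | x_t].

    Because the sampler is stochastic and single-step, each draw costs
    one call, so the estimate is cheap; such value estimates are what
    make search, SMC, and guidance tractable at inference time.
    """
    rng = np.random.default_rng(seed)
    rewards = [reward(stochastic_flow_map(x_t, rng)) for _ in range(n_samples)]
    return float(np.mean(rewards))

x_t = np.zeros(4)
print(estimate_value(x_t))
```

With a deterministic one-step distillation, by contrast, each state yields only a single outcome, so this kind of consistent value estimation is not available; preserving stochasticity in the flow map is what enables it.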
原文摘要
Flow and diffusion models produce high-quality samples, but adapting them to user preferences or constraints post-training remains costly and brittle, a challenge commonly called reward alignment. We argue that efficient reward alignment should be a property of the generative model itself, not an afterthought, and redesign the model for adaptability. We propose "Diamond Maps", stochastic flow map models that enable efficient and accurate alignment to arbitrary rewards at inference time. Diamond Maps amortize many simulation steps into a single-step sampler, like flow maps, while preserving the stochasticity required for optimal reward alignment. This design makes search, sequential Monte Carlo, and guidance scalable by enabling efficient and consistent estimation of the value function. Our experiments show that Diamond Maps can be learned efficiently via distillation from GLASS Flows, achieve stronger reward alignment performance, and scale better than existing methods. Our results point toward a practical route to generative models that can be rapidly adapted to arbitrary preferences and constraints at inference time.