AI Agents relevance: 7/10

Cold-Start Personalization via Training-Free Priors from Structured World Models

Avinandan Bose, Shuyue Stella Li, Faeze Brahman, Pang Wei Koh, Simon Shaolei Du, Yulia Tsvetkov, Maryam Fazel, Lin Xiao, Asli Celikyilmaz
arXiv: 2602.15012v1 Published: 2026-02-16 Updated: 2026-02-16

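AI Summary

Proposes Pep, a framework that learns a structured world model of preference correlations offline and runs Bayesian inference online to personalize efficiently in the cold-start setting.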

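Key Contributions

  • Proposes the Pep framework, which decomposes cold-start preference elicitation into offline structure learning and online Bayesian inference.
  • Uses a structured world model to capture the correlations among user preference dimensions efficiently.
  • Shows across medical, mathematical, social, and commonsense reasoning that Pep outperforms an RL baseline (80.8% vs. 68.5% alignment) with 3-5x fewer interactions and roughly 10K parameters versus 8B.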

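Methodology

Offline, Pep learns a structured world model of preference correlations from complete profiles. Online, it uses training-free Bayesian inference to select informative questions and to predict the user's complete preference profile, including dimensions never asked about.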
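
The offline/online split can be illustrated with a minimal sketch. It assumes a joint Gaussian belief over preference dimensions, which is only one possible "simple belief model" and is not confirmed as the paper's choice. The function names (`fit_prior`, `next_question`, `update`, `elicit`), the ridge term, the noise level, and the variance-reduction selection rule are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_prior(profiles, ridge=1e-3):
    """Offline step (assumed Gaussian model): estimate the mean and
    covariance of preference dimensions from complete profiles."""
    mu = profiles.mean(axis=0)
    sigma = np.cov(profiles, rowvar=False) + ridge * np.eye(profiles.shape[1])
    return mu, sigma

def next_question(sigma, asked, noise=1e-2):
    """Pick the unasked dimension whose answer is expected to remove the
    most total variance from the belief (trace reduction of the covariance)."""
    gain = (sigma ** 2).sum(axis=0) / (np.diag(sigma) + noise)
    for j in asked:
        gain[j] = -np.inf  # never re-ask a dimension
    return int(np.argmax(gain))

def update(mu, sigma, i, answer, noise=1e-2):
    """Online step: condition the Gaussian belief on a noisy answer for
    dimension i. No gradient-based training is involved."""
    k = sigma[:, i] / (sigma[i, i] + noise)
    mu_new = mu + k * (answer - mu[i])
    sigma_new = sigma - np.outer(k, sigma[i, :])
    return mu_new, sigma_new

def elicit(mu, sigma, answer_fn, budget):
    """Ask `budget` questions, then return the predicted full profile
    (including dimensions never asked about) and the asked set."""
    asked = set()
    for _ in range(budget):
        i = next_question(sigma, asked)
        mu, sigma = update(mu, sigma, i, answer_fn(i))
        asked.add(i)
    return mu, asked
```

In this sketch the posterior mean changes with each answer, but for a pure Gaussian belief the posterior covariance, and therefore the next question, does not depend on the answer's value. Reproducing the answer-dependent follow-ups described in the abstract would need a belief model whose uncertainty changes with the answers, for example a mixture over user types.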

Original Abstract

Cold-start personalization requires inferring user preferences through interaction when no user-specific historical data is available. The core challenge is a routing problem: each task admits dozens of preference dimensions, yet individual users care about only a few, and which ones matter depends on who is asking. With a limited question budget, asking without structure will miss the dimensions that matter. Reinforcement learning is the natural formulation, but in multi-turn settings its terminal reward fails to exploit the factored, per-criterion structure of preference data, and in practice learned policies collapse to static question sequences that ignore user responses. We propose decomposing cold-start elicitation into offline structure learning and online Bayesian inference. Pep (Preference Elicitation with Priors) learns a structured world model of preference correlations offline from complete profiles, then performs training-free Bayesian inference online to select informative questions and predict complete preference profiles, including dimensions never asked about. The framework is modular across downstream solvers and requires only simple belief models. Across medical, mathematical, social, and commonsense reasoning, Pep achieves 80.8% alignment between generated responses and users' stated preferences versus 68.5% for RL, with 3-5x fewer interactions. When two users give different answers to the same question, Pep changes its follow-up 39-62% of the time versus 0-28% for RL. It does so with ~10K parameters versus 8B for RL, showing that the bottleneck in cold-start elicitation is the capability to exploit the factored structure of preference data.

Tags

Cold-start recommendation, personalized recommendation, Bayesian inference, structured world model

arXiv Categories

cs.CL cs.AI cs.LG