Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation
AI Summary
Proposes a cross-domain news recommendation method based on reinforcement learning and knowledge distillation, improving both interest modeling and recommendation performance.
Key Contributions
- Proposes a reinforcement learning framework for generating interest-driven news search queries
- Optimizes the query-list generation policy with GRPO and multiple reward signals
- Transfers the learned policy to a lightweight model via knowledge distillation for practical deployment
Methodology
Uses reinforcement learning to train a large language model to generate high-quality lists of interest-driven queries, then deploys the learned policy to production via knowledge distillation.
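The group-relative advantage at the heart of GRPO can be sketched as follows; the function name and the reward values are illustrative assumptions, not taken from the paper:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sampled query
    list's reward is normalized against the mean and standard
    deviation of the other samples drawn for the same user context,
    so no separate value/critic model is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# e.g. rewards scored for 4 sampled query lists of one user context
advs = grpo_advantages([0.2, 0.5, 0.9, 0.4])
```

Query lists scoring above the group mean receive positive advantages and are reinforced; those below the mean are suppressed.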
Original Abstract
News recommendation plays a critical role in online news platforms by helping users discover relevant content. Cross-domain news recommendation further requires inferring users' underlying information needs from heterogeneous signals that often extend beyond direct news consumption. A key challenge lies in moving beyond surface-level behaviors to capture deeper, reusable user interests while maintaining scalability in large-scale production systems. In this paper, we present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries from cross-domain user signals. We formulate query-list generation as a policy optimization problem and employ GRPO with multiple reward signals. We systematically study two compute dimensions: inference-time sampling and model capacity, and empirically observe consistent improvements with increased compute that exhibit scaling-like behavior. Finally, we perform on-policy distillation to transfer the learned policy from a large, compute-intensive teacher to a compact student model suitable for scalable deployment. Extensive offline experiments, ablation studies and large-scale online A/B tests in a production news recommendation system demonstrate consistent gains in both interest modeling quality and downstream recommendation performance.
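One common instantiation of the on-policy distillation step mentioned in the abstract is a reverse-KL objective, where the student samples its own query lists and the teacher scores them. The sketch below is a minimal Monte-Carlo form of that loss; the function name and this particular objective are assumptions, not the paper's exact formulation:

```python
def reverse_kl(student_logprobs, teacher_logprobs):
    """Monte-Carlo estimate of the reverse KL used in on-policy
    distillation: E_{x ~ student}[log p_student(x) - log p_teacher(x)].
    Both inputs are per-sample log-probabilities of query lists that
    the *student* generated; minimizing this pulls the student's
    on-policy behavior toward the teacher's."""
    diffs = [s - t for s, t in zip(student_logprobs, teacher_logprobs)]
    return sum(diffs) / len(diffs)

# zero when student and teacher agree on the sampled outputs
loss = reverse_kl([-1.2, -0.8], [-1.2, -0.8])
```

Because the expectation is taken under the student's own samples rather than a fixed teacher dataset, the student is corrected precisely on the outputs it actually produces at deployment time.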