LLM Reasoning relevance: 7/10

Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation

Mengdan Zhu, Yufan Zhao, Tao Di, Yulan Yan, Liang Zhao
arXiv: 2602.15005v1 Published: 2026-02-16 Updated: 2026-02-16

AI Summary

Proposes a cross-domain news recommendation method based on reinforcement learning and knowledge distillation that improves both user interest modeling and recommendation performance.

Key Contributions

  • Proposes a reinforcement learning framework that generates interest-driven news search queries
  • Uses GRPO with multiple reward signals to optimize the query-list generation policy
  • Transfers the learned policy to a lightweight model via knowledge distillation for scalable deployment

Methodology

Trains large language models with reinforcement learning to generate high-quality lists of interest-driven search queries, then deploys the learned policy to production via knowledge distillation into a compact student model.
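The core of GRPO as described above is group-relative advantage estimation: several query lists are sampled per user, each is scored by multiple reward signals, and advantages are computed relative to the group rather than a learned value function. A minimal sketch, assuming a simple weighted-sum scalarization of the rewards (the reward names and weights here are illustrative assumptions, not the paper's):

```python
# Hedged sketch of GRPO-style advantage estimation with multiple rewards.
# Reward names/weights are illustrative; the paper does not specify them.

def combined_reward(rewards: dict[str, float], weights: dict[str, float]) -> float:
    """Scalarize multiple reward signals (e.g. relevance, diversity) into one score."""
    return sum(weights[k] * rewards[k] for k in weights)

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """GRPO: normalize each sample's reward by its group's mean and std.

    No critic is needed; the group of sampled query lists serves as the baseline.
    """
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    var = sum((r - mean) ** 2 for r in group_rewards) / n
    std = var ** 0.5 or 1.0  # guard against identical rewards in a group
    return [(r - mean) / std for r in group_rewards]
```

Each sampled query list's advantage then weights its token log-probabilities in a clipped policy-gradient update, as in PPO but without a value network.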

Original Abstract

News recommendation plays a critical role in online news platforms by helping users discover relevant content. Cross-domain news recommendation further requires inferring user's underlying information needs from heterogeneous signals that often extend beyond direct news consumption. A key challenge lies in moving beyond surface-level behaviors to capture deeper, reusable user interests while maintaining scalability in large-scale production systems. In this paper, we present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries from cross-domain user signals. We formulate query-list generation as a policy optimization problem and employ GRPO with multiple reward signals. We systematically study two compute dimensions: inference-time sampling and model capacity, and empirically observe consistent improvements with increased compute that exhibit scaling-like behavior. Finally, we perform on-policy distillation to transfer the learned policy from a large, compute-intensive teacher to a compact student model suitable for scalable deployment. Extensive offline experiments, ablation studies and large-scale online A/B tests in a production news recommendation system demonstrate consistent gains in both interest modeling quality and downstream recommendation performance.
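The on-policy distillation mentioned in the abstract can be sketched as minimizing a reverse KL term on sequences the student itself samples: the student generates query lists, the teacher scores those same tokens, and the student is pushed toward the teacher where it actually operates. A minimal Monte Carlo estimate of that objective, with all names being illustrative assumptions:

```python
# Hedged sketch of an on-policy distillation loss: a Monte Carlo estimate
# of reverse KL(student || teacher) over student-sampled tokens.
# Function and variable names are illustrative, not the paper's API.

def reverse_kl_estimate(student_logprobs: list[float],
                        teacher_logprobs: list[float]) -> float:
    """Average of (log p_student - log p_teacher) on tokens the student sampled.

    Driving this toward zero pulls the compact student onto the large
    teacher's policy in exactly the regions the student visits.
    """
    assert len(student_logprobs) == len(teacher_logprobs)
    n = len(student_logprobs)
    return sum(s - t for s, t in zip(student_logprobs, teacher_logprobs)) / n
```

Sampling from the student (rather than the teacher) is what makes the distillation "on-policy": the training distribution matches what the deployed compact model will actually produce.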

Tags

News Recommendation  Cross-Domain Recommendation  Reinforcement Learning  Knowledge Distillation

arXiv Categories

cs.CL cs.IR