Improving LLM-based Recommendation with Self-Hard Negatives from Intermediate Layers
AI Summary
ILRec is a new LLM-based recommendation framework that uses self-generated hard negatives from intermediate layers to improve recommendation performance.
Main Contributions
- Proposes ILRec, an LLM-based recommendation framework
- Introduces self-generated hard negatives from intermediate layers as negative supervision
- Designs a two-stage training framework for negative-sample optimization and distillation
Methodology
Self-hard negatives are extracted from intermediate layers, and a two-stage training framework (cross-layer preference optimization and cross-layer preference distillation) improves model performance; a lightweight collaborative filtering model is added to mitigate the false-negative problem.
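The paper does not spell out the extraction mechanics here, but one plausible reading of "self-hard negatives from intermediate layers" is a logit-lens-style readout: project an intermediate hidden state through the output embedding and treat the top-scoring tokens other than the ground-truth item token as hard negatives. A minimal sketch under that assumption (`self_hard_negatives`, the toy dimensions, and the readout itself are illustrative, not the authors' implementation):

```python
import numpy as np

def self_hard_negatives(hidden, unembed, gold_id, k=3):
    """Project an intermediate-layer hidden state through the output
    embedding (logit-lens-style) and return the top-k tokens, excluding
    the ground-truth token, as self-hard negatives."""
    logits = hidden @ unembed.T          # (vocab,) scores at this layer
    order = np.argsort(-logits)          # token ids by descending score
    return [int(t) for t in order if t != gold_id][:k]

rng = np.random.default_rng(0)
hidden = rng.normal(size=16)             # toy hidden state, d = 16
unembed = rng.normal(size=(10, 16))      # toy vocabulary of 10 tokens
negs = self_hard_negatives(hidden, unembed, gold_id=4, k=3)
print(negs)                              # 3 hard-negative token ids, never 4
```

Because the hidden state changes every step, such negatives are dynamic and model-dependent, which is what distinguishes them from the offline sequence-level negatives the paper criticizes.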
Original Abstract
Large language models (LLMs) have shown great promise in recommender systems, where supervised fine-tuning (SFT) is commonly used for adaptation. Subsequent studies further introduce preference learning to incorporate negative samples into the training process. However, existing methods rely on sequence-level, offline-generated negatives, making them less discriminative and informative when adapting LLMs to recommendation tasks with large negative item spaces. To address these challenges, we propose ILRec, a novel preference fine-tuning framework for LLM-based recommendation, leveraging self-hard negative signals extracted from intermediate layers to improve preference learning. Specifically, we identify self-hard negative tokens from intermediate layers as fine-grained negative supervision that dynamically reflects the model's preference learning process. To effectively integrate these signals into training, we design a two-stage framework comprising cross-layer preference optimization and cross-layer preference distillation, enabling the model to jointly discriminate informative negatives and enhance the quality of negative signals from intermediate layers. In addition, we introduce a lightweight collaborative filtering model to assign token-level rewards for negative signals, mitigating the risk of over-penalizing false negatives. Extensive experiments on three datasets demonstrate ILRec's effectiveness in enhancing the performance of LLM-based recommender systems.
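The abstract's last ingredient, token-level rewards from a collaborative filtering model to avoid over-penalizing false negatives, suggests a weighted DPO-style contrast. The sketch below is one plausible form, not the paper's exact objective: each negative token's penalty is down-weighted by its CF-derived reward, so a negative the CF model actually scores highly (a likely false negative) is pushed down less.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def token_preference_loss(logp_pos, logp_neg, cf_reward, beta=1.0):
    """Token-level preference loss with CF-reward gating (illustrative).

    logp_pos / logp_neg: per-token log-probs of positive vs. hard-negative
    tokens; cf_reward in [0, 1]: CF model's score for each negative, where
    a high score marks a likely false negative and shrinks its penalty."""
    weight = 1.0 - cf_reward                   # likely false negatives -> small weight
    margin = beta * (logp_pos - logp_neg)      # DPO-style per-token margin
    return float(np.mean(-weight * np.log(sigmoid(margin))))

lp_pos = np.array([-1.0, -1.2])
lp_neg = np.array([-2.0, -0.5])
loss_trusted = token_preference_loss(lp_pos, lp_neg, np.array([0.0, 0.0]))
loss_gated = token_preference_loss(lp_pos, lp_neg, np.array([0.9, 0.9]))
print(loss_trusted, loss_gated)  # gating shrinks the penalty on suspect negatives
```

The design choice to gate at the token level (rather than discarding whole negative sequences) matches the abstract's framing of fine-grained negative supervision.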