AI Agents relevance: 6/10

Variable-Length Semantic IDs for Recommender Systems

Kirill Khrylchenko
arXiv: 2602.16375v1 Published: 2026-02-18 Updated: 2026-02-18

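AI Summary

Proposes a recommender-system model with variable-length semantic IDs, addressing the inefficiency of fixed-length semantic IDs and their mismatch with the differing information requirements of popular and long-tail items.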

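Main Contributions

  • Introduces variable-length semantic IDs for recommender systems
  • Learns item representations with a discrete variational autoencoder
  • Avoids the instability of REINFORCE-based training and the fixed-length constraint of prior semantic ID methods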

Methodology

A discrete variational autoencoder with Gumbel-Softmax reparameterization learns item representations of adaptive length.
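
As an illustration of the Gumbel-Softmax step named above, here is a minimal sketch in plain Python. It only shows how a relaxed (differentiable) sample over a small token vocabulary can be drawn from logits. The function name `gumbel_softmax`, the `temperature` parameter, and the example logits are assumptions made for illustration and are not taken from the paper. The paper's encoder, decoder, and mechanism for choosing the ID length are not reproduced here.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample (a probability vector) from `logits`.

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise via inverse transform sampling.
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: one relaxed sample over a 4-token vocabulary (logits are made up).
random.seed(0)
token_probs = gumbel_softmax([2.0, 0.5, 0.1, -1.0], temperature=0.5)
print(token_probs)
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it smoother. The index of the largest entry is distributed as an exact draw from the softmax of the original logits, which is why the relaxation can stand in for discrete sampling during gradient-based training.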

Original Abstract

Generative models are increasingly used in recommender systems, both for modeling user behavior as event sequences and for integrating large language models into recommendation pipelines. A key challenge in this setting is the extremely large cardinality of item spaces, which makes training generative models difficult and introduces a vocabulary gap between natural language and item identifiers. Semantic identifiers (semantic IDs), which represent items as sequences of low-cardinality tokens, have recently emerged as an effective solution to this problem. However, existing approaches generate semantic identifiers of fixed length, assigning the same description length to all items. This is inefficient, misaligned with natural language, and ignores the highly skewed frequency structure of real-world catalogs, where popular items and rare long-tail items exhibit fundamentally different information requirements. In parallel, the emergent communication literature studies how agents develop discrete communication protocols, often producing variable-length messages in which frequent concepts receive shorter descriptions. Despite the conceptual similarity, these ideas have not been systematically adopted in recommender systems. In this work, we bridge recommender systems and emergent communication by introducing variable-length semantic identifiers for recommendation. We propose a discrete variational autoencoder with Gumbel-Softmax reparameterization that learns item representations of adaptive length under a principled probabilistic framework, avoiding the instability of REINFORCE-based training and the fixed-length constraints of prior semantic ID methods.

Tags

Recommender systems, Semantic IDs, Variational autoencoders, Generative models, Natural language processing

arXiv Categories

cs.IR cs.CL cs.LG