LLM Memory & RAG — Relevance: 8/10

Reinforced Fast Weights with Next-Sequence Prediction

Hee Seung Hwang, Xindi Wu, Sanghyuk Chun, Olga Russakovsky
arXiv: 2602.16704v1 · Published: 2026-02-18 · Updated: 2026-02-18

AI Summary

Proposes the REFINE framework, which uses reinforcement learning to optimize fast weight models and improve long-context modeling.

Key Contributions

  • Proposes the REFINE framework, which trains fast weight models with a next-sequence prediction (NSP) objective
  • Uses reinforcement learning to select informative token positions and generate multi-token rollouts
  • Validates REFINE's effectiveness on LaCT-760M and DeltaNet-1.3B

Methodology

REFINE uses reinforcement learning: it selects token positions based on prediction entropy, generates multi-token rollouts at those positions, assigns sequence-level rewards to the rollouts, and optimizes the model with group relative policy optimization (GRPO).
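The entropy-based position selection step above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `select_rollout_positions` and the top-k selection rule are assumptions about how "informative token positions" might be picked from next-token prediction entropy.

```python
import torch

def select_rollout_positions(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k positions with the highest next-token prediction entropy.

    logits: (seq_len, vocab_size) next-token logits from the language model.
    Returns the indices of the k most uncertain positions -- candidate
    starting points for multi-token rollouts in REFINE-style training.
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    # Shannon entropy per position: H = -sum_v p(v) * log p(v)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (seq_len,)
    return torch.topk(entropy, k).indices
```

A position with a near-uniform predictive distribution (high entropy) is where a single-token NTP loss gives the weakest signal, which is the motivation for launching a multi-token rollout there.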

Original Abstract

Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token predictions and ignores semantic coherence across multiple tokens following a prefix. Consequently, fast weight models, which dynamically update their parameters to store contextual information, learn suboptimal representations that fail to capture long-range dependencies. We introduce REFINE (Reinforced Fast weIghts with Next sEquence prediction), a reinforcement learning framework that trains fast weight models under the next-sequence prediction (NSP) objective. REFINE selects informative token positions based on prediction entropy, generates multi-token rollouts, assigns self-supervised sequence-level rewards, and optimizes the model with group relative policy optimization (GRPO). REFINE is applicable throughout the training lifecycle of pre-trained language models: mid-training, post-training, and test-time training. Our experiments on LaCT-760M and DeltaNet-1.3B demonstrate that REFINE consistently outperforms supervised fine-tuning with NTP across needle-in-a-haystack retrieval, long-context question answering, and diverse tasks in LongBench. REFINE provides an effective and versatile framework for improving long-context modeling in fast weight architectures.
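The GRPO step mentioned in the abstract scores each rollout against the other rollouts in its group rather than against a learned value baseline. A minimal sketch of that advantage computation, assuming rewards are arranged as one group of rollouts per selected position (the function name `grpo_advantages` is hypothetical):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages as used in GRPO.

    rewards: (num_groups, group_size) sequence-level rewards, where each group
    holds the rollouts generated from one prompt/position. Each reward is
    standardized against its own group's mean and std, so rollouts only
    compete with their siblings and no critic network is needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```

In a full training loop these advantages would weight the policy-gradient term for each rollout, typically with a clipped ratio and a KL penalty against a reference model; those details are omitted here.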

Tags

Fast Weight · Reinforcement Learning · Long Context Modeling · Next Sequence Prediction

arXiv Category

cs.CL