LLM Memory & RAG Relevance: 9/10

Trained Persistent Memory for Frozen Encoder-Decoder LLMs: Six Architectural Methods

Hong Jeong
arXiv: 2603.16413v1 Published: 2026-03-17 Updated: 2026-03-17

AI Summary

Presents persistent-memory methods for a frozen LLM that enable conversational learning under tight resource constraints.

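Main Contributions

  • Demonstrates that persistent memory is feasible in a frozen LLM
  • Proposes six architectural methods for the memory
  • Shows that the memory bank can keep accumulating information at inference time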

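Methodology

Uses a frozen Flan-T5-XL as the backbone and trains small adapters that read from and write to the persistent memory; the methods are evaluated on the LoCoMo dataset.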
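The read and write operations mentioned above can be pictured with a minimal sketch. This is not the paper's implementation: the class name MemoryBank, the FIFO eviction rule, and the scaled dot-product read are all assumptions chosen for illustration, and the trainable adapter that would produce the vectors is left out entirely.

```python
import math


class MemoryBank:
    """Toy persistent memory: a bounded list of dense vector slots.

    Hypothetical illustration only; the paper's six methods are not
    reproduced here.
    """

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.dim = dim
        self.slots = []  # oldest slot first

    def write(self, vec):
        """Append a vector; evict the oldest slot when full (FIFO is an assumption)."""
        if len(vec) != self.dim:
            raise ValueError("vector has wrong dimension")
        self.slots.append([float(x) for x in vec])
        if len(self.slots) > self.capacity:
            self.slots.pop(0)

    def read(self, query):
        """Soft attention read: softmax of scaled dot products, then a weighted sum."""
        if not self.slots:
            return [0.0] * self.dim  # nothing stored yet
        scale = math.sqrt(self.dim)
        scores = [sum(q * s for q, s in zip(query, slot)) / scale
                  for slot in self.slots]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        return [sum(w * slot[i] for w, slot in zip(weights, self.slots))
                for i in range(self.dim)]


# Usage: two writes, then a read whose query points toward the first slot.
bank = MemoryBank(capacity=2, dim=2)
bank.write([1.0, 0.0])
bank.write([0.0, 1.0])
retrieved = bank.read([10.0, 0.0])
```

Because the read is a softmax-weighted sum, it is a smooth function of both the query and the stored vectors, which is the property that lets an adapter be trained through it. A real system would use tensors in an autodiff framework rather than Python lists.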

Original Abstract

Frozen encoder-decoder language models are stateless: the latent representation is discarded after every forward pass, so no information persists across sessions. This paper presents a proof-of-concept pilot study showing that persistent memory in the continuous latent space of a frozen LLM is feasible, even under severe resource constraints (a single frozen Flan-T5-XL backbone, small trainable adapters, a single dataset). We implement six architectural methods spanning three injection points and four write mechanisms; unlike text-level memory systems, every write and read is a differentiable operation on dense vectors. After training only the adapter, the memory bank continues to accumulate at inference time without gradients, enabling conversational learning. Under a forgetting-curve evaluation on LoCoMo at two capacity scales (1× and 10×), the stateless baseline scores exactly zero; at 10× all six trained adapters produce positive memory-recall curves; at 1× three methods collapse, revealing capacity as a critical design parameter. Because the memory bank is a compact numerical array, it can be scaled to arbitrarily large capacity without altering the backbone. We argue that full end-to-end training with larger models, larger data, and orders-of-magnitude larger memory will yield substantially stronger results; this pilot study establishes the feasibility baseline and design-space taxonomy that such efforts require.
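The abstract's point that the memory bank is a plain numerical array that persists across sessions and grows at inference time without gradients can be sketched as follows. The helper names (save_bank, load_bank, run_session) and the JSON file layout are hypothetical, chosen only for illustration; the paper does not specify a storage format.

```python
import json
import os


def save_bank(slots, path):
    """Persist the memory bank (a list of float vectors) to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"slots": slots}, f)


def load_bank(path):
    """Load the bank; a missing file means no memory yet (the stateless case)."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)["slots"]


def run_session(path, new_vectors):
    """One inference-time session: reload the bank, append, save.

    The writes are ordinary array updates; no gradients or weight
    changes are involved. Returns the number of stored slots.
    """
    slots = load_bank(path)
    slots.extend([float(x) for x in vec] for vec in new_vectors)
    save_bank(slots, path)
    return len(slots)
```

Calling run_session twice with the same path leaves the second session with everything the first one wrote, whereas a model with no bank file starts from an empty list each time. Growing the bank only changes the size of the stored array, not the backbone.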

Tags

persistent memory frozen LLM conversational learning

arXiv Categories

cs.LG cs.AI