Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs
AI 摘要
该论文揭示了LLM中一种名为“隐式记忆”的新机制,允许模型跨会话传递信息,并探讨其潜在风险。
主要贡献
- 发现了LLM中隐式记忆的存在,无需显式记忆模块即可跨会话传递信息
- 提出了基于隐式记忆的时间炸弹后门攻击,展示了其潜在危害
- 分析了隐式记忆在跨Agent通信、基准测试污染等方面的广泛影响
方法论
通过prompting或微调的方式,诱导LLM产生具有隐式记忆的行为,并通过实验验证其有效性。
原文摘要
Large language models (LLMs) are commonly treated as stateless: once an interaction ends, no information is assumed to persist unless it is explicitly stored and re-supplied. We challenge this assumption by introducing implicit memory-the ability of a model to carry state across otherwise independent interactions by encoding information in its own outputs and later recovering it when those outputs are reintroduced as input. This mechanism does not require any explicit memory module, yet it creates a persistent information channel across inference requests. As a concrete demonstration, we introduce a new class of temporal backdoors, which we call time bombs. Unlike conventional backdoors that activate on a single trigger input, time bombs activate only after a sequence of interactions satisfies hidden conditions accumulated via implicit memory. We show that such behavior can be induced today through straightforward prompting or fine-tuning. Beyond this case study, we analyze broader implications of implicit memory, including covert inter-agent communication, benchmark contamination, targeted manipulation, and training-data poisoning. Finally, we discuss detection challenges and outline directions for stress-testing and evaluation, with the goal of anticipating and controlling future developments. To promote future research, we release code and data at: https://github.com/microsoft/implicitMemory.