Temporal Dependencies in In-Context Learning: The Role of Induction Heads
AI Summary
The study finds that in LLM in-context learning, induction heads are critical for processing temporal dependencies and for serial-recall-like behavior.
Key Contributions
- Revealed a serial-recall-like pattern in LLM in-context learning
- Demonstrated the role of induction heads in producing this pattern
- Verified that ablating induction heads degrades serial-recall ability
Methodology
The authors observe how LLMs behave on a free-recall paradigm adapted from cognitive science, then run ablation experiments to analyze the contribution of different attention heads.
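The induction score mentioned in the abstract can be illustrated with a minimal sketch. This is an assumption about the standard diagnostic, not the authors' released code: on a sequence whose second half repeats the first, an induction head at position t should attend to the token right after the earlier occurrence of the current token, i.e. to position t - L + 1 for repeat length L.

```python
def induction_score(attn, seq):
    """Hypothetical induction-score diagnostic (a sketch, not the paper's code).

    attn[t][s] is the attention weight from query position t to key position s.
    seq is a list whose second half exactly repeats the first half.
    Returns the mean weight placed on the 'previous occurrence + 1' position,
    averaged over query positions in the repeated second half.
    """
    L = len(seq) // 2
    assert seq[:L] == seq[L:], "expects a sequence whose second half repeats the first"
    scores = []
    for t in range(L, 2 * L):   # positions in the repeated half
        target = t - L + 1      # the token after the earlier occurrence
        scores.append(attn[t][target])
    return sum(scores) / len(scores)
```

A head whose attention pattern puts all its weight on the `t - L + 1` positions would score 1.0; heads with near-zero scores are the natural candidates for the "random head" ablation baseline.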
Original Abstract
Large language models (LLMs) exhibit strong in-context learning capabilities, but how they track and retrieve information from context remains underexplored. Drawing on the free recall paradigm in cognitive science (where participants recall list items in any order), we show that several open-source LLMs consistently display a serial-recall-like pattern, assigning peak probability to tokens that immediately follow a repeated token in the input sequence. Through systematic ablation experiments, we show that induction heads, specialized attention heads that attend to the token following a previous occurrence of the current token, play an important role in this phenomenon. Removing heads with a high induction score substantially reduces the +1 lag bias, whereas ablating random heads does not reproduce the same reduction. We also show that removing heads with high induction scores impairs the performance of models prompted to do serial recall using few-shot learning to a larger extent than removing random heads. Our findings highlight a mechanistically specific connection between induction heads and temporal context processing in transformers, suggesting that these heads are especially important for ordered retrieval and serial-recall-like behavior during in-context learning.
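The "+1 lag bias" in the abstract comes from the lag analysis used in free-recall studies. As a minimal sketch (an assumption about that standard analysis, not the authors' code): for each pair of consecutively recalled items, record the difference between their study-list positions; a serial-recall-like pattern shows a strong peak at lag +1.

```python
from collections import Counter

def lag_profile(study_list, recalls):
    """Hypothetical lag-profile sketch for free recall (not the paper's code).

    study_list: items in presentation order.
    recalls: recalled items in output order.
    Returns a Counter mapping lag (difference in study positions between
    consecutively recalled items) to its transition count.
    """
    pos = {item: i for i, item in enumerate(study_list)}
    lags = Counter()
    for a, b in zip(recalls, recalls[1:]):
        if a in pos and b in pos:   # skip intrusions not on the study list
            lags[pos[b] - pos[a]] += 1
    return lags
```

For example, recalling `["dog", "sun", "sea", "cat"]` from the study list `["cat", "dog", "sun", "sea"]` yields two +1 transitions and one -3 transition, so the profile peaks at lag +1, the signature the paper attributes to induction heads.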