RAM-Net: Expressive Linear Attention with Selectively Addressable Memory
AI Summary
RAM-Net improves the expressivity and retrieval fidelity of linear attention models through an addressable sparse memory, while preserving computational efficiency.
Key Contributions
- Proposes RAM-Net, a novel architecture that bridges the gap between full attention and linear models
- Introduces high-dimensional sparse vectors as explicit addresses, allowing the model to selectively access a massive memory state
- Validates RAM-Net's superiority on long-range retrieval tasks and its competitive performance on language modeling and commonsense reasoning
Methodology
Inputs are mapped to high-dimensional sparse vectors that serve as explicit addresses, enabling selective access to a massive memory state; because updates touch only the few addressed entries, expressivity grows without sacrificing efficiency.
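The addressing idea can be sketched as follows. This is a minimal illustrative toy, not the paper's actual formulation: the projection matrix `W`, the top-k sparsification, and the outer-product write/read rule are all assumptions made for the sketch. The key point it demonstrates is that a sparse high-dimensional address confines writes and reads to a handful of rows of a large memory, so retrieval suffers little interference.

```python
import numpy as np

def sparse_address(x, W, k=4):
    """Map input x to a high-dimensional sparse address: project, keep top-k entries."""
    h = W @ x                              # project to address space (D >> len(x))
    idx = np.argsort(-np.abs(h))[:k]       # indices of the k largest-magnitude entries
    a = np.zeros_like(h)
    a[idx] = h[idx]                        # zero out everything else -> sparse address
    return a, idx

rng = np.random.default_rng(0)
d, D = 16, 4096                            # input dim; address-space dim (memory scales with D)
W = rng.standard_normal((D, d)) / np.sqrt(d)
M = np.zeros((D, d))                       # large memory state, mostly untouched per step

# Write: a value vector v is stored only at the k addressed rows (sparse update).
x, v = rng.standard_normal(d), rng.standard_normal(d)
a, idx = sparse_address(x, W)
M[idx] += np.outer(a[idx], v)

# Read: querying with the same input recovers v (up to the address's squared norm),
# touching only the same k rows of M.
a_q, idx_q = sparse_address(x, W)
out = a_q[idx_q] @ M[idx_q]
retrieved = out / np.sum(a_q[idx_q] ** 2)  # rescale -> approximately v
```

Because both the write and the read involve only `k` of the `D` memory rows, the per-step cost stays small even as the state grows, which mirrors the efficiency argument in the abstract.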
Original Abstract
While linear attention architectures offer efficient inference, compressing unbounded history into a fixed-size memory inherently limits expressivity and causes information loss. To address this limitation, we introduce Random Access Memory Network (RAM-Net), a novel architecture designed to bridge the gap between the representational capacity of full attention and the memory efficiency of linear models. The core of RAM-Net maps inputs to high-dimensional sparse vectors serving as explicit addresses, allowing the model to selectively access a massive memory state. This design enables exponential state size scaling without additional parameters, which significantly mitigates signal interference and enhances retrieval fidelity. Moreover, the inherent sparsity ensures exceptional computational efficiency, as state updates are confined to minimal entries. Extensive experiments demonstrate that RAM-Net consistently surpasses state-of-the-art baselines in fine-grained long-range retrieval tasks and achieves competitive performance in standard language modeling and zero-shot commonsense reasoning benchmarks, validating its superior capability to capture complex dependencies with significantly reduced computational overhead.