MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation
AI Summary
MANAR integrates global information in linear time through a memory-augmented attention mechanism and abstract conceptual representations, improving model performance.
Main Contributions
- Proposes the MANAR architecture, combining a memory-augmented attention mechanism with abstract conceptual representations
- Achieves a linear-time attention mechanism, resolving the quadratic complexity of standard attention
- Proves that MANAR can produce non-convex contextual representations, reflecting the creative synthesis described in GWT
Methodology
MANAR models Global Workspace Theory through a trainable memory of abstract concepts and an Abstract Conceptual Representation: information from the input is first integrated into a global state and then broadcast back to contextualize individual tokens.
Original Abstract
The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract concepts and an Abstract Conceptual Representation (ACR). The architecture follows a two-stage logic that maps directly to GWT mechanics: (i) an integration phase, where retrieved memory concepts converge to form a collective "mental image" (the ACR) based on input stimuli; and (ii) a broadcasting phase, where this global state navigates and informs the contextualization of individual local tokens. We demonstrate that efficient linear-time scaling is a fundamental architectural byproduct of instantiating GWT's functional bottleneck, as routing global information through a constant-sized ACR resolves the quadratic complexity inherent in standard attention. MANAR is a compatible re-parameterization of MHA with identical semantic roles for its projections, enabling knowledge transfer from pretrained transformers via weight-copy and thus overcoming the adoption barriers of structurally incompatible linear-time alternatives. MANAR enables non-convex contextualization, synthesizing representations that provably lie outside the convex hull of input tokens, a mathematical reflection of the creative synthesis described in GWT. Empirical evaluations confirm that MANAR matches or exceeds strong baselines across language (GLUE score of 85.1), vision (83.9% ImageNet-1K), and speech (2.7% WER on LibriSpeech), positioning it as an efficient and expressive alternative to quadratic attention.
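To make the two-stage logic and its linear cost concrete, the sketch below implements the integration/broadcast pattern the abstract describes in minimal NumPy. This is an illustrative assumption, not the authors' implementation: per-head query/key/value projections, normalization, and training details are omitted, and the names `manar_sketch`, `X`, and `M` are hypothetical. The point is the complexity argument: with `n` tokens, `m` memory slots, and dimension `d`, both stages cost O(n·m·d), so for constant `m` the layer scales linearly in sequence length instead of quadratically.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def manar_sketch(X, M):
    """Hypothetical two-stage GWT-style contextualization.

    X: (n, d) local token representations (input stimuli).
    M: (m, d) trainable memory of abstract concepts, m constant.
    """
    # Integration phase: memory concepts attend to the input tokens
    # and converge into a constant-sized ACR ("mental image").
    # Cost: O(n * m * d); acr has shape (m, d).
    acr = softmax(M @ X.T) @ X
    # Broadcasting phase: each local token attends to the ACR, so
    # global information is routed through the m-slot bottleneck
    # rather than via all-to-all token attention. Cost: O(n * m * d).
    out = softmax(X @ acr.T) @ acr
    return out
```

Note the contrast with standard attention, where `softmax(X @ X.T) @ X` costs O(n²·d); routing through the fixed-size ACR is what removes the quadratic term.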