LLM Memory & RAG relevance: 7/10

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Gabriel Mongaras, Eric C. Larson
arXiv: 2602.17363v1 Published: 2026-02-19 Updated: 2026-02-19

AI Summary

By simplifying and improving Mamba-2, the paper proposes 2Mamba, an efficient model whose accuracy approaches that of softmax attention.

Key Contributions

  • Simplifies Mamba-2 and identifies its key components
  • Improves the A-mask and hidden-state dimension to raise accuracy
  • Proposes 2Mamba, which balances accuracy and efficiency on long sequences

Methodology

Mamba-2 is simplified through experiments that analyze the impact of each component; the A-mask and hidden-state dimension are then improved to increase accuracy.

Original Abstract

Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive and results in reduced accuracy compared to softmax attention. To bridge the accuracy gap between softmax attention and linear attention, we manipulate Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific choices make it most accurate. From this simplified Mamba variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state, resulting in a method, which we call 2Mamba, that is nearly as accurate as softmax attention, yet much more memory efficient for long context lengths. We also investigate elements of Mamba-2 that help surpass softmax attention accuracy. Code is provided for all our experiments.
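
As background on the efficiency claim, below is a minimal sketch contrasting softmax attention, which materializes an L×L score matrix, with a gated linear-attention recurrence of the general kind Mamba-2 builds on, which carries only a fixed-size state. The scalar decay gate `a` is an illustrative stand-in and does not reproduce the paper's A-mask or its higher-order hidden state.

```python
import torch

def softmax_attention(q, k, v):
    """Standard softmax attention: builds an (L, L) score matrix,
    so memory grows quadratically with sequence length L."""
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5  # (L, L)
    return torch.softmax(scores, dim=-1) @ v                 # (L, d)

def gated_linear_attention(q, k, v, a):
    """Generic gated linear-attention recurrence (the family Mamba-2
    belongs to): a single (d, d) state is updated per step, so memory
    is constant in L. `a` is an illustrative per-step scalar gate."""
    L, d = q.shape
    state = torch.zeros(d, d)
    out = torch.empty(L, d)
    for t in range(L):
        state = a[t] * state + torch.outer(k[t], v[t])  # O(d^2) state update
        out[t] = q[t] @ state                           # O(d^2) readout
    return out

L, d = 1024, 64
q, k, v = (torch.randn(L, d) for _ in range(3))
a = torch.sigmoid(torch.randn(L))  # decay gate in (0, 1)
print(softmax_attention(q, k, v).shape, gated_linear_attention(q, k, v, a).shape)
```

In this generic form, per-token memory is O(d²) regardless of context length, which is the source of the long-context advantage the abstract describes; the paper's specific modifications (the improved A-mask and higher-order state) are not modeled here.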

Tags

Linear attention · Mamba-2 · Long-sequence modeling · Efficiency optimization · Transformer

arXiv Category

cs.LG