LLM Memory & RAG relevance: 7/10

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Gabriel Mongaras, Eric C. Larson
arXiv: 2602.17363v1 Published: 2026-02-19 Updated: 2026-02-19

AI Summary

By simplifying and improving Mamba-2, the paper proposes 2Mamba, an efficient model whose accuracy approaches that of softmax attention.

Key Contributions

  • Simplifies Mamba-2 and identifies its key components
  • Improves the A-mask and hidden-state dimension to raise accuracy
  • Proposes 2Mamba, which balances accuracy and efficiency on long sequences

Methodology

Mamba-2 is simplified through experiments that analyze the impact of each component; the A-mask and hidden-state dimension are then improved to increase accuracy.

Original Abstract

Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive and results in reduced accuracy compared to softmax attention. To bridge the accuracy gap between softmax attention and linear attention, we manipulate Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific choices make it most accurate. From this simplified Mamba variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state, resulting in a method, which we call 2Mamba, that is nearly as accurate as softmax attention, yet much more memory efficient for long context lengths. We also investigate elements of Mamba-2 that help surpass softmax attention accuracy. Code is provided for all our experiments.
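
As background on the efficiency claim, below is a minimal sketch contrasting softmax attention, which materializes an L×L score matrix, with a gated linear-attention recurrence of the general kind Mamba-2 builds on, which carries only a fixed-size state. The scalar decay gate `a` is an illustrative stand-in and does not reproduce the paper's A-mask or its higher-order hidden state.

```python
import torch

def softmax_attention(q, k, v):
    """Standard softmax attention: builds an (L, L) score matrix,
    so memory grows quadratically with sequence length L."""
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5  # (L, L)
    return torch.softmax(scores, dim=-1) @ v                 # (L, d)

def gated_linear_attention(q, k, v, a):
    """Generic gated linear-attention recurrence (the family Mamba-2
    belongs to): a single (d, d) state is updated per step, so memory
    is constant in L. `a` is an illustrative per-step scalar gate."""
    L, d = q.shape
    state = torch.zeros(d, d)
    out = torch.empty(L, d)
    for t in range(L):
        state = a[t] * state + torch.outer(k[t], v[t])  # O(d^2) state update
        out[t] = q[t] @ state                           # O(d^2) readout
    return out

L, d = 1024, 64
q, k, v = (torch.randn(L, d) for _ in range(3))
a = torch.sigmoid(torch.randn(L))  # decay gate in (0, 1)
print(softmax_attention(q, k, v).shape, gated_linear_attention(q, k, v, a).shape)
```

In this generic form, per-token memory is O(d²) regardless of context length, which is the source of the long-context advantage the abstract describes; the paper's specific modifications (the improved A-mask and higher-order state) are not modeled here.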

Tags

Linear attention · Mamba-2 · Long-sequence modeling · Efficiency optimization · Transformer

arXiv Category

cs.LG