
HLA: Hadamard Linear Attention

Hanno Ackermann, Hong Cai, Mohsen Ghafoorian, Amirhossein Habibian
arXiv: 2602.12128v1 Published: 2026-02-12 Updated: 2026-02-12

AI Summary

The paper proposes Hadamard Linear Attention (HLA), which approximates softmax with a higher-degree rational function while retaining the efficiency of linear attention.

Key Contributions

  • Proposes Hadamard Linear Attention (HLA)
  • Approximates softmax with a higher-degree rational function
  • Applies the method to a large diffusion transformer model for video generation

Methodology

Unlike standard linear attention, HLA applies its nonlinearity after the pairwise similarities have been computed, admits an efficient computational scheme, and is validated on a diffusion-based video generation model.
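The efficiency baseline that HLA builds on is the standard linear-attention reordering: because the kernel is applied to queries and keys independently, associativity lets the n×n similarity matrix be bypassed entirely. A minimal NumPy sketch of that baseline follows; the ReLU-plus-epsilon kernel is an illustrative choice, not necessarily the one used in the paper.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear attention: kernel phi applied to Q and K *before* the
    similarity computation.

    Associativity lets us evaluate phi(Q) @ (phi(K).T @ V) right to
    left, costing O(n * d^2) instead of the O(n^2 * d) needed to form
    the full n x n similarity matrix.
    """
    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)               # (n, d), no n x n matrix formed
    den = Qp @ Kp.sum(axis=0)[:, None]  # per-row normalizer, shape (n, 1)
    return num / den
```

As a sanity check, this right-to-left evaluation agrees exactly with the explicit quadratic form `phi(Q) @ phi(K).T`, normalized row-wise.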

Original Abstract

The attention mechanism is an important reason for the success of transformers. It relies on computing pairwise relations between tokens. To reduce the high computational cost of standard quadratic attention, linear attention has been proposed as an efficient approximation. It employs kernel functions that are applied independently to the inputs before the pairwise similarities are calculated. That allows for an efficient computational procedure which, however, amounts to a low-degree rational function approximating softmax. We propose Hadamard Linear Attention (HLA). Unlike previous works on linear attention, the nonlinearity in HLA is not applied separately to queries and keys, but, analogously to standard softmax attention, after the pairwise similarities have been computed. It will be shown that the proposed nonlinearity amounts to a higher-degree rational function to approximate softmax. An efficient computational scheme for the proposed method is derived that is similar to that of standard linear attention. In contrast to other approaches, no time-consuming tensor reshaping is necessary to apply the proposed algorithm. The effectiveness of the approach is demonstrated by applying it to a large diffusion transformer model for video generation, an application that involves very large amounts of tokens.
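The abstract's key distinction, a kernel applied before the pairwise similarities versus a nonlinearity applied after them, can be illustrated with a deliberately naive sketch. The elementwise (Hadamard) square below is an assumed stand-in for the paper's nonlinearity, not its actual definition, and this toy version forms the full n×n matrix that the paper's efficient scheme is designed to avoid.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard quadratic attention: the nonlinearity (softmax) is
    applied after the pairwise similarities S = Q K^T are computed."""
    S = Q @ K.T
    A = np.exp(S - S.max(axis=1, keepdims=True))  # stabilized exp
    return (A / A.sum(axis=1, keepdims=True)) @ V

def post_similarity_attention(Q, K, V):
    """Toy HLA-style variant: the nonlinearity (here an elementwise,
    i.e. Hadamard, square -- an illustrative stand-in) is likewise
    applied after the similarities. Written naively this is O(n^2 * d);
    the paper derives an efficient scheme that avoids forming S."""
    S = (Q @ K.T) ** 2
    return S @ V / S.sum(axis=1, keepdims=True)
```

In both functions the attention weights are row-normalized after the nonlinearity, which is exactly the structural difference from linear attention, where the nonlinearity never sees the pairwise similarities.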

Tags

attention mechanism, linear attention, transformer, video generation

arXiv Categories

cs.AI