Agent Tuning & Optimization — Relevance: 6/10

Hexagon-MLIR: An AI Compilation Stack For Qualcomm's Neural Processing Units (NPUs)

Mohammed Javed Absar, Muthu Baskaran, Abhikrant Sharma, Abhilash Bhandari, Ankit Aggarwal, Arun Rangasamy, Dibyendu Das, Fateme Hosseini, Franck Slama, Iulian Brumar, Jyotsna Verma, Krishnaprasad Bindumadhavan, Mitesh Kothari, Mohit Gupta, Ravishankar Kolachana, Richard Lethin, Samarth Narang, Sanjay Motilal Ladwa, Shalini Jain, Snigdha Suresh Dalvi, Tasmia Rahman, Venkat Rasagna Reddy Komatireddy, Vivek Vasudevbhai Pandya, Xiyue Shi, Zachary Zipper
arXiv: 2602.19762v1 · Published: 2026-02-23 · Updated: 2026-02-23

AI Summary

Hexagon-MLIR: an open-source AI compilation stack for Qualcomm NPUs, with unified support for Triton kernels and PyTorch models.

Key Contributions

  • Builds an MLIR-based compilation stack
  • Supports Triton kernels and PyTorch models
  • Optimizes data locality on the NPU

Methodology

Built on the MLIR framework, the compiler applies a structured sequence of passes that exploit NPU architectural features to accelerate AI workloads.

Original Abstract

In this paper, we present Hexagon-MLIR, an open-source compilation stack that targets the Qualcomm Hexagon Neural Processing Unit (NPU) and provides unified support for lowering Triton kernels and PyTorch models. Built using the MLIR framework, our compiler applies a structured sequence of passes to exploit NPU architectural features to accelerate AI workloads. It enables faster deployment of new Triton kernels (hand-written or subgraphs from PyTorch 2.0) for our target by providing automated compilation from kernel to binary. By ingesting Triton kernels, we generate mega-kernels that maximize data locality in the NPU's Tightly Coupled Memory (TCM), reducing the bandwidth bottlenecks inherent in library-based approaches. This initiative complements our commercial toolchains by providing developers with an open-source MLIR-based compilation stack that gives them a path to advance AI compilation capabilities through a more flexible approach. Hexagon-MLIR is a work-in-progress, and we are continuing to add many more optimizations and capabilities in this effort.
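To make the two ideas in the abstract concrete — a structured sequence of passes, and fusing adjacent kernels into a "mega-kernel" so intermediates stay in fast local memory — here is a minimal, illustrative sketch in plain Python. It is not Hexagon-MLIR code; the IR representation, pass names, and op kinds are all hypothetical, chosen only to show the pipeline pattern the paper describes.

```python
# Hypothetical toy IR: a list of op dicts. A pass is any function IR -> IR.

def fuse_kernels(ir):
    # Hypothetical fusion pass: merge runs of adjacent compute kernels into
    # one "mega-kernel", analogous to keeping intermediates in TCM instead
    # of spilling each kernel's output to shared memory.
    fused, pending = [], []
    for op in ir:
        if op["kind"] == "kernel":
            pending.append(op["name"])
        else:
            if pending:
                fused.append({"kind": "mega_kernel", "name": "+".join(pending)})
                pending = []
            fused.append(op)
    if pending:
        fused.append({"kind": "mega_kernel", "name": "+".join(pending)})
    return fused

def lower_to_target(ir):
    # Hypothetical lowering pass: annotate every op for the target backend.
    return [dict(op, target="npu") for op in ir]

def run_pipeline(ir, passes):
    # A pass pipeline is just function composition applied in order.
    for p in passes:
        ir = p(ir)
    return ir

ir = [
    {"kind": "kernel", "name": "matmul"},
    {"kind": "kernel", "name": "relu"},
    {"kind": "io", "name": "store"},
]
out = run_pipeline(ir, [fuse_kernels, lower_to_target])
# matmul and relu are fused into a single mega-kernel; store is untouched.
```

In the real stack, each stage would be an MLIR pass rewriting dialect operations rather than Python dicts, but the shape — a fixed, ordered sequence of rewrites culminating in target-specific lowering — is the same.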

Tags

MLIR Qualcomm NPU Triton PyTorch Compiler

arXiv Categories

cs.PL cs.AI