HiAP: A Multi-Granular Stochastic Auto-Pruning Framework for Vision Transformers
AI Summary
HiAP proposes a multi-granular stochastic auto-pruning framework for improving the efficiency of Vision Transformers.
Key Contributions
- Proposes HiAP, a multi-granular pruning framework
- Requires no manual importance heuristics or predefined sparsity targets
- Discovers optimal sub-networks in a single end-to-end training phase
Methodology
HiAP introduces stochastic Gumbel-Sigmoid gates that prune at multiple granularities, and optimizes them with a loss that combines structural feasibility penalties and an analytical FLOPs term.
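As a minimal sketch of the gating mechanism (not the authors' implementation), a Gumbel-Sigmoid gate can be sampled by perturbing a learnable logit with logistic noise, which is the difference of two Gumbel samples, and squashing the result with a temperature-scaled sigmoid. The function name, the temperature parameter `tau`, and the numerical-stability epsilon are assumptions:

```python
import math
import random

def gumbel_sigmoid(logit, tau=1.0, rng=random):
    """Sample a soft binary gate in (0, 1) from a learnable logit.

    The noise log(u) - log(1 - u) is Logistic(0, 1), i.e. the difference
    of two Gumbel(0, 1) samples; lower tau makes gates more binary.
    """
    u = rng.random()
    noise = math.log(u + 1e-9) - math.log(1.0 - u + 1e-9)  # logistic noise
    return 1.0 / (1.0 + math.exp(-(logit + noise) / tau))
```

At low temperature (or with a straight-through hard threshold at inference), such gates approach 0/1 decisions, which is what lets a continuous relaxation converge to a discrete sub-network.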
Original Abstract
Vision Transformers require significant computational resources and memory bandwidth, severely limiting their deployment on edge devices. While recent structured pruning methods successfully reduce theoretical FLOPs, they typically operate at a single structural granularity and rely on complex, multi-stage pipelines with post-hoc thresholding to satisfy sparsity budgets. In this paper, we propose Hierarchical Auto-Pruning (HiAP), a continuous relaxation framework that discovers optimal sub-networks in a single end-to-end training phase without requiring manual importance heuristics or predefined per-layer sparsity targets. HiAP introduces stochastic Gumbel-Sigmoid gates at multiple granularities: macro-gates to prune entire attention heads and FFN blocks, and micro-gates to selectively prune intra-head dimensions and FFN neurons. By optimizing both levels simultaneously, HiAP addresses both the memory-bound overhead of loading large matrices and the compute-bound mathematical operations. HiAP naturally converges to stable sub-networks using a loss function that incorporates both structural feasibility penalties and analytical FLOPs. Extensive experiments on ImageNet demonstrate that HiAP organically discovers highly efficient architectures, and achieves a competitive accuracy-efficiency Pareto frontier for models like DeiT-Small, matching the performance of sophisticated multi-stage methods while significantly simplifying the deployment pipeline.
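The abstract describes a loss built from the task objective, an analytical FLOPs term, and structural feasibility penalties. A hedged sketch of how such a loss might be assembled is below; the function names, the weights `alpha` and `beta`, and the exact form of the feasibility penalty (micro-gates left open under a closed macro-gate) are assumptions, since the paper only states that these terms are combined:

```python
def expected_ffn_flops(d_model, neuron_gates, seq_len):
    """Expected FLOPs of one gated FFN block.

    Each surviving hidden neuron participates in both the up- and
    down-projection matmuls: 2 matmuls * 2 FLOPs per multiply-add.
    """
    active = sum(neuron_gates)  # soft count of kept FFN neurons
    return 4.0 * d_model * active * seq_len

def hiap_style_loss(task_loss, flops, flops_dense, micro_gates, macro_gate,
                    alpha=0.1, beta=1.0):
    """Illustrative combined objective: task loss + FLOPs + feasibility."""
    # FLOPs term: normalized expected cost of the gated sub-network.
    flops_term = flops / flops_dense
    # Feasibility: penalize micro-gates that stay open while their parent
    # macro-gate is closed, so prunings remain structurally deployable.
    feasibility = sum(g * (1.0 - macro_gate) for g in micro_gates)
    return task_loss + alpha * flops_term + beta * feasibility
```

Because the FLOPs term is a differentiable function of the gate expectations, the optimizer can trade accuracy against compute directly, rather than thresholding importance scores post hoc.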