HiAP: A Multi-Granular Stochastic Auto-Pruning Framework for Vision Transformers
AI Summary
HiAP proposes a multi-granular stochastic auto-pruning framework for improving the efficiency of Vision Transformers.
Key Contributions
- Proposes HiAP, a multi-granular pruning framework
- Requires no manual importance heuristics or predefined sparsity targets
- Discovers optimal sub-networks in a single end-to-end training phase
Methodology
HiAP introduces stochastic Gumbel-Sigmoid gates that prune at multiple granularities, and optimizes them with a loss that combines structural feasibility penalties and an analytical FLOPs term.
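As a minimal sketch of the gating mechanism (not the authors' implementation), a Gumbel-Sigmoid gate can be sampled by perturbing a learnable logit with logistic noise, which is the difference of two Gumbel samples, and squashing the result with a temperature-scaled sigmoid. The function name, the temperature parameter `tau`, and the numerical-stability epsilon are assumptions:

```python
import math
import random

def gumbel_sigmoid(logit, tau=1.0, rng=random):
    """Sample a soft binary gate in (0, 1) from a learnable logit.

    The noise log(u) - log(1 - u) is Logistic(0, 1), i.e. the difference
    of two Gumbel(0, 1) samples; lower tau makes gates more binary.
    """
    u = rng.random()
    noise = math.log(u + 1e-9) - math.log(1.0 - u + 1e-9)  # logistic noise
    return 1.0 / (1.0 + math.exp(-(logit + noise) / tau))
```

At low temperature (or with a straight-through hard threshold at inference), such gates approach 0/1 decisions, which is what lets a continuous relaxation converge to a discrete sub-network.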
Original Abstract
Vision Transformers require significant computational resources and memory bandwidth, severely limiting their deployment on edge devices. While recent structured pruning methods successfully reduce theoretical FLOPs, they typically operate at a single structural granularity and rely on complex, multi-stage pipelines with post-hoc thresholding to satisfy sparsity budgets. In this paper, we propose Hierarchical Auto-Pruning (HiAP), a continuous relaxation framework that discovers optimal sub-networks in a single end-to-end training phase without requiring manual importance heuristics or predefined per-layer sparsity targets. HiAP introduces stochastic Gumbel-Sigmoid gates at multiple granularities: macro-gates to prune entire attention heads and FFN blocks, and micro-gates to selectively prune intra-head dimensions and FFN neurons. By optimizing both levels simultaneously, HiAP addresses both the memory-bound overhead of loading large matrices and the compute-bound mathematical operations. HiAP naturally converges to stable sub-networks using a loss function that incorporates both structural feasibility penalties and analytical FLOPs. Extensive experiments on ImageNet demonstrate that HiAP organically discovers highly efficient architectures, and achieves a competitive accuracy-efficiency Pareto frontier for models like DeiT-Small, matching the performance of sophisticated multi-stage methods while significantly simplifying the deployment pipeline.
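The abstract describes a loss built from the task objective, an analytical FLOPs term, and structural feasibility penalties. A hedged sketch of how such a loss might be assembled is below; the function names, the weights `alpha` and `beta`, and the exact form of the feasibility penalty (micro-gates left open under a closed macro-gate) are assumptions, since the paper only states that these terms are combined:

```python
def expected_ffn_flops(d_model, neuron_gates, seq_len):
    """Expected FLOPs of one gated FFN block.

    Each surviving hidden neuron participates in both the up- and
    down-projection matmuls: 2 matmuls * 2 FLOPs per multiply-add.
    """
    active = sum(neuron_gates)  # soft count of kept FFN neurons
    return 4.0 * d_model * active * seq_len

def hiap_style_loss(task_loss, flops, flops_dense, micro_gates, macro_gate,
                    alpha=0.1, beta=1.0):
    """Illustrative combined objective: task loss + FLOPs + feasibility."""
    # FLOPs term: normalized expected cost of the gated sub-network.
    flops_term = flops / flops_dense
    # Feasibility: penalize micro-gates that stay open while their parent
    # macro-gate is closed, so prunings remain structurally deployable.
    feasibility = sum(g * (1.0 - macro_gate) for g in micro_gates)
    return task_loss + alpha * flops_term + beta * feasibility
```

Because the FLOPs term is a differentiable function of the gate expectations, the optimizer can trade accuracy against compute directly, rather than thresholding importance scores post hoc.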