Kirin: Improving ANN Efficiency with SNN Hybridization
AI Summary
Kirin proposes an integer-and-spike hybrid SNN that achieves accuracy-lossless ANN-to-SNN conversion while improving time and energy efficiency.
Key Contributions
- Proposes a Spike Matrix Hybridization strategy that reduces latency
- Introduces a Silence Threshold mechanism that preserves accuracy
- Experiments show near-FP16 accuracy with up to 84.66% lower energy consumption and 93.75% fewer time steps
Methodology
By hybridizing integer and spike representations and introducing a silence threshold mechanism, Kirin optimizes the ANN-to-SNN conversion process, preserving accuracy while improving efficiency.
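The hybrid idea can be sketched in code. The sketch below is a minimal illustration, not the paper's actual implementation: it assumes a simple rate coding where a quantized integer v fires v spikes over a 2^b - 1 step window (so summing the train recovers v exactly), and the names `spike_encode`, `hybrid_forward`, and `low_mask` are hypothetical.

```python
import numpy as np

def spike_encode(q_vals, bit_width):
    """Rate-code quantized integers as binary spike trains (illustrative).

    A value v in [0, 2**bit_width - 1] fires v spikes across
    T = 2**bit_width - 1 time steps, so summing the train over
    time recovers v exactly (lossless).
    """
    T = 2 ** bit_width - 1
    steps = np.arange(T)[:, None]                      # shape (T, 1)
    return (steps < q_vals[None, :]).astype(np.uint8)  # shape (T, N)

def hybrid_forward(x_q, low_mask, bit_width):
    """Split quantized values into a spike path and an integer path.

    Entries flagged by `low_mask` (low bit-width, hence a short time
    window) go through the spike path; the rest stay as integers.
    Accumulating the spikes over time reproduces the original values,
    mirroring the lossless hybridization described above.
    """
    spikes = spike_encode(x_q[low_mask], bit_width)
    recon = np.zeros_like(x_q)
    recon[low_mask] = spikes.sum(axis=0)   # temporal accumulation
    recon[~low_mask] = x_q[~low_mask]      # integer fast path
    return recon
```

Only the low bit-width entries pay the temporal cost of spiking; keeping high bit-width values in integer form avoids the long time windows that challenge (i) in the abstract refers to.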
Original Abstract
Artificial neural networks (ANNs), particularly large language models (LLMs), demonstrate powerful inference capabilities but consume substantial energy. Conversely, spiking neural networks (SNNs) exhibit exceptional energy efficiency due to their binary and event-driven characteristics, motivating the study of ANN-to-SNN conversion. In this process, quantization plays a pivotal role, mapping LLMs' floating-point parameters to discrete SNN parameters via the temporal dimension of the time window. However, several challenges remain in the conversion process: (i) converting high bit-width quantization values into binary spikes requires longer time windows, increasing system latency; and (ii) SNNs face an inherent trade-off between the information loss of single-spike schemes and the energy cost of multi-spike ones. To address these challenges, we propose Kirin, an integer-and-spike hybrid SNN that achieves accuracy-lossless ANN-to-SNN conversion with time and energy efficiency. Specifically, we first propose a Spike Matrix Hybridization strategy that encodes low bit-width parameters, which require only a small time window, into binary spikes while preserving the rest in integer format, thereby reducing the overall latency of SNN execution. Second, we introduce a silence threshold mechanism to regulate the timing of single-spike firing, ensuring the output is mathematically equivalent to the LLM's output and preserving accuracy. Experimental results demonstrate that Kirin, under a W4A4&8 quantization setting, achieves near-FP16 accuracy while reducing energy consumption by up to 84.66% and shortening time steps by 93.75%.
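The silence threshold idea above can be illustrated with a toy integrate-and-fire neuron. This is a guess at the mechanism's spirit, not the paper's formulation: the parameter names `v_th` and `t_silence` are hypothetical, and the point shown is only that forbidding early firing makes the single-spike decision depend on the full accumulated sum, as an ANN comparison would.

```python
def silence_threshold_neuron(inputs, v_th, t_silence):
    """Single-spike IF neuron with a silence window (illustrative).

    The neuron integrates its inputs but may not fire before step
    `t_silence`. Delaying the (single) spike prevents a premature
    firing based on a partial sum, so the decision matches a
    threshold test on the complete weighted sum.
    Returns (firing step or None, spike count).
    """
    v = 0.0
    for t, x in enumerate(inputs):
        v += x                            # integrate
        if t >= t_silence and v >= v_th:  # firing allowed and reached
            return t, 1                   # emit exactly one spike
    return None, 0                        # stayed silent
```

For example, with inputs `[1.2, -0.5, 0.1]` and `v_th=1.0`, a neuron with `t_silence=0` would fire at step 0 even though the full sum is only 0.8; with `t_silence=2` it correctly stays silent, which is the kind of premature-firing error the mechanism is meant to rule out.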