1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization
AI Summary
This paper studies quantization-aware training (QAT) in the low-bit regime and finds that k-means weight quantization at 1 bit delivers the best performance under a fixed memory budget.
Key Contributions
- Shows that k-means based weight quantization outperforms integer formats
- Finds that, under a fixed inference memory budget, 1-bit quantized weights achieve the best performance on generative downstream tasks
- Provides an empirical study of QAT in the low-bit regime
Methodology
The study compares QAT performance across quantization formats and bit-widths through experiments, and evaluates the resulting models on generative downstream tasks.
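To make the core idea concrete, here is a minimal, illustrative sketch of 1-D k-means weight quantization (Lloyd's algorithm) in pure Python. This is not the paper's implementation: the function name, initialization scheme, and iteration count are assumptions for illustration. Each weight is replaced by its nearest cluster centroid, so at `n_bits = 1` every weight is stored as a single bit indexing a shared 2-entry codebook.

```python
# Illustrative sketch of k-means weight quantization (not the paper's code).

def kmeans_quantize(weights, n_bits=1, n_iters=25):
    """Cluster weights into 2**n_bits centroids via 1-D Lloyd's algorithm."""
    k = 2 ** n_bits
    ws = sorted(weights)
    # Initialize centroids at evenly spaced quantiles of the sorted weights,
    # so every cluster starts non-empty.
    centroids = [ws[(2 * j + 1) * len(ws) // (2 * k)] for j in range(k)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for each weight.
        codes = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
        # Update step: move each centroid to the mean of its assigned weights.
        for j in range(k):
            members = [w for w, c in zip(weights, codes) if c == j]
            if members:
                centroids[j] = sum(members) / len(members)
    codes = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
    return codes, centroids

# With 1 bit there are only two codebook entries; dequantization is a
# simple table lookup, which is why this maps well onto standard hardware.
weights = [-1.2, -0.8, -1.0, 0.9, 1.1, 1.0, -0.95, 1.05]
codes, codebook = kmeans_quantize(weights, n_bits=1)
dequantized = [codebook[c] for c in codes]
```

Unlike uniform integer quantization, the codebook entries adapt to the empirical weight distribution, which is one intuition for why k-means formats can outperform integer formats at the same bit-width.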
Original Abstract
Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge in practice. The full design space of quantization is not fully explored in the context of QAT, and the precise trade-off between quantization and downstream performance is poorly understood, as comparisons often rely solely on perplexity-based evaluations. In this work, we address these shortcomings with an empirical study of QAT in the low-bit regime. We show that k-means based weight quantization outperforms integer formats and can be implemented efficiently on standard hardware. Furthermore, we find that, under a fixed inference memory budget, the best performance on generative downstream tasks is achieved with $1$-bit quantized weights.