1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization
AI Summary
This paper studies quantization-aware training (QAT) in the low-bit regime and finds that k-means weight quantization at 1 bit delivers the best performance under a fixed memory budget.
Key Contributions
- Shows that k-means based weight quantization outperforms integer formats
- Finds that, under a fixed inference memory budget, 1-bit quantized weights achieve the best performance on generative downstream tasks
- Provides an empirical study of QAT in the low-bit regime
Methodology
The study compares QAT performance across quantization formats and bit-widths through experiments, and evaluates the resulting models on generative downstream tasks.
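To make the core idea concrete, here is a minimal, illustrative sketch of 1-D k-means weight quantization (Lloyd's algorithm) in pure Python. This is not the paper's implementation: the function name, initialization scheme, and iteration count are assumptions for illustration. Each weight is replaced by its nearest cluster centroid, so at `n_bits = 1` every weight is stored as a single bit indexing a shared 2-entry codebook.

```python
# Illustrative sketch of k-means weight quantization (not the paper's code).

def kmeans_quantize(weights, n_bits=1, n_iters=25):
    """Cluster weights into 2**n_bits centroids via 1-D Lloyd's algorithm."""
    k = 2 ** n_bits
    ws = sorted(weights)
    # Initialize centroids at evenly spaced quantiles of the sorted weights,
    # so every cluster starts non-empty.
    centroids = [ws[(2 * j + 1) * len(ws) // (2 * k)] for j in range(k)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for each weight.
        codes = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
        # Update step: move each centroid to the mean of its assigned weights.
        for j in range(k):
            members = [w for w, c in zip(weights, codes) if c == j]
            if members:
                centroids[j] = sum(members) / len(members)
    codes = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
    return codes, centroids

# With 1 bit there are only two codebook entries; dequantization is a
# simple table lookup, which is why this maps well onto standard hardware.
weights = [-1.2, -0.8, -1.0, 0.9, 1.1, 1.0, -0.95, 1.05]
codes, codebook = kmeans_quantize(weights, n_bits=1)
dequantized = [codebook[c] for c in codes]
```

Unlike uniform integer quantization, the codebook entries adapt to the empirical weight distribution, which is one intuition for why k-means formats can outperform integer formats at the same bit-width.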
Original Abstract
Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge in practice. The full design space of quantization is not fully explored in the context of QAT, and the precise trade-off between quantization and downstream performance is poorly understood, as comparisons often rely solely on perplexity-based evaluations. In this work, we address these shortcomings with an empirical study of QAT in the low-bit regime. We show that k-means based weight quantization outperforms integer formats and can be implemented efficiently on standard hardware. Furthermore, we find that, under a fixed inference memory budget, the best performance on generative downstream tasks is achieved with $1$-bit quantized weights.