LLM Reasoning relevance: 8/10

Regularized Calibration with Successive Rounding for Post-Training Quantization

Seohyeon Cha, Huancheng Chen, Dongjun Kim, Haoran Zhang, Kevin Chan, Gustavo de Veciana, Haris Vikalo
arXiv: 2602.05902v1 Published: 2026-02-05 Updated: 2026-02-05

AI Summary

Proposes a PTQ method based on regularized asymmetric calibration, using successive rounding to improve LLM quantization performance.

Key Contributions

  • Proposed a regularized asymmetric calibration objective
  • Designed a successive rounding procedure
  • Proposed a bounded-search extension to trade off quantization quality against compute cost
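As a rough illustration of the first contribution, the regularized objective can be read as an interpolation between symmetric calibration (full-precision activations on both sides of the error term) and asymmetric calibration (the quantized model's own activations on the quantized side). The sketch below is an assumption about that form; the function name, the mixing weight `lam`, and the exact loss terms are illustrative, not the paper's formulation:

```python
import numpy as np

def regularized_calib_loss(W, Wq, X_fp, X_q, lam):
    """Interpolated calibration objective (illustrative sketch).

    sym : symmetric calibration, full-precision activations X_fp
          applied to both W and Wq.
    asym: asymmetric calibration, quantized-model activations X_q
          applied on the quantized side.
    lam = 0 -> purely symmetric, lam = 1 -> purely asymmetric
    (assumed convention, not taken from the paper).
    """
    sym = np.linalg.norm(X_fp @ W - X_fp @ Wq) ** 2
    asym = np.linalg.norm(X_fp @ W - X_q @ Wq) ** 2
    return (1 - lam) * sym + lam * asym
```

When the quantized model's activations match the full-precision ones (`X_q == X_fp`), both terms coincide and the loss is independent of `lam`, which is one way to see the interpolation as a regularizer against activation mismatch.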

Methodology

Maps pretrained weights to low-bit formats via regularized asymmetric calibration and successive rounding, enabling efficient inference.
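A successive rounding procedure of this kind might be sketched as a greedy pass that fixes one weight at a time, choosing between the two nearest grid points under a quadratic calibration error. Everything below is a minimal illustration under assumed conventions (a Hessian proxy `H = X.T @ X / n`, a uniform grid with step `scale`), not the paper's exact procedure:

```python
import numpy as np

def successive_round(w, X, scale):
    """Greedy successive rounding sketch (assumed, simplified).

    Each coordinate is quantized in turn to either the floor or the
    ceiling grid point, whichever minimizes a quadratic calibration
    error d @ H @ d, with already-fixed coordinates held quantized
    and remaining coordinates still at full precision.
    """
    H = X.T @ X / len(X)  # calibration Hessian proxy (assumption)
    wq = w.copy()
    for i in range(len(w)):
        lo = np.floor(w[i] / scale) * scale
        best, best_err = lo, None
        for cand in (lo, lo + scale):
            trial = wq.copy()
            trial[i] = cand
            d = w - trial
            err = d @ H @ d
            if best_err is None or err < best_err:
                best, best_err = cand, err
        wq[i] = best
    return wq
```

With a diagonal `H` this reduces to round-to-nearest; a non-diagonal `H` lets earlier rounding decisions influence later ones, which is the point of doing the rounding successively.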

Original Abstract

Large language models (LLMs) deliver robust performance across diverse applications, yet their deployment often faces challenges due to the memory and latency costs of storing and accessing billions of parameters. Post-training quantization (PTQ) enables efficient inference by mapping pretrained weights to low-bit formats without retraining, but its effectiveness depends critically on both the quantization objective and the rounding procedure used to obtain low-bit weight representations. In this work, we show that interpolating between symmetric and asymmetric calibration acts as a form of regularization that preserves the standard quadratic structure used in PTQ while providing robustness to activation mismatch. Building on this perspective, we derive a simple successive rounding procedure that naturally incorporates asymmetric calibration, as well as a bounded-search extension that allows for an explicit trade-off between quantization quality and the compute cost. Experiments across multiple LLM families, quantization bit-widths, and benchmarks demonstrate that the proposed bounded search based on a regularized asymmetric calibration objective consistently improves perplexity and accuracy over PTQ baselines, while incurring only modest and controllable additional computational cost.
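The bounded-search extension mentioned in the abstract could, for instance, enumerate all floor/ceil rounding combinations over small windows of `k` coordinates, so that the per-window cost of `2**k` candidate configurations gives the explicit quality/compute trade-off described above. Everything here (the windowing scheme, `k`, the quadratic error with Hessian proxy `H`) is an assumed sketch rather than the paper's algorithm:

```python
import itertools
import numpy as np

def bounded_search_round(w, H, scale, k=2):
    """Windowed exhaustive rounding sketch (assumed).

    Splits the weights into windows of k coordinates and tries all
    2**k floor/ceil assignments per window, keeping the one that
    minimizes the quadratic error d @ H @ d. Larger k trades extra
    compute for better rounding quality.
    """
    wq = w.copy()
    n = len(w)
    for start in range(0, n, k):
        idx = range(start, min(start + k, n))
        best_cfg, best_err = None, None
        for bits in itertools.product((0, 1), repeat=len(idx)):
            trial = wq.copy()
            for b, i in zip(bits, idx):
                # b = 0 -> floor grid point, b = 1 -> ceiling grid point
                trial[i] = (np.floor(w[i] / scale) + b) * scale
            d = w - trial
            err = d @ H @ d
            if best_err is None or err < best_err:
                best_cfg, best_err = trial, err
        wq = best_cfg
    return wq
```

Setting `k=1` recovers the greedy one-coordinate-at-a-time choice, so `k` is the single knob controlling the quality/cost trade-off in this sketch.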

Tags

Quantization · Post-Training Quantization · Large Language Models · Model Compression

arXiv Categories

cs.LG cs.AI