Agent Tuning & Optimization Relevance: 7/10

LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules

Ivan Vulić, Adam Grycner, Quentin de Laroussilhe, Jonas Pfeiffer
arXiv: 2602.10993v1 Published: 2026-02-11 Updated: 2026-02-11

AI Summary

LoRA-Squeeze improves performance and simplifies deployment by compressing LoRA modules either post-hoc or during training.

Key Contributions

  • Proposes the LoRA-Squeeze compression method
  • Post-hoc compression outperforms training low-rank LoRA directly
  • In-tuning rank annealing achieves the best performance-size trade-off

Methodology

First train a LoRA module at a deliberately high(er) rank, then reconstruct (or efficiently approximate) the full weight update and compress it to the target rank via Randomized SVD (RSVD); the compression can be applied post-hoc or during training.
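The compression step described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the function names, matrix shapes, and the oversampling/power-iteration settings are assumptions.

```python
import numpy as np

def rsvd(M, rank, n_oversample=10, n_iter=4, seed=0):
    """Basic randomized SVD sketch (Halko-style range finder)."""
    rng = np.random.default_rng(seed)
    k = min(rank + n_oversample, min(M.shape))
    # Approximate the range of M with a random Gaussian sketch
    Q, _ = np.linalg.qr(M @ rng.standard_normal((M.shape[1], k)))
    # Power iterations sharpen the subspace estimate
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(M @ (M.T @ Q))
    # Exact SVD of the small projected matrix Q^T M
    Ub, s, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

def squeeze_lora(B, A, target_rank):
    """Compress a LoRA update dW = B @ A to a lower target rank.

    B: (d_out, r_src), A: (r_src, d_in) -- illustrative shapes.
    Returns new factors (B_new, A_new) at target_rank.
    """
    dW = B @ A                      # reconstruct the full weight update
    U, s, Vt = rsvd(dW, target_rank)
    B_new = U * s                   # fold singular values into B
    A_new = Vt
    return B_new, A_new
```

When the target rank matches the true rank of `B @ A`, the squeezed factors reconstruct the update almost exactly; truncating below it keeps the dominant singular directions of the learned update.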

Original Abstract

Despite its huge number of variants, standard Low-Rank Adaptation (LoRA) is still a dominant technique for parameter-efficient fine-tuning (PEFT). Nonetheless, it faces persistent challenges, including the pre-selection of an optimal rank and rank-specific hyper-parameters, as well as the deployment complexity of heterogeneous-rank modules and more sophisticated LoRA derivatives. In this work, we introduce LoRA-Squeeze, a simple and efficient methodology that aims to improve standard LoRA learning by changing LoRA module ranks either post-hoc or dynamically during training. Our approach posits that it is better to first learn an expressive, higher-rank solution and then compress it, rather than learning a constrained, low-rank solution directly. The method involves fine-tuning with a deliberately high(er) source rank, reconstructing or efficiently approximating the reconstruction of the full weight update matrix, and then using Randomized Singular Value Decomposition (RSVD) to create a new, compressed LoRA module at a lower target rank. Extensive experiments across 13 text and 10 vision-language tasks show that post-hoc compression often produces lower-rank adapters that outperform those trained directly at the target rank, especially if a small number of fine-tuning steps at the target rank is allowed. Moreover, a gradual, in-tuning rank annealing variant of LoRA-Squeeze consistently achieves the best LoRA size-performance trade-off.
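The in-tuning rank annealing variant gradually lowers the module rank over the course of training. The abstract does not specify the schedule, so the staircase shape and all parameters below are purely illustrative assumptions:

```python
def annealed_rank(step, total_steps, r_src, r_tgt, n_stages=4):
    """Hypothetical staircase schedule: shrink the LoRA rank in equal
    stages from r_src down to r_tgt over training. The schedule shape
    and stage count are assumptions, not taken from the paper."""
    frac = min(step / total_steps, 1.0)
    stage = min(int(frac * n_stages), n_stages - 1)
    # Interpolate linearly between source and target ranks per stage
    r = r_src - (r_src - r_tgt) * stage / (n_stages - 1)
    return int(round(r))
```

At each stage boundary the current module would be squeezed (via RSVD) to the new, smaller rank before training continues, so the adapter ends training already at the deployment rank.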

Tags

LoRA Parameter-Efficient Fine-Tuning Model Compression Singular Value Decomposition

arXiv Categories

cs.CL cs.AI