Agent Tuning & Optimization relevance: 7/10

Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC

Abdulla Jasem Almansoori, Maria Ivanova, Andrey Veprikov, Aleksandr Beznosikov, Samuel Horváth, Martin Takáč
arXiv: 2602.16456v1 Published: 2026-02-18 Updated: 2026-02-18

AI Summary

Proposes LoRSum, a method that fine-tunes LoRA models efficiently via proximal subspace iteration while avoiding SVD.

Main Contributions

  • Proposes the LoRSum algorithm for efficient LoRA optimization
  • Casts LoRA optimization as a proximal sub-problem and solves it with alternating least squares (ALS)
  • Uses the diagonals of structured metrics (K-FAC, Shampoo) to improve memory efficiency

Methodology

Models LoRA optimization as a proximal sub-problem, solved iteratively with alternating least squares (ALS) updates, and preconditions the step using the diagonals of structured metrics.
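The ALS step can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function name `als_lowrank_step`, the fixed iteration count, and the ridge term are assumptions. The idea is to fit the low-rank factors B, A to the target of a full gradient step by alternating exact least-squares solves, so no SVD is ever formed.

```python
import numpy as np

# Hypothetical sketch of the proximal ALS subroutine: approximate the
# full-matrix gradient step M = BA - eta*G with a rank-r product BA by
# alternating least squares on  min_{B,A} || BA - M ||_F^2.
def als_lowrank_step(B, A, G, eta, n_iters=3, ridge=1e-8):
    M = B @ A - eta * G  # target of the full step (never decomposed via SVD)
    for _ in range(n_iters):
        # fix B, solve for A:  A = (B^T B)^{-1} B^T M
        BtB = B.T @ B + ridge * np.eye(B.shape[1])
        A = np.linalg.solve(BtB, B.T @ M)
        # fix A, solve for B:  B = M A^T (A A^T)^{-1}
        AAt = A @ A.T + ridge * np.eye(A.shape[0])
        B = np.linalg.solve(AAt, A @ M.T).T
    return B, A
```

Each half-step is an exact minimization over one factor, so the residual to the full step is non-increasing; the abstract notes this alternation is provably an implicit block power method.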

Original Abstract

Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. In this work, we address the gap between training with full steps with low-rank projections (SVDLoRA) and LoRA fine-tuning. We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent by casting LoRA optimization as a proximal sub-problem and solving it efficiently with alternating least squares updates, which we prove to be an implicit block power method. We recover several recently proposed preconditioning methods for LoRA as special cases, and show that LoRSum can also be used for updating a low-rank momentum. In order to address full steps with preconditioned gradient descent, we propose a scaled variant of LoRSum that uses structured metrics such as K-FAC and Shampoo, and we show that storing the diagonal of these metrics still allows them to perform well while remaining memory-efficient. Experiments on a synthetic task, CIFAR-100, and language-model fine-tuning on GLUE, SQuAD v2, and WikiText-103, show that our method can match or improve LoRA baselines given modest compute overhead, while avoiding full-matrix SVD projections and retaining LoRA-style parameter efficiency.
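The abstract's memory-efficiency claim for structured metrics can be illustrated with a minimal sketch, under assumed names and shapes (this is not the paper's implementation). K-FAC models a linear layer's curvature as a Kronecker product of an input second-moment factor and an output-gradient second-moment factor; keeping only their diagonals costs O(m + n) memory instead of O(m² + n²) while still rescaling each weight entry individually.

```python
import numpy as np

# Illustrative diagonal K-FAC preconditioning for a gradient grad_W of
# shape (m, n), given a batch of layer inputs (batch, n) and output
# gradients (batch, m). Only the diagonals of the two Kronecker factors
# are stored.
def diag_kfac_precondition(grad_W, inputs, grad_outputs, damping=1e-4):
    a_diag = (inputs ** 2).mean(axis=0)        # diag of input second moment, size n
    s_diag = (grad_outputs ** 2).mean(axis=0)  # diag of output-grad second moment, size m
    # Kronecker-diagonal scaling: entry (i, j) is divided by s_i * a_j
    return grad_W / (np.outer(s_diag, a_diag) + damping)
```

With unit inputs and unit output gradients, every entry is simply divided by 1 + damping, which makes the scaling easy to sanity-check.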

Tags

LoRA Fine-tuning Low-Rank Adaptation Optimization K-FAC

arXiv Category

cs.LG