Agent Tuning & Optimization — relevance: 7/10

D2-LoRA: A Synergistic Approach to Differential and Directional Low-Rank Adaptation

Nozomu Fujisawa, Masaaki Kondo
arXiv: 2602.14728v1 Published: 2026-02-16 Updated: 2026-02-16

AI Summary

D2-LoRA is a parameter-efficient fine-tuning method that achieves algebraic mergeability and low inference latency while preserving task performance.

Key Contributions

  • Proposes D2-LoRA, a fine-tuning method that combines signed low-rank residual updates with a column-wise projection
  • D2-LoRA outperforms LoRA and DoRA on question answering, reading comprehension, and generative tasks
  • Provides a geometric analysis of D2-LoRA's training stability, together with ablation studies

Methodology

D2-LoRA combines signed low-rank residual updates with additive and subtractive components, together with a train-time column-wise projection that keeps each column's norm close to its original value.
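The two ingredients above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the exact form of the signed components and the projection tolerance `tol` are assumptions made here for concreteness.

```python
import numpy as np

def d2_lora_effective_weight(W0, A_pos, B_pos, A_neg, B_neg):
    # Signed low-rank residual update (assumed form): an additive
    # low-rank path minus a subtractive low-rank path on top of the
    # frozen base weight W0.
    return W0 + B_pos @ A_pos - B_neg @ A_neg

def columnwise_projection(W, W0, tol=0.05):
    # Train-time projection (assumed form): rescale any column of W
    # whose norm drifts more than a relative tolerance away from the
    # corresponding column norm of the original weight W0.
    col = np.linalg.norm(W, axis=0)
    col0 = np.linalg.norm(W0, axis=0)
    ratio = col / np.maximum(col0, 1e-12)
    scale = np.clip(ratio, 1.0 - tol, 1.0 + tol) / np.maximum(ratio, 1e-12)
    return W * scale
```

Columns already within the tolerance band are left untouched (their scale factor is exactly 1), which matches the stated goal of keeping each column close to, rather than pinned at, its original norm.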

Original Abstract

We systematically investigate the parameter-efficient fine-tuning design space under practical data and compute constraints, and propose D2-LoRA. D2-LoRA achieves 76.4 percent average accuracy across eight question answering and reading comprehension benchmarks using only 5k training samples per task and two epochs, while preserving algebraic mergeability at inference with near-exact numerical equivalence. The method combines signed low-rank residual updates with additive and subtractive components, together with a train-time column-wise projection that keeps each column close to its original norm. After training, the adapter is merged into a single weight matrix, adding zero inference latency. Compared with LoRA, D2-LoRA improves average accuracy by 2.2 percentage points; at matched parameter counts (LoRA rank 2r versus D2-LoRA rank r), the improvement is 1.6 points, indicating gains from architectural design rather than increased parameterization. Compared with DoRA, it matches or exceeds performance on most tasks. Beyond QA and reading comprehension, D2-LoRA improves generative tasks (plus 1.2 ROUGE-L and plus 1.1 percent win rate) and shows 36 percent lower training volatility. The merge preserves numerical fidelity (mean gap about 0.03 percentage points) and recovers about 1.91x evaluation throughput. Training overhead is 19 percent, comparable to DoRA, and decreases with longer input sequences. We provide a geometric analysis explaining how the projection stabilizes training, together with ablation studies isolating the contribution of each design component.
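The "algebraic mergeability at inference with near-exact numerical equivalence" claim can be illustrated directly: because the two signed low-rank paths are linear, they fold into the base weight as a single matrix, so the merged forward pass is one matmul. The sketch below uses assumed shapes and random data purely for demonstration; the train-time projection is omitted since, per the abstract, it applies during training only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2
W0 = rng.normal(size=(d, d))
B_pos, A_pos = 0.01 * rng.normal(size=(d, r)), rng.normal(size=(r, d))
B_neg, A_neg = 0.01 * rng.normal(size=(d, r)), rng.normal(size=(r, d))

x = rng.normal(size=(4, d))

# Adapter-style forward: base path plus the two signed low-rank paths.
y_adapter = x @ W0.T + x @ (B_pos @ A_pos).T - x @ (B_neg @ A_neg).T

# Merged forward: fold the signed residual into one weight matrix,
# adding zero extra inference latency.
W_merged = W0 + B_pos @ A_pos - B_neg @ A_neg
y_merged = x @ W_merged.T

assert np.allclose(y_adapter, y_merged)
```

Exact floating-point equivalence is not guaranteed (summation order differs between the two paths), which is consistent with the abstract's "near-exact" characterization and the reported mean gap of about 0.03 percentage points.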

Tags

parameter-efficient fine-tuning · low-rank adaptation · algebraic mergeability · numerical fidelity

arXiv Categories

cs.LG