Agent Tuning & Optimization (Relevance: 7/10)

Excitation: Momentum For Experts

Sagi Shaier
arXiv: 2602.21798v1 Published: 2026-02-25 Updated: 2026-02-25

AI Summary

Excitation is a new optimization framework that accelerates learning in MoE models by dynamically modulating parameter updates based on expert utilization.

Key Contributions

  • Proposes the Excitation optimization framework, which accelerates MoE learning
  • Addresses the "structural confusion" problem in deep MoEs
  • Excitation is optimizer-, domain-, and model-agnostic

Methodology

Excitation uses batch-level expert utilization to dynamically modulate updates, amplifying updates to highly utilized experts and suppressing those to low-utilization ones, thereby sharpening routing specialization.
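The modulation idea can be sketched as follows. This is a minimal illustration, not the paper's exact rule: the threshold (uniform utilization), the scaling factors, and the function names are all assumptions for exposition.

```python
# Hypothetical sketch of Excitation-style update modulation for an MoE layer.
# Utilization is the fraction of batch tokens routed to each expert; gradients
# of highly utilized experts are amplified and low-utilization ones suppressed.

def expert_utilization(routing_counts):
    """Fraction of tokens routed to each expert in the current batch."""
    total = sum(routing_counts)
    return [c / total for c in routing_counts]

def excitation_modulate(expert_grads, utilization, amplify=1.5, suppress=0.5):
    """Scale each expert's gradient by its utilization (illustrative).

    Experts above the uniform rate (1 / num_experts) are amplified;
    experts below it are suppressed. The factors 1.5 and 0.5 are
    placeholder hyperparameters, not values from the paper.
    """
    uniform = 1.0 / len(utilization)
    scaled = []
    for grad, util in zip(expert_grads, utilization):
        factor = amplify if util >= uniform else suppress
        scaled.append([g * factor for g in grad])
    return scaled

# Example: 2 experts, expert 0 receives 6 of 8 tokens, expert 1 receives 2.
util = expert_utilization([6, 2])          # [0.75, 0.25]
grads = [[1.0, 2.0], [4.0, 4.0]]
modulated = excitation_modulate(grads, util)
# -> [[1.5, 3.0], [2.0, 2.0]]: expert 0 amplified, expert 1 suppressed
```

Because the modulation acts only on gradients (no extra per-parameter state), a scheme like this can wrap any base optimizer, consistent with the paper's claim of optimizer-agnosticism.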

Original Abstract

We propose Excitation, a novel optimization framework designed to accelerate learning in sparse architectures such as Mixture-of-Experts (MoEs). Unlike traditional optimizers that treat all parameters uniformly, Excitation dynamically modulates updates using batch-level expert utilization. It introduces a competitive update dynamic that amplifies updates to highly-utilized experts and can selectively suppress low-utilization ones, effectively sharpening routing specialization. Notably, we identify a phenomenon of "structural confusion" in deep MoEs, where standard optimizers fail to establish functional signal paths; Excitation acts as a specialization catalyst, "rescuing" these models and enabling stable training where baselines remain trapped. Excitation is optimizer-, domain-, and model-agnostic, requires minimal integration effort, and introduces neither additional per-parameter optimizer state nor learnable parameters, making it highly viable for memory-constrained settings. Across language and vision tasks, Excitation consistently improves convergence speed and final performance in MoE models, indicating that active update modulation is a key mechanism for effective conditional computation.

Tags

Mixture-of-Experts Optimization Sparse Architectures Conditional Computation

arXiv Categories

cs.LG cs.AI