Excitation: Momentum For Experts
AI Summary
Excitation introduces a new optimization framework that accelerates learning in MoE models by dynamically modulating updates based on expert utilization.
Key Contributions
- Proposes the Excitation optimization framework, which accelerates MoE learning
- Resolves the "structural confusion" problem in deep MoEs
- Excitation is optimizer-, domain-, and model-agnostic
Methodology
Excitation uses batch-level expert utilization to dynamically modulate updates, amplifying highly-utilized experts and suppressing low-utilization ones, thereby sharpening routing specialization.
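The paper does not give the modulation rule itself, so the following is only a minimal sketch of the general idea under assumed details: the function `excitation_scale`, the `alpha` strength parameter, the `floor` clip, and the linear deviation-from-uniform rule are all illustrative choices, not the authors' formulation.

```python
import numpy as np

def excitation_scale(token_counts, alpha=0.5, floor=0.1):
    """Hypothetical utilization-based update modulation (not the paper's rule).

    token_counts: number of tokens routed to each expert in the current batch.
    Returns one multiplier per expert: above-average utilization is amplified,
    below-average utilization is suppressed (clipped at `floor`, not zeroed).
    """
    counts = np.asarray(token_counts, dtype=float)
    util = counts / counts.sum()      # batch-level expert utilization
    mean = 1.0 / len(counts)          # uniform-utilization baseline
    # Linear scaling around 1.0 by relative deviation from uniform utilization.
    scale = 1.0 + alpha * (util - mean) / mean
    return np.clip(scale, floor, None)

# Example: 4 experts with uneven routing over a 100-token batch.
grads = {f"expert_{i}": np.ones(3) for i in range(4)}
scale = excitation_scale([55, 25, 15, 5])   # -> [1.6, 1.0, 0.8, 0.6]
modulated = {name: g * s for (name, g), s in zip(grads.items(), scale)}
```

Because the multipliers are computed from routing counts alone, this kind of scheme adds no per-parameter optimizer state and no learnable parameters, consistent with the memory-footprint claim in the abstract.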
Original Abstract
We propose Excitation, a novel optimization framework designed to accelerate learning in sparse architectures such as Mixture-of-Experts (MoEs). Unlike traditional optimizers that treat all parameters uniformly, Excitation dynamically modulates updates using batch-level expert utilization. It introduces a competitive update dynamic that amplifies updates to highly-utilized experts and can selectively suppress low-utilization ones, effectively sharpening routing specialization. Notably, we identify a phenomenon of "structural confusion" in deep MoEs, where standard optimizers fail to establish functional signal paths; Excitation acts as a specialization catalyst, "rescuing" these models and enabling stable training where baselines remain trapped. Excitation is optimizer-, domain-, and model-agnostic, requires minimal integration effort, and introduces neither additional per-parameter optimizer state nor learnable parameters, making it highly viable for memory-constrained settings. Across language and vision tasks, Excitation consistently improves convergence speed and final performance in MoE models, indicating that active update modulation is a key mechanism for effective conditional computation.