Excitation: Momentum For Experts
AI Summary
Excitation introduces a new optimization framework that accelerates learning in MoE models by dynamically modulating updates based on expert utilization.
Key Contributions
- Proposes the Excitation optimization framework, which accelerates MoE learning
- Resolves the "structural confusion" problem in deep MoEs
- Excitation is optimizer-, domain-, and model-agnostic
Methodology
Excitation uses batch-level expert utilization to dynamically modulate updates, amplifying highly-utilized experts and suppressing low-utilization ones, thereby sharpening routing specialization.
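The paper does not give the modulation rule itself, so the following is only a minimal sketch of the general idea under assumed details: the function `excitation_scale`, the `alpha` strength parameter, the `floor` clip, and the linear deviation-from-uniform rule are all illustrative choices, not the authors' formulation.

```python
import numpy as np

def excitation_scale(token_counts, alpha=0.5, floor=0.1):
    """Hypothetical utilization-based update modulation (not the paper's rule).

    token_counts: number of tokens routed to each expert in the current batch.
    Returns one multiplier per expert: above-average utilization is amplified,
    below-average utilization is suppressed (clipped at `floor`, not zeroed).
    """
    counts = np.asarray(token_counts, dtype=float)
    util = counts / counts.sum()      # batch-level expert utilization
    mean = 1.0 / len(counts)          # uniform-utilization baseline
    # Linear scaling around 1.0 by relative deviation from uniform utilization.
    scale = 1.0 + alpha * (util - mean) / mean
    return np.clip(scale, floor, None)

# Example: 4 experts with uneven routing over a 100-token batch.
grads = {f"expert_{i}": np.ones(3) for i in range(4)}
scale = excitation_scale([55, 25, 15, 5])   # -> [1.6, 1.0, 0.8, 0.6]
modulated = {name: g * s for (name, g), s in zip(grads.items(), scale)}
```

Because the multipliers are computed from routing counts alone, this kind of scheme adds no per-parameter optimizer state and no learnable parameters, consistent with the memory-footprint claim in the abstract.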
Original Abstract
We propose Excitation, a novel optimization framework designed to accelerate learning in sparse architectures such as Mixture-of-Experts (MoEs). Unlike traditional optimizers that treat all parameters uniformly, Excitation dynamically modulates updates using batch-level expert utilization. It introduces a competitive update dynamic that amplifies updates to highly-utilized experts and can selectively suppress low-utilization ones, effectively sharpening routing specialization. Notably, we identify a phenomenon of "structural confusion" in deep MoEs, where standard optimizers fail to establish functional signal paths; Excitation acts as a specialization catalyst, "rescuing" these models and enabling stable training where baselines remain trapped. Excitation is optimizer-, domain-, and model-agnostic, requires minimal integration effort, and introduces neither additional per-parameter optimizer state nor learnable parameters, making it highly viable for memory-constrained settings. Across language and vision tasks, Excitation consistently improves convergence speed and final performance in MoE models, indicating that active update modulation is a key mechanism for effective conditional computation.