LLM Reasoning relevance: 8/10

Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

Albus Yizhuo Li, Matthew Wicker

arXiv: 2603.09453v1 Published: 2026-03-10 Updated: 2026-03-10


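AI Summary

VMoER uses variational inference to model uncertainty in the routing decisions of MoE layers, improving model calibration and robustness.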

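Main Contributions

Proposes VMoER, a structured Bayesian approach to modelling uncertainty in MoE layers
Shows that VMoER improves calibration and robustness on foundation models
Demonstrates that VMoER scales well, adding less than 1% additional FLOPs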

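Methodology

Variational inference is used to model uncertainty at the expert-selection stage of MoE layers. Two inference strategies are used: amortised variational inference over routing logits, and inferring a temperature parameter for stochastic expert selection.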
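The two routing strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal Gaussian over logits, the top-k selection rule, and every function name below (`sample_logits`, `route_variational`, `route_temperature`) are assumptions made for illustration. In an amortised setup the Gaussian parameters would come from a small network conditioned on the token; here they are passed in directly.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logits(mean, log_std, rng):
    """Reparameterised draw from an assumed diagonal Gaussian over routing logits."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def top_k(probs, k):
    """Indices of the k largest routing probabilities."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def route_variational(mean, log_std, k, n_samples, rng):
    """Monte Carlo estimate of how often each expert lands in the top-k."""
    counts = [0] * len(mean)
    for _ in range(n_samples):
        probs = softmax(sample_logits(mean, log_std, rng))
        for i in top_k(probs, k):
            counts[i] += 1
    return [c / n_samples for c in counts]


def route_temperature(logits, temperature, k, rng):
    """Sample k distinct experts from softmax(logits / temperature)."""
    probs = softmax(logits, temperature)
    remaining = list(range(len(logits)))
    chosen = []
    for _ in range(k):
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    mean = [2.0, 0.0, -1.0, 0.5]      # hypothetical routing logit means
    log_std = [-1.0] * 4              # hypothetical log standard deviations
    print(route_variational(mean, log_std, k=2, n_samples=200, rng=rng))
    print(route_temperature(mean, temperature=0.7, k=2, rng=rng))
```

In this sketch, the spread of the selection frequencies returned by `route_variational` acts as a rough proxy for routing uncertainty, while a larger temperature in `route_temperature` makes expert selection more random.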

Original Abstract

Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter counts in the trillions through carefully engineered sparsity including Mixture-of-Experts (MoE) layers. In this work, we demonstrate calibrated uncertainty at scale by introducing Variational Mixture-of-Experts Routing (VMoER), a structured Bayesian approach for modelling uncertainty in MoE layers. VMoER confines Bayesian inference to the expert-selection stage which is typically done by a deterministic routing network. We instantiate VMoER using two inference strategies: amortised variational inference over routing logits and inferring a temperature parameter for stochastic expert selection. Across tested foundation models, VMoER improves routing stability under noise by 38%, reduces calibration error by 94%, and increases out-of-distribution AUROC by 12%, while incurring less than 1% additional FLOPs. These results suggest VMoER offers a scalable path toward robust and uncertainty-aware foundation models.
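The abstract does not say which calibration metric is reported. As an illustration only, the sketch below computes binned expected calibration error (ECE), a commonly used measure; treating it as the paper's metric is an assumption, and the function name and bin count are hypothetical choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 is placed in the last bin rather than overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 0.9 confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A model whose stated confidence matches its accuracy in every bin scores 0 under this measure; larger values indicate over- or under-confidence.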

arXiv Categories

cs.LG cs.AI stat.ML
