LLM Reasoning relevance: 8/10

FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models

Annemette Brok Pirchert, Jacob Nielsen, Mogens Henrik From, Lukas Galke Poech, Peter Schneider-Kamp
arXiv: 2602.08818v1 · Published: 2026-02-09 · Updated: 2026-02-09

AI Summary

FlexMoRE proposes a flexible mixture-of-experts model that uses rank-heterogeneous experts to improve the efficiency and performance of federatedly trained large language models.

Key Contributions

  • Introduces FlexMoRE, a flexible mixture of rank-heterogeneous experts.
  • Systematically studies the trade-off between expert rank and downstream task performance.
  • Shows experimentally that the optimal rank depends on the task type (reasoning-heavy vs. knowledge-heavy) and that choosing it well yields substantial efficiency gains.

Methodology

Building on FlexOlmo, the authors convert its pre-trained experts into low-rank versions, evaluate a large number of expert mixtures on downstream tasks, and run a regression analysis relating expert rank to performance.
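The paper does not spell out how experts are converted to low-rank form, but a standard way to turn a pre-trained dense weight matrix into a rank-r adapter is truncated SVD. The sketch below (a minimal illustration, not the authors' implementation; the function name and shapes are assumptions) shows how a single expert matrix could be factorized and how the parameter count shrinks:

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate a dense expert weight matrix W (d_out x d_in)
    by two factors A (d_out x r) and B (r x d_in) via truncated SVD,
    so that W ~= A @ B. This is the best rank-r approximation in
    the Frobenius norm (Eckart-Young theorem)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into A
    B = Vt[:rank, :]
    return A, B

# Example: rank-8 approximation of a 64x64 expert matrix
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
A, B = low_rank_factorize(W, rank=8)
print(A.shape, B.shape)  # (64, 8) (8, 64)
# Parameters drop from 64*64 = 4096 to 2*64*8 = 1024
```

Sweeping `rank` over powers of two, as the paper does from $2^0$ to $2^{14}$, then trades reconstruction fidelity (and downstream performance) against memory.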

Original Abstract

Recent advances in mixture-of-experts architectures have shown that individual expert models can be trained federatedly, i.e., in isolation from other experts, by using a common base model to facilitate coordination. However, we hypothesize that full-sized experts may not be necessary for all domains and that instead low-rank adapters may be sufficient. Here, we introduce FlexMoRE, a Flexible Mixture of Rank-heterogeneous Experts, which may be either full-sized experts or adapters of a suitable rank. We systematically investigate the trade-off between expert rank and downstream task performance by evaluating $6$ experts with ranks $2^0$ to $2^{14}$, resulting in experiments covering 150 mixtures (96 with 2 experts, 54 with 7 experts) that are evaluated across $120$ tasks. For our experiments, we build on FlexOlmo and turn its pre-trained experts into low-rank versions. Our regression analysis from expert rank to downstream task performance reveals that the best-performing rank is substantially higher for reasoning-heavy benchmarks than for knowledge-heavy benchmarks. These findings on rank sensitivity come with direct implications for memory efficiency: using optimal ranks, FlexMoRE yields improved downstream task performance (average score $47.18$) compared to the baseline FlexOlmo-style mixture of full-sized experts (average score $45.46$) at less than one third the parameters ($10.75$B for FlexMoRE vs. $33.27$B for FlexOlmo). All code will be made available.

Tags

Mixture of Experts · Federated Learning · Low-Rank Decomposition · Large Language Models · Model Compression

arXiv Category

cs.LG