Agent Tuning & Optimization Relevance: 6/10

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

Connor Mclaughlin, Nigel Lee, Lili Su
arXiv: 2603.23436v1 Published: 2026-03-24 Updated: 2026-03-24

AI Summary

Proposes a similarity-aware mixture-of-experts model for continual learning when per-task data is scarce and tasks overlap.

Main Contributions

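  • Proposes an adaptive mixture-of-experts framework
  • Introduces incremental global pooling to mitigate prompt association noise
  • Designs instance-wise prompt masking to decompose incoming task samples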

Methodology

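Builds an adaptive MoE over pre-trained models, using incremental global pooling and instance-wise prompt masking to achieve similarity awareness and knowledge transfer across tasks.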
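As a rough illustration of the instance-wise masking idea, the sketch below splits samples into those that match an existing prompt and those that do not. The cosine-similarity rule, the `threshold` value, and the names `split_by_prompt_match` and `prompt_keys` are assumptions made for illustration; the summary does not specify how the paper actually scores alignment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def split_by_prompt_match(features, prompt_keys, threshold=0.8):
    """Split sample indices into in-distribution and out-of-distribution.

    A sample counts as in-distribution when its best cosine match among
    the existing prompt keys reaches `threshold` (a hypothetical rule,
    not taken from the paper).
    """
    in_dist, out_dist = [], []
    for i, x in enumerate(features):
        best = max((cosine(x, k) for k in prompt_keys), default=-1.0)
        (in_dist if best >= threshold else out_dist).append(i)
    return in_dist, out_dist

# Toy features: the first two resemble the only known prompt key.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prompt_keys = [[1.0, 0.0]]
in_dist, out_dist = split_by_prompt_match(features, prompt_keys)
print(in_dist, out_dist)
```

In this toy setup, in-distribution samples would reuse current prompts, while out-of-distribution samples would be the ones that call for a new prompt.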

Original Abstract

Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adaptation, but most existing approaches either assume each task contains sufficiently many data samples or that the learning tasks are non-overlapping. In this paper, we address the more general setting where each task may have a limited dataset, and tasks may overlap in an arbitrary manner without a priori knowledge. This general setting is substantially more challenging for two reasons. On the one hand, data scarcity necessitates effective contextualization of general knowledge and efficient knowledge transfer across tasks. On the other hand, unstructured task overlapping can easily result in negative knowledge transfer. To address the above challenges, we propose an adaptive mixture-of-experts (MoE) framework over pre-trained models that progressively establishes similarity awareness among tasks. Our design contains two innovative algorithmic components: incremental global pooling and instance-wise prompt masking. The former mitigates prompt association noise through gradual prompt introduction over time. The latter decomposes incoming task samples into those aligning with current prompts (in-distribution) and those requiring new prompts (out-of-distribution). Together, our design strategically leverages potential task overlaps while actively preventing negative mutual interference in the presence of per-task data scarcity. Experiments across varying data volumes and inter-task similarity show that our method enhances sample efficiency and is broadly applicable.
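The gradual prompt introduction described in the abstract can be pictured with a toy prompt pool that starts empty and grows one prompt at a time. This is only a sketch under stated assumptions: the class `IncrementalPromptPool`, the `min_unmatched` trigger, and the mean-feature key are hypothetical choices, not the paper's procedure.

```python
class IncrementalPromptPool:
    """Toy prompt pool that grows only when enough samples are unmatched."""

    def __init__(self, min_unmatched=2):
        self.keys = []                      # one key vector per prompt
        self.min_unmatched = min_unmatched  # hypothetical growth trigger

    def update(self, unmatched_features):
        """Add one prompt keyed by the mean of the unmatched features.

        Returns True if a prompt was added, False otherwise.
        """
        if len(unmatched_features) < self.min_unmatched:
            return False
        dim = len(unmatched_features[0])
        n = len(unmatched_features)
        mean = [sum(f[d] for f in unmatched_features) / n for d in range(dim)]
        self.keys.append(mean)
        return True

pool = IncrementalPromptPool(min_unmatched=2)
grew_first = pool.update([[0.0, 1.0]])               # too few samples
grew_second = pool.update([[0.0, 1.0], [0.0, 3.0]])  # enough to add a prompt
print(grew_first, grew_second, pool.keys)
```

Growing the pool slowly keeps the number of candidate prompts small early on, which is one plausible way to limit noisy sample-to-prompt associations.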

Tags

Continual Learning, Mixture-of-Experts, Data Efficiency, Knowledge Transfer

arXiv Categories

cs.LG