Agent Tuning & Optimization (relevance: 6/10)

A Theoretical Framework for Modular Learning of Robust Generative Models

Corinna Cortes, Mehryar Mohri, Yutao Zhong
arXiv: 2602.17554v1 Published: 2026-02-19 Updated: 2026-02-19

AI Summary

Proposes a modular generative-modeling framework that composes domain-expert models to improve performance and robustness, backed by theoretical guarantees and concrete algorithms.

Key Contributions

  • Proposes a modular generative-model framework that addresses the resource cost of training large-scale generative models.
  • Proves advantages of modular models in robustness and generalization.
  • Designs a scalable stochastic primal-dual algorithm and a structural distillation method.
  • Experimentally verifies that the modular architecture effectively mitigates gradient conflict and can outperform monolithic models.

Methodology

Pre-trained expert models are combined through a gating mechanism; a minimax game is solved to find a robust gating function, and Kakutani's fixed-point theorem is used to prove that such a gate exists.
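For concreteness, the minimax objective described above can be written as follows; the notation (number of domains $m$, experts $p_k$, domain distributions $D_j$, and the simplex $\Delta_m$) is a paraphrase of the abstract, not the paper's exact statement:

```latex
\min_{g \in G_{1}} \;\; \max_{\lambda \in \Delta_{m}} \;
\mathrm{D}\!\left( \sum_{j=1}^{m} \lambda_j D_j \;\middle\|\; \sum_{k=1}^{K} g_k\, p_k \right)
```

Here $g \in G_{1}$ is a normalized gate over $K$ pre-trained experts $p_k$, $\lambda$ ranges over adversarial mixtures of the $m$ domain distributions $D_j$, and $\mathrm{D}$ is a divergence such as KL.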

Original Abstract

Training large-scale generative models is resource-intensive and relies heavily on heuristic dataset weighting. We address two fundamental questions: Can we train Large Language Models (LLMs) modularly, combining small, domain-specific experts to match monolithic performance, and can we do so robustly for any data mixture, eliminating heuristic tuning? We present a theoretical framework for modular generative modeling where a set of pre-trained experts are combined via a gating mechanism. We define the space of normalized gating functions, $G_{1}$, and formulate the problem as a minimax game to find a single robust gate that minimizes divergence to the worst-case data mixture. We prove the existence of such a robust gate using Kakutani's fixed-point theorem and show that modularity acts as a strong regularizer, with generalization bounds scaling with the lightweight gate's complexity. Furthermore, we prove that this modular approach can theoretically outperform models retrained on aggregate data, with the gap characterized by the Jensen-Shannon Divergence. Finally, we introduce a scalable Stochastic Primal-Dual algorithm and a Structural Distillation method for efficient inference. Empirical results on synthetic and real-world datasets confirm that our modular architecture effectively mitigates gradient conflict and can robustly outperform monolithic baselines.
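The stochastic primal-dual idea from the abstract can be illustrated on a toy discrete problem: fixed experts, a gate on the simplex (primal), and an adversarial domain mixture (dual), each updated by exponentiated gradient. Everything below, including the KL objective, the step sizes, and all variable names, is an illustrative sketch and not the paper's actual algorithm:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between discrete distributions p and q."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

# Toy setup: 3 domain distributions and 2 fixed experts over 5 symbols.
rng = np.random.default_rng(0)
domains = rng.dirichlet(np.ones(5), size=3)   # D_j
experts = rng.dirichlet(np.ones(5), size=2)   # p_k (pre-trained, frozen)

w = np.ones(2) / 2      # primal: gate weights over experts (on the simplex)
lam = np.ones(3) / 3    # dual: adversarial mixture over domains
eta_w, eta_lam = 0.5, 0.5

for t in range(500):
    mix = w @ experts                          # gated model sum_k w_k p_k
    # Dual ascent (exponentiated gradient): up-weight worst-fit domains.
    losses = np.array([kl(d, mix) for d in domains])
    lam = lam * np.exp(eta_lam * losses)
    lam /= lam.sum()
    # Primal descent on w for KL(worst-case mixture || gated model);
    # d/dw_k = -sum_x target(x) * p_k(x) / mix(x).
    target = lam @ domains
    grad_w = np.array([
        -np.sum(target * experts[k] / np.clip(mix, 1e-12, None))
        for k in range(2)
    ])
    w = w * np.exp(-eta_w * grad_w)
    w /= w.sum()
```

Exponentiated-gradient updates are used here purely so that both iterates stay on their simplices without an explicit projection step.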

Tags

Generative Models, Modular Learning, Robustness, Expert Models, Gating Mechanism

arXiv Categories

cs.LG stat.ML