Multi-DNN Inference of Sparse Models on Edge SoCs
AI Summary
SparseLoom uses model stitching to optimize multi-DNN inference systems on edge devices, improving their efficiency.
Main Contributions
- Proposes model stitching, a technique for creating model variants
- Designs and implements the SparseLoom system
- Shows experimentally that it reduces SLO violation rates, improves throughput, and lowers memory overhead
Methodology
Creates model variants by recombining subgraphs of sparse models, without re-training, and validates the approach by deploying it to SoCs.
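The recombination step can be illustrated with a minimal sketch. Assume each sparse model is a sequence of stages (subgraphs) where stages at the same position share input/output shapes, so a variant is assembled by picking one stage per position from any donor model, with no re-training. The stage names, sparsity values, and `stitch_variants` helper below are all hypothetical, not SparseLoom's actual representation:

```python
from itertools import product

# Two hypothetical sparse variants of the same backbone, each a sequence of
# (stage_name, sparsity) pairs. Stages at the same index are assumed to have
# matching input/output shapes, so they are interchangeable.
model_a = [("stem", 0.3), ("block1", 0.5), ("block2", 0.5), ("head", 0.3)]
model_b = [("stem", 0.7), ("block1", 0.9), ("block2", 0.9), ("head", 0.7)]

def stitch_variants(models):
    """Enumerate every variant obtained by choosing, at each stage
    position, the subgraph from any one of the donor models."""
    candidates_per_position = list(zip(*models))
    for choice in product(*candidates_per_position):
        yield list(choice)

variants = list(stitch_variants([model_a, model_b]))
# 2 donor models x 4 stage positions -> 2**4 = 16 stitched variants,
# including the two original models themselves.
print(len(variants))
```

The point of the sketch is the combinatorics: even two donor models yield exponentially many variants for the scheduler to match against heterogeneous accelerators, without any additional training cost.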
Original Abstract
Modern edge applications increasingly require multi-DNN inference systems to execute tasks on heterogeneous processors, gaining performance from both concurrent execution and from matching each model to the most suited accelerator. However, existing systems support only a single model (or a few sparse variants) per task, which impedes the efficiency of this matching and results in high Service Level Objective violation rates. We introduce model stitching for multi-DNN inference systems, which creates model variants by recombining subgraphs from sparse models without re-training. We present a demonstrator system, SparseLoom, that shows model stitching can be deployed to SoCs. We show experimentally that SparseLoom reduces SLO violation rates by up to 74%, improves throughput by up to 2.31x, and lowers memory overhead by an average of 28% compared to state-of-the-art multi-DNN inference systems.