InstMeter: An Instruction-Level Method to Predict Energy and Latency of DL Model Inference on MCUs
AI 摘要
InstMeter通过MCU时钟周期预测DL模型在MCU上的能耗和延迟,精度高且所需数据量少。
主要贡献
- 提出InstMeter,一种基于时钟周期的DL模型能耗和延迟预测器
- InstMeter具有强线性性,简单且准确
- InstMeter在不同场景、MCU和软件设置下表现优异
方法论
利用MCU的时钟周期作为基本指标,建立线性模型来预测DL模型的能耗和延迟,并进行大量实验验证。
原文摘要
Deep learning (DL) models can now run on microcontrollers (MCUs). Through neural architecture search (NAS), we can search DL models that meet the constraints of MCUs. Among various constraints, energy and latency costs of the model inference are critical metrics. To predict them, existing research relies on coarse proxies such as multiply-accumulations (MACs) and model's input parameters, often resulting in inaccurate predictions or requiring extensive data collection. In this paper, we propose InstMeter, a predictor leveraging MCUs' clock cycles to accurately estimate the energy and latency of DL models. Clock cycles are fundamental metrics reflecting MCU operations, directly determining energy and latency costs. Furthermore, a unique property of our predictor is its strong linearity, allowing it to be simple and accurate. We thoroughly evaluate InstMeter under different scenarios, MCUs, and software settings. Compared with state-of-the-art studies, InstMeter can reduce the energy and latency prediction errors by $3\times$ and $6.5\times$, respectively, while requiring $100\times$ and $10\times$ less training data. In the NAS scenario, InstMeter can fully exploit the energy budget, identifying optimal DL models with higher inference accuracy. We also evaluate InstMeter's generalization performance through various experiments on three ARM MCUs (Cortex-M4, M7, M33) and one RISC-V-based MCU (ESP32-C3), different compilation options (-Os, -O2), GCC versions (v7.3, v10.3), application scenarios (keyword spotting, image recognition), dynamic voltage and frequency scaling, temperatures (21°C, 43°C), and software settings (TFLMv2.4, TFLMvCI). We will open our source codes and the MCU-specific benchmark datasets.