• Stars
    star
    423
  • Rank 101,836 (Top 3 %)
  • Language
    Python
  • License
    MIT License
  • Created over 2 years ago
  • Updated 6 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

MCUNet: Tiny Deep Learning on IoT Devices

This is the official implementation of the MCUNet series.

website | paper | paper (v2) | demo video

demo

News

If you are interested in getting updates, please sign up here to get notified!

Overview

Microcontrollers are low-cost, low-power hardware. They are widely deployed and have wide applications.

teaser

But the tight memory budget (50,000x smaller than GPUs) makes deep learning deployment difficult.

teaser

MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers. It consists of TinyNAS and TinyEngine. They are co-designed to fit the tight memory budgets.

With system-algorithm co-design, we can significantly improve the deep learning performance on the same tiny memory budget.

teaser

Our TinyEngine inference engine could be a useful infrastructure for MCU-based AI applications. It significantly improves the inference speed and reduces the memory usage compared to existing libraries like TF-Lite Micro, CMSIS-NN, MicroTVM, etc. It improves the inference speed by 1.5-3x, and reduces the peak memory by 2.7-4.8x.

teaser

Model Zoo

Usage

You can build the pre-trained PyTorch fp32 model or the int8 quantized model in TF-Lite format.

from mcunet.model_zoo import net_id_list, build_model, download_tflite
print(net_id_list)  # the list of models in the model zoo

# pytorch fp32 model
model, image_size, description = build_model(net_id="mcunet-in3", pretrained=True)  # you can replace net_id with any other option from net_id_list

# download tflite file to tflite_path
tflite_path = download_tflite(net_id="mcunet-in3")

Evaluate

To evaluate the accuracy of PyTorch fp32 models, run:

python eval_torch.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val

To evaluate the accuracy of TF-Lite int8 models, run:

python eval_tflite.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val

Model List

  • Note that all the latency, SRAM, and Flash usage are profiled with TinyEngine on STM32F746.
  • Here we only provide the int8 quantized modes. int4 quantized models (as shown in the paper) can further push the accuracy-memory trade-off, but lacking a general format support.
  • For accuracy (top1, top-5), we report the accuracy of fp32/int8 models respectively

The ImageNet model list:

net_id MACs #Params SRAM Flash Res. Top-1
(fp32/int8)
Top-5
(fp32/int8)
# baseline models
mbv2-w0.35 23.5M 0.75M 308kB 862kB 144 49.7%/49.0% 74.6%/73.8%
proxyless-w0.3 38.3M 0.75M 292kB 892kB 176 57.0%/56.2% 80.2%/79.7%
# mcunet models
mcunet-in0 6.4M 0.75M 266kB 889kB 48 41.5%/40.4% 66.3%/65.2%
mcunet-in1 12.8M 0.64M 307kB 992kB 96 51.5%/49.9% 75.5%/74.1%
mcunet-in2 67.3M 0.73M 242kB 878kB 160 60.9%/60.3% 83.3%/82.6%
mcunet-in3 81.8M 0.74M 293kB 897kB 176 62.2%/61.8% 84.5%/84.2%
mcunet-in4 125.9M 1.73M 456kB 1876kB 160 68.4%/68.0% 88.4%/88.1%

The VWW model list:

Note that the VWW dataset might be hard to prepare. You can download our pre-built minival set from here, around 380MB.

net_id MACs #Params SRAM Flash Resolution Top-1
(fp32/int8)
mcunet-vww0 6.0M 0.37M 146kB 617kB 64 87.4%/87.3%
mcunet-vww1 11.6M 0.43M 162kB 689kB 80 88.9%/88.9%
mcunet-vww2 55.8M 0.64M 311kB 897kB 144 91.7%/91.8%

For TF-Lite int8 models, we do not use quantization-aware training (QAT), so some results is slightly lower than paper numbers.

Detection Model

We also share the person detection model used in the demo. To visualize the model's prediction on a sample image, please run the following command:

python eval_det.py

It will visualize the prediction here: assets/sample_images/person_det_vis.jpg.

The model takes in a small input resolution of 128x160 to reduce memory usage. It does not achieve state-of-the-art performance due to the limited image and model size but should provide decent performance for tinyML applications (please check the demo for a video recording). We will also release the deployment code in the upcoming TinyEngine release.

Requirement

  • Python 3.6+

  • PyTorch 1.4.0+

  • Tensorflow 1.15 (if you want to test TF-Lite models; CPU support only)

Acknowledgement

We thank MIT-IBM Watson AI Lab, Intel, Amazon, SONY, Qualcomm, NSF for supporting this research.

Citation

If you find the project helpful, please consider citing our paper:

@article{lin2020mcunet,
  title={Mcunet: Tiny deep learning on iot devices},
  author={Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

@inproceedings{
  lin2021mcunetv2,
  title={MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning},
  author={Lin, Ji and Chen, Wei-Ming and Cai, Han and Gan, Chuang and Han, Song},
  booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2021}
} 

@article{
  lin2022ondevice, 
  title = {On-Device Training Under 256KB Memory},
  author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, 
  journal = {arXiv:2206.15472 [cs]},
  url = {https://arxiv.org/abs/2206.15472},
  year = {2022}
}

Related Projects

On-Device Training Under 256KB Memory (NeurIPS'22)

TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning (NeurIPS'20)

Once for All: Train One Network and Specialize it for Efficient Deployment (ICLR'20)

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR'19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV'18)

HAQ: Hardware-Aware Automated Quantization (CVPR'19, oral)

More Repositories

1

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Python
6,323
star
2

bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
Python
2,153
star
3

temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Python
2,040
star
4

once-for-all

[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
Python
1,860
star
5

llm-awq

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Python
1,687
star
6

proxylessnas

[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
C++
1,415
star
7

data-efficient-gans

[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
Python
1,272
star
8

torchquantum

A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers.
Jupyter Notebook
1,270
star
9

efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Python
1,218
star
10

torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Cuda
1,181
star
11

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Python
1,175
star
12

gan-compression

[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs
Python
1,102
star
13

anycost-gan

[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing
Python
778
star
14

tinyml

Python
732
star
15

tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
C
717
star
16

TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
C++
695
star
17

fastcomposer

[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Python
644
star
18

pvcnn

[NeurIPS 2019, Spotlight] Point-Voxel CNN for Efficient 3D Deep Learning
Python
636
star
19

lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
Python
589
star
20

spvnas

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Python
577
star
21

distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Python
538
star
22

amc

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Python
422
star
23

tiny-training

On-Device Training Under 256KB Memory [NeurIPS'22]
Python
414
star
24

dlg

[NeurIPS 2019] Deep Leakage From Gradients
Python
375
star
25

offsite-tuning

Offsite-Tuning: Transfer Learning without Full Model
Python
365
star
26

haq

[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Python
362
star
27

hardware-aware-transformers

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Python
321
star
28

litepose

[CVPR'22] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
Python
301
star
29

inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
C++
189
star
30

amc-models

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Python
165
star
31

apq

[CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
Python
156
star
32

parallel-computing-tutorial

C++
123
star
33

flatformer

[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Python
119
star
34

patch_conv

Patch convolution to avoid large GPU memory usage of Conv2D
Python
72
star
35

6s965-fall2022

Jupyter Notebook
64
star
36

sparsevit

[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
Python
48
star
37

bnn-icestick

Binary Neural Network on IceStick FPGA.
Jupyter Notebook
47
star
38

e3d

Efficient 3D Deep Learning
46
star
39

neurips-micronet

[JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion
Jupyter Notebook
40
star
40

spatten-llm

[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Scala
32
star
41

tinychat-tutorial

C++
28
star
42

pruning-sparsity-publications

14
star
43

iccad-tinyml-open

[ICCAD'22 TinyML Contest] Efficient Heart Stroke Detection on Low-cost Microcontrollers
C
14
star
44

calo-cluster

Jupyter Notebook
5
star
45

ml-blood-pressure

Python
5
star
46

gan-compression-dynamic

Python
3
star
47

data-efficient-gans-dynamic

Python
3
star