Awesome EMDL
Embedded and mobile deep learning research notes.
Papers
Survey
- EfficientDNNs [Repo]
- Awesome ML Model Compression [Repo]
- TinyML Papers and Projects [Repo]
- TinyML Platforms Benchmarking [arXiv '21]
- TinyML: A Systematic Review and Synthesis of Existing Research [ICAIIC '21]
- TinyML Meets IoT: A Comprehensive Survey [Internet of Things '21]
- A review on TinyML: State-of-the-art and prospects [Journal of King Saud Univ. '21]
- TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers [IEEE '21]
- Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better [arXiv '21]
- Benchmarking TinyML Systems: Challenges and Direction [arXiv '20]
- Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey [IEEE '20]
- The Deep Learning Compiler: A Comprehensive Survey [arXiv '20]
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
- A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
Model
- EtinyNet: Extremely Tiny Network for TinyML [AAAI '22]
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [NeurIPS '21, MIT]
- SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems [MLSys '20, IBM]
- Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [NeurIPS '20, Huawei]
- MCUNet: Tiny Deep Learning on IoT Devices [NeurIPS '20, MIT]
- GhostNet: More Features from Cheap Operations [CVPR '20, Huawei]
- MicroNet for Efficient Language Modeling [NeurIPS '19, MIT]
- Searching for MobileNetV3 [ICCV '19, Google]
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [CVPR '18, Google]
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware [arXiv '18, MIT]
- DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI'18, Samsung]
- NASNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv '17, Google]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv '17, Megvii]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv '17, Google]
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv '17]
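For intuition, a minimal PyTorch sketch (illustrative, not taken from any single paper above) of the depthwise-separable convolution that MobileNets and several of the other models build on: a per-channel 3x3 depthwise conv followed by a 1x1 pointwise conv, in place of one dense KxK convolution.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-style block: depthwise 3x3 conv, then pointwise 1x1 conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes the 3x3 conv filter each channel independently
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

With a 3x3 kernel and a large output channel count, this costs roughly 1/9th the multiply-adds of a standard convolution with the same channel counts, which is where most of the speedup in these architectures comes from.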
System
- BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML [ASP-DAC '22]
- CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs [arXiv '22, Google]
- UDC: Unified DNAS for Compressible TinyML Models [arXiv '22, Arm]
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [arXiv '21, Arm]
- TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning [NeurIPS '20, MIT]
- Once for All: Train One Network and Specialize it for Efficient Deployment [ICLR '20, MIT]
- DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys '17]
- DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys '17]
- MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL '17]
- fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS '17]
- DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys '16]
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN '16]
- EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA '16]
- MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys '16]
- DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE '16]
- Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys '16]
- CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM '16]
- An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App '15]
Quantization
- Quantizing deep convolutional networks for efficient inference: A whitepaper [arXiv '18]
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks [ECCV'18]
- Training and Inference with Integers in Deep Neural Networks [ICLR'18]
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
- Loss-aware Binarization of Deep Networks [ICLR'17]
- Towards the Limit of Network Quantization [ICLR'17]
- Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
- Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
- Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
- Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
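Most of the papers above build on the same primitive: mapping float weights onto a small integer grid. A toy NumPy sketch of symmetric, uniform 8-bit quantization with a single per-tensor scale (an assumption made for brevity; production toolchains use per-channel scales and calibrated activation ranges):

```python
import numpy as np

def quantize_int8(w):
    # Map [-|w|max, |w|max] linearly onto the int8 grid [-127, 127]
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())  # ~scale / 2
```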
Pruning
- Awesome-Pruning [Repo]
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [CVPR'19]
- To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
- Pruning Filters for Efficient ConvNets [ICLR'17]
- Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
- Soft Weight-Sharing for Neural Network Compression [ICLR'17]
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
- Dynamic Network Surgery for Efficient DNNs [NIPS'16]
- Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
Approximation
- High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
- Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the above)
- Convolutional neural networks with low-rank regularization [arXiv'15]
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
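A toy sketch of the low-rank factorization idea these papers share: approximate a dense m×n layer by two thinner layers via truncated SVD, cutting multiply-adds from m·n to k·(m+n) at the cost of some reconstruction error.

```python
import numpy as np

def low_rank_factor(w, rank):
    # Truncated SVD: w (m x n) ~= a (m x k) @ b (k x n)
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]

w = np.random.randn(512, 512).astype(np.float32)
a, b = low_rank_factor(w, 64)
print("params: %d -> %d" % (w.size, a.size + b.size))
print("relative error:", np.linalg.norm(w - a @ b) / np.linalg.norm(w))
```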
Characterization
- A First Look at Deep Learning Apps on Smartphones [WWW'19]
- Machine Learning at Facebook: Understanding Inference at the Edge [HPCA'19]
- NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications [ECCV '18]
- Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [MMSys '18]
Libraries
Inference Framework
- Alibaba - MNN - is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba.
- Apple - CoreML - lets you integrate machine learning models into your app. BERT and GPT-2 on iPhone
- Arm - ComputeLibrary - is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies. Intro
- Arm - Arm NN - is the most performant machine learning (ML) inference engine for Android and Linux, accelerating ML on Arm Cortex-A CPUs and Arm Mali GPUs.
- Baidu - Paddle Lite - is a multi-platform, high-performance deep learning inference engine.
- DeepLearningKit - is an open-source deep learning framework for Apple's iOS, OS X and tvOS.
- Edge Impulse - Interactive platform to generate models that can run on microcontrollers. They are also quite active on social networks, sharing recent EdgeAI/TinyML news.
- Google - TensorFlow Lite - is an open source deep learning framework for on-device inference.
- Intel - OpenVINO - Comprehensive toolkit for optimizing and deploying deep learning inference.
- JDAI Computer Vision - dabnn - is an accelerated binary neural network inference framework for mobile platforms.
- Meta - PyTorch Mobile - is a new framework for helping mobile developers and machine learning engineers embed PyTorch ML models on-device.
- Microsoft - DeepSpeed - is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- Microsoft - ELL - allows you to design and deploy intelligent machine-learned models onto resource constrained platforms and small single-board computers, like Raspberry Pi, Arduino, and micro:bit.
- Microsoft - ONNX Runtime - cross-platform, high-performance ML inference and training accelerator.
- Nvidia - TensorRT - is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
- OAID - Tengine - is a lightweight, high-performance, modular inference engine for embedded devices.
- Qualcomm - Neural Processing SDK for AI - Libraries that help developers run NN models on Snapdragon mobile platforms, taking advantage of the CPU, GPU and/or DSP.
- Tencent - ncnn - is a high-performance neural network inference framework optimized for the mobile platform.
- uTensor - AI inference library based on mbed (an RTOS for ARM chipsets) and TensorFlow.
- XiaoMi - Mace - is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
- xmartlabs - Bender - Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood.
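For orientation, a hedged sketch of the most common deployment path among the frameworks above, TensorFlow Lite: convert a SavedModel with default (dynamic-range) quantization, then run the bundled interpreter. The model path and input contents are placeholders.

```python
import numpy as np
import tensorflow as tf

# Convert a trained SavedModel ("my_model/" is a placeholder path)
converter = tf.lite.TFLiteConverter.from_saved_model("my_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range weight quantization
tflite_model = converter.convert()

# Run inference with the TFLite interpreter
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```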
Optimization Tools
- Neural Network Distiller - Python package for neural network compression research.
- PocketFlow - An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
Research Demos
- RSTensorFlow - GPU Accelerated TensorFlow for Commodity Android Devices.
Web
- mil-tokyo/webdnn - Fastest DNN Execution Framework on Web Browser.
General
- Caffe2 AICamera
- TensorFlow Android Camera Demo
- TensorFlow iOS Example
- TensorFlow OpenMV Camera Module
Edge / Tiny MLOps
- Tiny-MLOps: a framework for orchestrating ML applications at the far edge of IoT systems [EAIS '22]
- MLOps for TinyML: Challenges & Directions in Operationalizing TinyML at Scale [TinyML Talks '22]
- TinyMLOps: Operational Challenges for Widespread Edge AI Adoption [arXiv '22]
- A TinyMLaaS Ecosystem for Machine Learning in IoT: Overview and Research Challenges [VLSI-DAT '21]
- SOLIS: The MLOps journey from data acquisition to actionable insights [arXiv '21]
- Edge MLOps: An Automation Framework for AIoT Applications [IC2E '21]
- SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [arXiv '21, Nokia]
Vulkan
OpenCL
RenderScript
Tutorials
General
- Squeezing Deep Learning Into Mobile Phones
- Deep Learning – Tutorial and Recent Trends
- Tutorial on Hardware Architectures for Deep Neural Networks
- Efficient Convolutional Neural Network Inference on Mobile GPUs
NEON
OpenCL
- ARM® Mali™ GPU OpenCL Developer Guide, pdf
- Optimal Compute on ARM Mali™ GPUs
- GPU Compute for Mobile Devices
- Compute for Mobile Devices (performance focused)
- Hands On OpenCL
- Adreno OpenCL Programming Guide
- Better OpenCL Performance on Qualcomm Adreno GPU
Courses
Tools
GPU
- Bifrost GPU architecture and ARM Mali-G71 GPU
- Midgard GPU Architecture, ARM Mali-T880 GPU
- Mobile GPU market share
Driver
- [Adreno] csarron/qcom_vendor_binaries: Common Proprietary Qualcomm Binaries
- [Mali] Fevax/vendor_samsung_hero2ltexx: Blobs from s7 Edge G935F
Related Repos
- EfficientDNNs by @MingSun-Tse
- Awesome ML Model Compression by @cedrickchee
- Awesome Pruning by @he-y
- Model Compression by @j-marple-dev
- awesome-AutoML-and-Lightweight-Models by @guan-yuan
- knowledge-distillation-papers by @lhyfst
- Awesome-model-compression-and-acceleration by @memoiry
- Embedded Neural Network by @ZhishengWang