  • Stars: 151
  • Rank 244,586 (Top 5%)
  • Created about 6 years ago
  • Updated over 2 years ago

Repository Details

A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.

CNN-Inference-Engine-Quick-View

Runtime-speed Comparisons

Data-flow / Graph Optimization

FLOAT32-Support

| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
|---|---|---|---|---|
| Bolt | CPU (ARM optimized) / x86 / Mali GPU | Caffe / Tensorflow / PyTorch / onnx | Y | Link |
| TNN | CPU (ARM optimized) / Mali Adreno Apple GPU | Caffe / Tensorflow / PyTorch | Y | Link |
| PPLNN | CPU (ARM/x86 optimized) / Nvidia GPU | onnx | Y | Link / Link |
| Paddle-Light | CPU (ARM optimized) / Mali GPU / FPGA / NPU | Paddle / Caffe / onnx | Y | Link |
| MNN | CPU (ARM optimized) / Mali GPU | Caffe / Tensorflow / onnx | Y | Link |
| NCNN | CPU (ARM optimized) / Mali GPU | Caffe / PyTorch / mxnet / onnx | Y | 3rd party Link / Official Link |
| MACE | CPU (ARM optimized) / Mali GPU / DSP | Caffe / Tensorflow / onnx | Y | Link |
| TEngine | CPU (ARM A72 optimized) | Caffe / mxnet | Y | Link |
| AutoKernel | CPU / GPU / NPU | Caffe / mxnet / Tensorflow / PyTorch / Darknet | Y | Link |
| Synet | CPU (ARM optimized) / x86 | Caffe / PyTorch / Tensorflow / mxnet / onnx | Y | - |
| MsnhNet | CPU (ARM optimized) / Mali GPU / x86 / TensorRT | PyTorch | Y | Link |
| ONNX-Runtime | CPU / Nvidia GPU | onnx | Y | - |
| HiAI | Kirin CPU / NPU | Caffe / Tensorflow | Y | - |
| NNIE | NPU | Caffe | Y | 1TOPs |
| Intel-Caffe | CPU (Intel optimized) | Caffe | Y | Link |
| FeatherCNN | CPU (ARM optimized) | Caffe | N | Link / unofficial Link |
| Tensorflowlite | CPU (Android optimized) | Caffe2 / Tensorflow / onnx | Y | Link |
| TensorRT | GPU (Volta optimized) | Caffe / Tensorflow / onnx | Y | Link |
| TVM | CPU (ARM optimized) / Mali GPU / FPGA | onnx | Y | - |
| SNPE | CPU (Qualcomm optimized) / GPU / DSP | Caffe / Caffe2 / Tensorflow / onnx | Y | Link |
| Pocket-Tensor | CPU (ARM/x86 optimized) | Keras | N | Link |
| ZQCNN | CPU | Caffe / mxnet | Y | Link |
| ARM-NEON-to-x86-SSE | CPU (Intel optimized) | Intrinsics-Level | - | - |
| Simd | CPU (all platform optimized) | Intrinsics-Level | - | - |
| clDNN | Intel® Processor Graphics / Iris™ Pro Graphics | Caffe / Tensorflow / mxnet / onnx | Y | Link |

FIX16-Support

| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
|---|---|---|---|---|
| Bolt | CPU (ARM optimized) / x86 / Mali GPU | Caffe / Tensorflow / PyTorch | Y | Link |
| ARM32-SGEMM-LIB | CPU (ARM optimized) | GEMM Library | N | Link |
| TNN | CPU (ARM optimized) / Mali Adreno Apple GPU | Caffe / Tensorflow / PyTorch | Y | Link |
| Yolov2-Xilinx-PYNQ | FPGA (Xilinx PYNQ) | Yolov2-only | Y | Link |

INT8-Support

| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
|---|---|---|---|---|
| Bolt | CPU (ARM optimized) / x86 / Mali GPU | Caffe / Tensorflow / PyTorch | Y | Link |
| Intel-Caffe | CPU (Intel Skylake) | Caffe | Y | Link |
| TNN | CPU (ARM optimized) / Mali Adreno Apple GPU | Caffe / Tensorflow / PyTorch | Y | Link |
| PPLNN | Nvidia GPU optimized | onnx | Y | Link |
| NCNN | CPU (ARM optimized) | Caffe / PyTorch / mxnet / onnx | Y | Link |
| Paddle-Light | CPU (ARM optimized) / Mali GPU / FPGA | Paddle / Caffe / onnx | Y | Link |
| MNN | CPU (ARM optimized) / Mali GPU | Caffe / Tensorflow / onnx | Y | Link |
| Tensorflowlite | CPU (ARM) | Caffe2 / Tensorflow / onnx | Y | Link |
| TensorRT | GPU (Volta) | Caffe / Tensorflow / onnx | Y | Link |
| Gemmlowp | CPU (ARM / x86) | GEMM Library | - | - |
| SNPE | DSP (Quantized DLC) | Caffe / Caffe2 / Tensorflow / onnx | Y | Link |
| MACE | CPU (ARM optimized) / Mali GPU / DSP | Caffe / Tensorflow / onnx | Y | Link |
| TF2 | FPGA | Caffe / PyTorch / Tensorflow | Y | Link |
| TVM | CPU (ARM optimized) / Mali GPU / FPGA | onnx | Y | Link |

TERNARY-Support

| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
|---|---|---|---|---|
| Gemmbitserial | CPU (ARM / x86) | GEMM Library | - | Link |

BINARY-Support

| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks |
|---|---|---|---|---|
| Bolt | CPU (ARM optimized) / x86 / Mali GPU | Caffe / Tensorflow / PyTorch | Y | Link |
| BMXNET | CPU (ARM / x86) / GPU | mxnet | Y | Link |
| DABNN | CPU (ARM) | Caffe / Tensorflow / onnx | N | Link |
| Espresso | GPU | - | N | Link |
| BNN-PYNQ | FPGA (Xilinx PYNQ) | - | N | Link |
| FINN | FPGA (Xilinx) | - | N | Link |

NLP-Support

| Framework | Main Platform | Model Compatibility | Speed Benchmarks |
|---|---|---|---|
| TurboTransformers | CPU / Nvidia GPU | PyTorch | Link |
| Bolt | CPU / Mali GPU | Caffe / onnx | Link |

More Repositories

1. ResNet-18-Caffemodel-on-ImageNet (190 stars): ResNet-18 Caffemodel @ilsvrc12 shrt 256 with Top-1 69% Top-5 89%
2. OISR-PyTorch (Python, 134 stars): PyTorch implementation of "ODE-inspired Network Design for Single Image Super-Resolution" (CVPR2019)
3. ShuffleNet-An-Extremely-Efficient-CNN-for-Mobile-Devices-Caffe-Reimplementation (C++, 106 stars): Caffe re-implementation of ShuffleNet
4. Training-Tricks-for-Binarized-Neural-Networks (71 stars): A collection of training tricks for binarized neural networks.
5. Location-aware-Upsampling-for-Semantic-Segmentation (Jupyter Notebook, 52 stars): PyTorch implementation of "Location-aware Upsampling for Semantic Segmentation" (LaU)
6. SNNs-Self-Normalizing-Neural-Networks-Caffe-Reimplementation (C++, 46 stars): Caffe re-implementation of SNNs.
7. EDSR-ssim (38 stars): Different SSIM metrics in CNN-based super resolution algorithms (e.g., EDSR CVPRW2017, RDN CVPR2018, MSRN ECCV2018).
8. Caffe-Computation-Graph-Optimization (Python, 28 stars): Caffe Computation Graph Optimization.
9. K-Nearest-Neighbors-Hashing (MATLAB, 28 stars): Matlab implementation of "K-Nearest Neighbors Hashing" (CVPR2019)
10. Compact-Global-Descriptor (Python, 25 stars): PyTorch implementation of "Compact Global Descriptor for Neural Networks" (CGD).
11. AIM2020-Real-Super-Resolution (Python, 21 stars): Our solution to AIM2020 Real Image Super-Resolution Challenge (x2)
12. Dynamic-Network-Surgery-Caffe-Reimplementation (C++, 18 stars): Caffe re-implementation of dynamic network surgery.
13. A-Variation-of-Dice-coefficient-Loss-Caffe-Layer (C++, 18 stars): Computes a variation of the Dice coefficient loss for real-valued regression tasks.
14. Zero-shot-Style-Transfer-via-Attention-Rearrangement (Python, 16 stars): Official implementation of the paper "Z∗: Zero-shot Style Transfer via Attention Rearrangement" a.k.a. "Z∗: Zero-shot Style Transfer via Attention Reweighting" (CVPR2024)
15. Optimal-Ternary-Weights-Approximation (Cuda, 14 stars): Caffe implementation of Optimal-Ternary-Weights-Approximation in "Two-Step Quantization for Low-bit Neural Networks" (CVPR2018).
16. PyTorch-MixNet-SS (Python, 7 stars): Extremely light-weight MixNet with Top-1 75.7% and 2.5M params
17. AlexNet-BN-Caffemodel-on-ImageNet (6 stars): AlexNet-BN Caffemodel @ilsvrc12 shrt 256 with Top-1 60.43% Top-5 82.47%
18. UCAS-Pattern-Recognition (MATLAB, 6 stars): Implementations of the programming homework for the UCAS PR class (Autumn 2017-2018).
19. Label-free-Network-Compression (Python, 6 stars): Caffe implementation of "Learning Compression from Limited Unlabeled Data" (ECCV2018).
20. News-Spider (Python, 5 stars): News spider for Tencent, NetEase, SouHu, etc., collecting the date, title, body, and comments of each article.
21. Bias-Variance-Decomposition-for-KL-Divergence (3 stars): This repository includes some detailed proofs of "Bias Variance Decomposition for KL Divergence".