• Stars: 296
• Rank: 140,464 (Top 3%)
• Language: Python
• License: Other
• Created: almost 4 years ago
• Updated: 11 months ago

Repository Details

Qualcomm Innovation Center, Inc.

Model Zoo for AI Model Efficiency Toolkit

We provide a collection of popular neural network models and compare their floating-point and quantized performance. The results demonstrate that quantized models can achieve accuracy comparable to that of floating-point models. Alongside the results, we provide scripts and artifacts that users can run to quantize floating-point models with the AI Model Efficiency ToolKit (AIMET).

Table of Contents

• Introduction
• PyTorch Models
• TensorFlow Models
• Installation and Usage
• Team
• License

Introduction

Quantized inference is significantly faster than floating-point inference and enables models to run in a power-efficient manner on mobile and edge devices. We use AIMET, a library that includes state-of-the-art techniques for quantization, to quantize various models available in the PyTorch and TensorFlow frameworks.

An original FP32 source model is quantized using either the post-training quantization (PTQ) or quantization-aware training (QAT) techniques available in AIMET. Example evaluation scripts are provided for each model. Where PTQ is needed, the evaluation script performs PTQ before evaluation. Where QAT is used, the fine-tuned model checkpoint is also provided.
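
The snippet below sketches what the PTQ flow described above looks like in code: wrap a floating-point PyTorch model in an AIMET quantization simulation, compute encodings from calibration data, then evaluate the simulated quantized model. This is a minimal illustration assuming the aimet_torch QuantizationSimModel API; the calibration_loader and evaluate names are hypothetical placeholders, and each model's own documentation gives the exact procedure.

    # Minimal PTQ sketch: assumes aimet_torch is installed; calibration_loader and
    # evaluate() are hypothetical placeholders for a real data loader and metric.
    import torch
    from torchvision.models import resnet18
    from aimet_common.defs import QuantScheme
    from aimet_torch.quantsim import QuantizationSimModel

    model = resnet18(pretrained=True).eval()
    dummy_input = torch.rand(1, 3, 224, 224)

    # Simulate 8-bit weights and 8-bit activations (W8A8).
    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8,
                               default_output_bw=8)

    # PTQ step: derive quantization encodings from a few calibration batches.
    def pass_calibration_data(sim_model, _):
        with torch.no_grad():
            for images, _labels in calibration_loader:  # hypothetical loader
                sim_model(images)

    sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                          forward_pass_callback_args=None)

    # Score the simulated quantized model with the same metric as the FP32 model.
    quantized_top1 = evaluate(sim.model)  # hypothetical evaluation function

Wherever QAT was used instead, the provided fine-tuned checkpoint would be loaded into the model before building the simulation.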

PyTorch Models

Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | Results [5] (Metric, FP32, W8A8 [6], W4A8 [7])
Image Classification MobileNetV2 GitHub Repo Pretrained Model Quantized Model (ImageNet) Top-1 Accuracy 71.67% 71.14% TBD
Resnet18 Pytorch Torchvision Pytorch Torchvision Quantized Model (ImageNet) Top-1 Accuracy 69.75% 69.54% 69.1%
Resnet50 Pytorch Torchvision Pytorch Torchvision Quantized Model (ImageNet) Top-1 Accuracy 76.14% 75.81% 75.63%
Resnet101 Pytorch Torchvision Pytorch Torchvision Quantized Model (ImageNet) Top-1 Accuracy 77.34% 77.13% TBD
Regnet_x_3_2gf Pytorch Torchvision Pytorch Torchvision Quantized Model (ImageNet) Top-1 Accuracy 78.36% 78.10% 77.70%
ResNeXt101 Pytorch Torchvision Pytorch Torchvision Quantized Model (ImageNet) Top-1 Accuracy 79.23% 78.76% TBD
HRNet_W32 GitHub Repo Pretrained Model Quantized Model (ImageNet) Top-1 Accuracy 78.50% 78.20% TBD
EfficientNet-lite0 GitHub Repo Pretrained Model Quantized Model (ImageNet) Top-1 Accuracy 75.40% 75.36% 74.46%
ViT Repo Prepared Models Quantized Models (ImageNet dataset) Accuracy 81.32 81.57 TBD
MobileViT Repo Prepared Models Quantized Models (ImageNet dataset) Accuracy 78.46 77.59 TBD
GPUNet Repo Prepared Models Quantized Models (ImageNet dataset) Accuracy 78.86 78.42 TBD
Uniformer Repo Prepared Models Quantized Models (ImageNet dataset) Accuracy 82.9 81.9 TBD
Object Detection MobileNetV2-SSD-Lite GitHub Repo Pretrained Model Quantized Model (PascalVOC) mAP 68.7% 68.6% TBD
SSD_Res50 GitHub Repo Pretrained Model Quantized Model (COCO2017val) mAP 0.250 0.248 TBD
YOLOX GitHub Repo Pretrained Models (2 in total) Quantized Model mAP Results TBD
Pose Estimation Pose Estimation Based on Ref. Based on Ref. Quantized Model (COCO) mAP 0.364 0.359 TBD
(COCO) mAR 0.436 0.432 TBD
HRNET-Posenet Based on Ref. FP32 Model Quantized Model (COCO) mAP 0.765 0.763 0.762
(COCO) mAR 0.793 0.792 0.791
Super Resolution SRGAN GitHub Repo Pretrained Model (older version from here) See Example (BSD100) PSNR / SSIM Detailed Results 25.51 / 0.653 25.5 / 0.648 TBD
Anchor-based Plain Net (ABPN) Based on Ref. See Tarballs See Example Average PSNR Results TBD
Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution (XLSR) Based on Ref. See Tarballs See Example Average PSNR Results TBD
Super-Efficient Super Resolution (SESR) Based on Ref. See Tarballs See Example Average PSNR Results TBD
QuickSRNet - See Tarballs See Example Average PSNR Results TBD
Semantic Segmentation DeepLabV3+ GitHub Repo Pretrained Model Quantized Model (PascalVOC) mIOU 72.91% 72.44% 72.18%
HRNet-W48 GitHub Repo Original model weight not available Quantized Model (Cityscapes) mIOU 81.04% 80.65% 80.07%
InverseForm (HRNet-16-Slim-IF) GitHub Repo Pretrained Model See Example (Cityscapes) mIOU 77.81% 77.17% TBD
InverseForm (OCRNet-48) GitHub Repo Pretrained Model See Example (Cityscapes) mIOU 86.31% 86.21% TBD
FFNets GitHub Repo Prepared Models (5 in total) See Example mIoU Results TBD
RangeNet++ GitHub Repo Pretrained Model Quantized Model (Semantic kitti) mIOU 47.2% 47.1% 46.8%
SalsaNext GitHub Repo Pretrained Model Quantized Model (Semantic kitti) mIOU 55.8% 54.9% 55.1%
SegNet GitHub Repo Pretrained Model Quantized Model (CamVid dataset) mIOU 50.48% 50.59% 50.58%
Video Understanding mmaction2 BMN GitHub Repo Pretrained Model Quantized Model (ActivityNet) auc 67.25 67.05 TBD
Speech Recognition DeepSpeech2 GitHub Repo Pretrained Model See Example (Librispeech Test Clean) WER 9.92% 10.22% TBD
NLP / NLU Bert Repo Prepared Models Quantized Models (GLUE dataset) GLUE score 83.11 82.44 TBD
(SQuAD dataset) F1 score 88.48 87.47 TBD
Detailed Results
MobileBert Repo Prepared Models Quantized Models (GLUE dataset) GLUE score 81.24 81.17 TBD
(SQuAD dataset) F1 score 89.45 88.66 TBD
Detailed Results
MiniLM Repo Prepared Models Quantized Models (GLUE dataset) GLUE score 82.23 82.63 TBD
(SQuAD dataset) F1 score 90.47 89.70 TBD
Detailed Results
Roberta Repo Prepared Models Quantized Models (GLUE dataset) GLUE score 85.11 84.26 TBD
Detailed Results
DistilBert Repo Prepared Models Quantized Models (GLUE dataset) GLUE score 80.71 80.26 TBD
(SQuAD dataset) F1 score 85.42 85.18 TBD
Detailed Results
GPT2 Repo Prepared Models Quantized Models Perplexity 27.67 28.11 TBD

[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with post-training techniques, this refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, this refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations are used to further improve the performance of post-training quantization.
[5] Results comparing float and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (Some models include a mix of W4A8 and W8A8 layers).
TBD indicates that support is NOT yet available
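
To make the W8A8 and W4A8 notation concrete, the lines below show how those bitwidths would map onto the QuantizationSimModel arguments used in the earlier sketch. This is an illustrative assumption rather than the per-model configuration actually used in the zoo, and real W4A8 runs may keep individual layers at W8A8 as noted in [7].

    import torch
    from torchvision.models import resnet18
    from aimet_torch.quantsim import QuantizationSimModel

    fp32_a = resnet18(pretrained=True).eval()
    fp32_b = resnet18(pretrained=True).eval()
    dummy_input = torch.rand(1, 3, 224, 224)

    # W8A8: 8-bit weight parameters, 8-bit activation outputs.
    sim_w8a8 = QuantizationSimModel(fp32_a, dummy_input=dummy_input,
                                    default_param_bw=8, default_output_bw=8)

    # W4A8: 4-bit weights, 8-bit activations (some layers may remain W8A8
    # via per-layer overrides in a model-specific configuration).
    sim_w4a8 = QuantizationSimModel(fp32_b, dummy_input=dummy_input,
                                    default_param_bw=4, default_output_bw=8)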

TensorFlow Models

Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | TensorFlow Version | Results [5] (Metric, FP32, W8A8 [6], W4A8 [7])
Image Classification ResNet-50 (v1) GitHub Repo Pretrained Model See Documentation 1.15 (ImageNet) Top-1 Accuracy 75.21% 74.96% TBD
ResNet-50-tf2 GitHub Repo Pretrained Model Quantized Model 2.4 (ImageNet) Top-1 Accuracy 74.9% 74.8% TBD
MobileNet-v2-1.4 GitHub Repo Pretrained Model Quantized Model 1.15 (ImageNet) Top-1 Accuracy 75% 74.21% TBD
MobileNet-v2-tf2 GitHub Repo Pretrained Model See Example 2.4 (ImageNet) Top-1 Accuracy 71.6% 71.0% TBD
EfficientNet Lite GitHub Repo Pretrained Model Quantized Model 2.4 (ImageNet) Top-1 Accuracy 74.93% 74.99% TBD
Object Detection SSD MobileNet-v2 GitHub Repo Pretrained Model See Example 1.15 (COCO) Mean Avg. Precision (mAP) 0.2469 0.2456 TBD
RetinaNet GitHub Repo Pretrained Model See Example 1.15 (COCO) mAP Detailed Results 0.35 0.349 TBD
MobileDet-EdgeTPU GitHub Repo Pretrained Model See Example 2.4 (COCO) Mean Avg. Precision (mAP) 0.281 0.279 TBD
Pose Estimation Pose Estimation Based on Ref. Based on Ref. Quantized Model 2.4 (COCO) mAP 0.383 0.379 TBD
(COCO) (mAR) 0.452 0.446 TBD
Super Resolution SRGAN GitHub Repo Pretrained Model See Example 2.4 (BSD100) PSNR / SSIM Detailed Results 25.45 / 0.668 24.78 / 0.628 25.41 / 0.666 (INT8W / INT16Act.)
Semantic Segmentation DeeplabV3plus_mbnv2 GitHub Repo Pretrained Model See Example 2.4 (PascalVOC) mIOU 72.28 71.71 TBD
DeeplabV3plus_xception GitHub Repo Pretrained Model See Example 2.4 (PascalVOC) mIOU 87.71 87.21 TBD

[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with post-training techniques, this refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, this refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations (INT8W / INT16Act.) are used to further improve the performance of post-training quantization.
[5] Results comparing float and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (Some models include a mix of W4A8 and W8A8 layers).
TBD indicates that support is NOT yet available

Installation and Usage

Install AIMET

Before you can run the evaluation script for a specific model, you need to install the AI Model Efficiency ToolKit (AIMET) software. Please see this Getting Started page for an overview. Then install AIMET and its dependencies using these Installation instructions.

Install AIMET model zoo

Follow the instructions on this page to install the AIMET model zoo Python package(s).

Run model evaluation

The evaluation scripts run floating-point and quantized evaluations that demonstrate the improved performance of quantized models obtained through AIMET techniques. They generate and display the final accuracy results (as documented in the tables above). For the documentation and procedure for a specific model, refer to the relevant .md file within the model's subfolder under the TensorFlow or PyTorch folders.
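
As an illustration of what such an evaluation script computes, the sketch below shows a generic top-1 accuracy loop of the kind used to produce the ImageNet numbers in the tables above. It is a hypothetical stand-in for the model zoo's actual evaluators; val_loader is an assumed ImageNet validation DataLoader, and the same function is simply called once on the FP32 model and once on the AIMET-simulated quantized model.

    import torch

    def top1_accuracy(model, val_loader, device="cpu"):
        # Generic top-1 accuracy loop (hypothetical stand-in for a zoo evaluator).
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                logits = model(images.to(device))
                preds = logits.argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.numel()
        return 100.0 * correct / total

    # Typical usage inside an evaluation script (numbers from the PyTorch table):
    # fp32_top1 = top1_accuracy(fp32_model, val_loader)   # e.g. 69.75% for ResNet18
    # quant_top1 = top1_accuracy(sim.model, val_loader)   # e.g. 69.54% with W8A8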

Team

AIMET Model Zoo is a project maintained by Qualcomm Innovation Center, Inc.

License

Please see the LICENSE file for details.

More Repositories

1. aimet (Python, 2,115 stars): AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
2. sense (Python, 733 stars): Enhance your application with the ability to see and interact with humans using any RGB camera.
3. ai-hub-models (Python, 448 stars): The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
4. gunyah-hypervisor (C, 302 stars): Gunyah is a Type-1 hypervisor designed for strong security, performance and modularity.
5. sample-apps-for-robotics-platforms (C, 120 stars)
6. AFLTriage (Rust, 111 stars)
7. qidk (C, 95 stars)
8. snapdragon-gsr (GLSL, 94 stars)
9. adreno-gpu-opengl-es-code-sample-framework (C++, 58 stars): This repository contains an OpenGL ES Framework designed to enable developers to get up and running quickly for creating sample content and rapid prototyping. It is designed to be easy to build and have the basic building blocks needed for creating an Android APK with OpenGL ES functionality, input system, as well as other helper utilities for loading resources, etc. This Framework has been extracted and is a subset of the Adreno GPU SDK.
10. cloud-ai-sdk (Jupyter Notebook, 52 stars): Qualcomm Cloud AI SDK (Platform and Apps) enable high performance deep learning inference on Qualcomm Cloud AI platforms delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing and Generative AI models.
11. adreno-gpu-vulkan-code-sample-framework (C++, 43 stars): This repository contains a Vulkan Framework designed to enable developers to get up and running quickly for creating sample content and rapid prototyping. It is designed to be easy to build and have the basic building blocks needed for creating an Android APK with Vulkan functionality, input system, as well as other helper utilities for loading resources, etc.
12. upstream-wifi-fw (42 stars)
13. efficient-transformers (Python, 39 stars): This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
14. qbox (C++, 35 stars): Qbox
15. ai-hub-apps (Java, 31 stars): The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning applications ready to deploy on Qualcomm® devices.
16. qca-sdk-nss-fw (27 stars)
17. fastrpc (C, 21 stars)
18. sense-iOS (Swift, 20 stars): Enhance your iOS app with the ability to see and interact with humans using the RGB camera.
19. vasp (C++, 19 stars): VASP is a framework to simulate attacks on V2X networks. It works on top of the VEINS simulator.
20. toolchain_for_hexagon (Shell, 18 stars)
21. software-kit-for-qualcomm-cloud-ai-100 (C++, 16 stars): Software kit for Qualcomm Cloud AI 100
22. gunyah-resource-manager (C, 15 stars): A Root VM supporting virtualization with the Gunyah Hypervisor.
23. ai-engine-direct-helper (C++, 15 stars)
24. lid (Python, 14 stars): License Identifier
25. vdds (C++, 13 stars): Highly-optimized intra-process PubSub library with DDS-like interface
26. android-on-snapdragon (Java, 11 stars): Sample code for 3rd party developers working on Android On Snapdragon
27. gunyah-c-runtime (C, 11 stars): A small C runtime for bare-metal VMs on the Gunyah Hypervisor.
28. comment-filter (Python, 10 stars): A Python library and command-line utility that filters comments from a source file
29. software-kit-for-qualcomm-cloud-ai-100-cc (C++, 10 stars): Software kit for Qualcomm Cloud AI 100 cc
30. gunyah-support-scripts (Shell, 9 stars)
31. wos-ai-plugins (C++, 9 stars)
32. iodme (C++, 8 stars): IODME (IO Data Mover Engine) is a library, and some tools, for optimizing typical IO operations that involve copying / moving data between memory and file descriptors.
33. startupkits (7 stars): Platform Documentation - a collection of documentations (user guides) for startup-kits published on QDN (https://developer.qualcomm.com/hardware/startup-kits)
34. autopen (Python, 7 stars): Autopen is an open-source toolkit designed to assist security analysts, manufacturers, and various professionals to detect potential vulnerabilities in vehicles.
35. qccsdk-qcc711 (C, 7 stars)
36. license-text-normalizer (Python, 6 stars): License Text Normalizer
37. aimet-pages (HTML, 6 stars): AIMET GitHub pages documentation
38. bstruct-mininet (Python, 5 stars)
39. wifi-commonsys (Java, 5 stars)
40. license-text-normalizer-js (TypeScript, 5 stars): License Text Normalizer (JavaScript)
41. quic.github.io (SCSS, 4 stars): Landing page for QuIC GitHub
42. musl (C, 4 stars): musl libc fork for Hexagon support
43. snapdragon-game-plugins-for-unreal-engine (4 stars)
44. lockers (Shell, 4 stars): The lockers package contains various locking mechanism and building blocks.
45. sshash (C++, 4 stars): Library and tools for hashing sensitive strings in ELF libraries and executables
46. hexagonMVM (Assembly, 4 stars)
47. game-assets-for-adreno-gpu-code-samples (3 stars): Game assets for Adreno GPU code samples
48. lsbug (Python, 3 stars): A collection of Linux kernel tests for arm64 servers
49. .github (C, 3 stars): QuIC GitHub organization action templates and config
50. mink-idl-compiler (Rust, 3 stars)
51. ghe-policy-check (Python, 2 stars)
52. quic-usb-drivers (C, 2 stars)
53. sample-apps-for-qualcomm-linux (C++, 2 stars)
54. vsf-service (Python, 2 stars)
55. tps-location-sdk-android (1 star)
56. tps-location-sdk-native (HTML, 1 star)
57. tps-location-quick-start-android (Java, 1 star)
58. tps-location-quick-start-native (C++, 1 star)
59. cloud-ai-sdk-pages (1 star)
60. sbom-check (Python, 1 star): Python library and CLI application that checks a provided SPDX SBOM for adherence to the official SPDX 2.3 specification and for the presence of a configurable set of required field values.
61. aic-operator (Go, 1 star)
62. v4l-video-test-app (C++, 1 star)