  • Stars: 2,115
  • Rank: 21,803 (Top 0.5%)
  • Language: Python
  • License: Other
  • Created: over 4 years ago
  • Updated: 27 days ago

Repository Details

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Qualcomm Innovation Center, Inc.

AIMET on GitHub Pages | Documentation | Install instructions | Discussion Forums | What's New

AI Model Efficiency Toolkit (AIMET)

AIMET is a library that provides advanced model quantization and compression techniques for trained neural network models. It provides features that have been proven to improve run-time performance of deep learning neural network models with lower compute and memory requirements and minimal impact to task accuracy.

How AIMET works

AIMET is designed to work with PyTorch and TensorFlow models.

We also host the AIMET Model Zoo, a collection of popular neural network models optimized for 8-bit inference, along with recipes for quantizing floating-point models using AIMET.

Why AIMET?

Benefits of AIMET

  • Supports advanced quantization techniques: Inference using integer runtimes is significantly faster than using floating-point runtimes. For example, models run 5x-15x faster on the Qualcomm Hexagon DSP than on the Qualcomm Kryo CPU. In addition, 8-bit precision models have a 4x smaller footprint than 32-bit precision models. However, maintaining model accuracy when quantizing ML models is often challenging. AIMET solves this using novel techniques like Data-Free Quantization that provide state-of-the-art INT8 results on several popular models.
  • Supports advanced model compression techniques that enable models to run faster at inference-time and require less memory
  • AIMET is designed to automate optimization of neural networks, avoiding time-consuming and tedious manual tweaking. AIMET also provides user-friendly APIs that allow users to make calls directly from their TensorFlow or PyTorch pipelines; a minimal PyTorch sketch follows below.

Please visit AIMET on GitHub Pages for more details.
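
As a rough illustration of what such a call from a PyTorch pipeline can look like, the sketch below wraps a model in AIMET's quantization simulation, calibrates it with a user-supplied forward pass, and exports the result. It is a minimal sketch only: the aimet_torch module paths and argument names are taken from AIMET's published PyTorch API but may differ between releases, and the calibration data here is random placeholder input.

    import os
    import torch
    from torchvision import models

    # AIMET imports -- module paths follow the aimet_torch package layout;
    # verify against the installed release's documentation.
    from aimet_common.defs import QuantScheme
    from aimet_torch.quantsim import QuantizationSimModel

    model = models.resnet18().eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    # Wrap the FP32 model with simulated quantization ops (8-bit weights and activations).
    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8,
                               default_output_bw=8)

    def calibrate(sim_model, _):
        # Run a few representative batches (random data used here as a placeholder)
        # so AIMET can choose quantization ranges (encodings) for weights and activations.
        with torch.no_grad():
            for _ in range(8):
                sim_model(torch.randn(1, 3, 224, 224))

    sim.compute_encodings(forward_pass_callback=calibrate,
                          forward_pass_callback_args=None)

    # Evaluate sim.model to estimate on-target INT8 accuracy, then export the
    # model and its encodings for the target runtime.
    os.makedirs('./output', exist_ok=True)
    sim.export(path='./output', filename_prefix='resnet18_int8', dummy_input=dummy_input)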

Supported Features

Quantization

  • Cross-Layer Equalization: Equalize weight tensors to reduce amplitude variation across channels
  • Bias Correction: Corrects shift in layer outputs introduced due to quantization
  • Adaptive Rounding: Learn the optimal rounding given unlabelled data
  • Quantization Simulation: Simulate on-target quantized inference accuracy
  • Quantization-aware Training: Use quantization simulation to train the model further to improve accuracy (see the sketch after this list)
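
Quantization-aware training in this flow amounts to fine-tuning the simulation model with an ordinary PyTorch training loop, as sketched below. The sim object is assumed to be a QuantizationSimModel like the one built earlier and train_loader is a hypothetical DataLoader; this is an illustrative sketch, not AIMET's verbatim recipe.

    import torch

    def finetune_qat(sim, train_loader, epochs=1, lr=1e-5):
        """Fine-tune a quantization-simulation model (QAT) after its encodings are computed.

        `sim` and `train_loader` are assumptions: `sim` is an aimet_torch
        QuantizationSimModel, `train_loader` an ordinary PyTorch DataLoader.
        """
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(sim.model.parameters(), lr=lr, momentum=0.9)

        sim.model.train()
        for _ in range(epochs):
            for images, labels in train_loader:
                optimizer.zero_grad()
                loss = criterion(sim.model(images), labels)
                loss.backward()      # gradients flow through the simulated quantization ops
                optimizer.step()

        # Export afterwards just as in the post-training flow, e.g.
        # sim.export(path='./output', filename_prefix='model_qat', dummy_input=...)
        return sim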

Model Compression

  • Spatial SVD: Tensor decomposition technique to split a large layer into two smaller ones
  • Channel Pruning: Removes redundant input channels from a layer and reconstructs layer weights
  • Per-layer compression-ratio selection: Automatically selects how much to compress each layer in the model (a usage sketch follows this list)
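
For orientation, a compression run combining these pieces might be driven as in the sketch below. The ModelCompressor entry point, parameter classes, and argument names are assumptions based on AIMET's documented PyTorch compression API and may differ across releases; eval_fn stands in for a user-supplied accuracy callback.

    from decimal import Decimal

    # Assumed AIMET imports -- check the installed release for exact module paths.
    from aimet_common.defs import CompressionScheme, CostMetric, GreedySelectionParameters
    from aimet_torch.defs import SpatialSvdParameters
    from aimet_torch.compress import ModelCompressor

    def compress_spatial_svd(model, eval_fn):
        # Target a 50% MAC reduction; the greedy selector picks per-layer ratios.
        greedy_params = GreedySelectionParameters(target_comp_ratio=Decimal('0.5'),
                                                  num_comp_ratio_candidates=10)
        auto_params = SpatialSvdParameters.AutoModeParams(greedy_params)
        params = SpatialSvdParameters(mode=SpatialSvdParameters.Mode.auto,
                                      params=auto_params)

        compressed_model, stats = ModelCompressor.compress_model(
            model=model,
            eval_callback=eval_fn,        # hypothetical accuracy callback
            eval_iterations=10,
            input_shape=(1, 3, 224, 224),
            compress_scheme=CompressionScheme.spatial_svd,
            cost_metric=CostMetric.mac,
            parameters=params)
        return compressed_model, stats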

Visualization

  • Weight ranges: Visually inspect whether a model is a candidate for applying the Cross-Layer Equalization technique, and see the effect after the technique is applied
  • Per-layer compression sensitivity: Visually get feedback about the sensitivity of any given layer in the model to compression

What's New

Some recently added features include

  • Adaptive Rounding (AdaRound): Learn the optimal rounding given unlabelled data
  • Quantization-aware Training (QAT) for recurrent models (including RNNs, LSTMs and GRUs)

Results

AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.

DFQ

The DFQ method, applied to several popular networks such as MobileNet-v2 and ResNet-50, results in less than 0.9% loss in accuracy all the way down to 8-bit quantization, in an automated way and without any training data.

Models | FP32 | INT8 Simulation
MobileNet v2 (top1) | 71.72% | 71.08%
ResNet 50 (top1) | 76.05% | 75.45%
DeepLab v3 (mIOU) | 72.65% | 71.91%
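
DFQ in AIMET combines Cross-Layer Equalization with bias correction ahead of quantization. As a hedged sketch of the equalization step in PyTorch (the equalize_model entry point is taken from the aimet_torch.cross_layer_equalization module; treat the exact signature as an assumption):

    import torch
    from torchvision import models

    # Assumed entry point -- verify against the installed AIMET release.
    from aimet_torch.cross_layer_equalization import equalize_model

    model = models.mobilenet_v2().eval()

    # Folds batch norms, equalizes weight ranges across consecutive layers and
    # absorbs high biases, in place and without any training data.
    equalize_model(model, input_shapes=(1, 3, 224, 224))

    # The equalized model can then go through bias correction
    # (aimet_torch.bias_correction) and the quantization-simulation flow shown earlier.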

AdaRound (Adaptive Rounding)

ADAS Object Detect

For this example ADAS object detection model, which was challenging to quantize to 8-bit precision, AdaRound can recover the accuracy to within 1% of the FP32 accuracy.

Configuration | mAP (Mean Average Precision)
FP32 | 82.20%
Nearest Rounding (INT8 weights, INT8 acts) | 49.85%
AdaRound (INT8 weights, INT8 acts) | 81.21%

DeepLabv3 Semantic Segmentation

For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to 4-bit precision without a significant drop in accuracy.

Configuration | mIOU (Mean Intersection over Union)
FP32 | 72.94%
Nearest Rounding (INT4 weights, INT8 acts) | 6.09%
AdaRound (INT4 weights, INT8 acts) | 70.86%
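
A rough sketch of wiring up an AdaRound run with aimet_torch follows; the Adaround and AdaroundParameters names come from the aimet_torch.adaround module, but the argument list is an assumption and unlabelled_loader is a hypothetical data loader of representative inputs.

    import torch

    from aimet_common.defs import QuantScheme
    # Assumed module path -- check the installed AIMET release's documentation.
    from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

    def run_adaround(model, unlabelled_loader, dummy_input):
        # AdaRound learns per-weight rounding decisions from a small amount of
        # unlabelled data.
        params = AdaroundParameters(data_loader=unlabelled_loader,
                                    num_batches=4,
                                    default_num_iterations=10000)
        adarounded = Adaround.apply_adaround(model, dummy_input, params,
                                             path='./output',
                                             filename_prefix='adaround',
                                             default_param_bw=8,
                                             default_quant_scheme=QuantScheme.post_training_tf_enhanced)
        # A QuantizationSimModel built from `adarounded` should load and freeze the
        # exported parameter encodings (e.g. sim.set_and_freeze_param_encodings)
        # before simulating accuracy.
        return adarounded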

Quantization for Recurrent Models

AIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU). Using the QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with a minimal drop in accuracy.

DeepSpeech2 (using bi-directional LSTMs) | Word Error Rate
FP32 | 9.92%
INT8 | 10.22%

Model Compression

AIMET can also significantly compress models. For popular models, such as ResNet-50 and ResNet-18, compression with spatial SVD plus channel pruning achieves a 50% MAC (multiply-accumulate) reduction while retaining accuracy within approximately 1% of the original uncompressed model.

Models | Uncompressed model | 50% compressed model
ResNet 18 (top1) | 69.76% | 68.56%
ResNet 50 (top1) | 76.05% | 75.75%

Installation Instructions

To install and use the pre-built version of the AIMET package, please follow one of the below links:

To build, modify (optionally) and use the latest AIMET source code, please follow one of the below links:

Resources

Contributions

Thanks for your interest in contributing to AIMET! Please read our Contributions Page for more information on contributing features or bug fixes. We look forward to your participation!

Team

AIMET aims to be a community-driven project maintained by Qualcomm Innovation Center, Inc.

License

AIMET is licensed under the BSD 3-clause "New" or "Revised" License. Check out the LICENSE for more details.

More Repositories

  • sense (Python, 733 stars): Enhance your application with the ability to see and interact with humans using any RGB camera.
  • ai-hub-models (Python, 448 stars): The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
  • gunyah-hypervisor (C, 302 stars): Gunyah is a Type-1 hypervisor designed for strong security, performance and modularity.
  • aimet-model-zoo (Python, 296 stars)
  • sample-apps-for-robotics-platforms (C, 120 stars)
  • AFLTriage (Rust, 111 stars)
  • qidk (C, 95 stars)
  • snapdragon-gsr (GLSL, 94 stars)
  • adreno-gpu-opengl-es-code-sample-framework (C++, 58 stars): This repository contains an OpenGL ES Framework designed to enable developers to get up and running quickly for creating sample content and rapid prototyping. It is designed to be easy to build and have the basic building blocks needed for creating an Android APK with OpenGL ES functionality, an input system, as well as other helper utilities for loading resources, etc. This Framework has been extracted from, and is a subset of, the Adreno GPU SDK.
  • cloud-ai-sdk (Jupyter Notebook, 52 stars): The Qualcomm Cloud AI SDK (Platform and Apps) enables high performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing and Generative AI models.
  • adreno-gpu-vulkan-code-sample-framework (C++, 43 stars): This repository contains a Vulkan Framework designed to enable developers to get up and running quickly for creating sample content and rapid prototyping. It is designed to be easy to build and have the basic building blocks needed for creating an Android APK with Vulkan functionality, an input system, as well as other helper utilities for loading resources, etc.
  • upstream-wifi-fw (42 stars)
  • efficient-transformers (Python, 39 stars): This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
  • qbox (C++, 35 stars): Qbox
  • ai-hub-apps (Java, 31 stars): The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning applications ready to deploy on Qualcomm® devices.
  • qca-sdk-nss-fw (27 stars)
  • fastrpc (C, 21 stars)
  • sense-iOS (Swift, 20 stars): Enhance your iOS app with the ability to see and interact with humans using the RGB camera.
  • vasp (C++, 19 stars): VASP is a framework to simulate attacks on V2X networks. It works on top of the VEINS simulator.
  • toolchain_for_hexagon (Shell, 18 stars)
  • software-kit-for-qualcomm-cloud-ai-100 (C++, 16 stars): Software kit for Qualcomm Cloud AI 100
  • gunyah-resource-manager (C, 15 stars): A Root VM supporting virtualization with the Gunyah Hypervisor.
  • ai-engine-direct-helper (C++, 15 stars)
  • lid (Python, 14 stars): License Identifier
  • vdds (C++, 13 stars): Highly-optimized intra-process PubSub library with a DDS-like interface
  • android-on-snapdragon (Java, 11 stars): Sample code for 3rd party developers working on Android on Snapdragon
  • gunyah-c-runtime (C, 11 stars): A small C runtime for bare-metal VMs on the Gunyah Hypervisor.
  • comment-filter (Python, 10 stars): A Python library and command-line utility that filters comments from a source file
  • software-kit-for-qualcomm-cloud-ai-100-cc (C++, 10 stars): Software kit for Qualcomm Cloud AI 100 cc
  • gunyah-support-scripts (Shell, 9 stars)
  • wos-ai-plugins (C++, 9 stars)
  • iodme (C++, 8 stars): IODME (IO Data Mover Engine) is a library, and some tools, for optimizing typical IO operations that involve copying / moving data between memory and file descriptors.
  • startupkits (7 stars): Platform documentation - a collection of user guides for startup kits published on QDN (https://developer.qualcomm.com/hardware/startup-kits)
  • autopen (Python, 7 stars): Autopen is an open-source toolkit designed to assist security analysts, manufacturers, and various professionals to detect potential vulnerabilities in vehicles.
  • qccsdk-qcc711 (C, 7 stars)
  • license-text-normalizer (Python, 6 stars): License Text Normalizer
  • aimet-pages (HTML, 6 stars): AIMET GitHub Pages documentation
  • bstruct-mininet (Python, 5 stars)
  • wifi-commonsys (Java, 5 stars)
  • license-text-normalizer-js (TypeScript, 5 stars): License Text Normalizer (JavaScript)
  • quic.github.io (SCSS, 4 stars): Landing page for QuIC GitHub
  • musl (C, 4 stars): musl libc fork for Hexagon support
  • snapdragon-game-plugins-for-unreal-engine (4 stars)
  • lockers (Shell, 4 stars): The lockers package contains various locking mechanisms and building blocks.
  • sshash (C++, 4 stars): Library and tools for hashing sensitive strings in ELF libraries and executables
  • hexagonMVM (Assembly, 4 stars)
  • game-assets-for-adreno-gpu-code-samples (3 stars): Game assets for Adreno GPU code samples
  • lsbug (Python, 3 stars): A collection of Linux kernel tests for arm64 servers
  • .github (C, 3 stars): QuIC GitHub organization action templates and config
  • mink-idl-compiler (Rust, 3 stars)
  • ghe-policy-check (Python, 2 stars)
  • quic-usb-drivers (C, 2 stars)
  • sample-apps-for-qualcomm-linux (C++, 2 stars)
  • vsf-service (Python, 2 stars)
  • tps-location-sdk-android (1 star)
  • tps-location-sdk-native (HTML, 1 star)
  • tps-location-quick-start-android (Java, 1 star)
  • tps-location-quick-start-native (C++, 1 star)
  • cloud-ai-sdk-pages (1 star)
  • sbom-check (Python, 1 star): A Python library and CLI application that checks a provided SPDX SBOM for adherence to the SPDX 2.3 specification and for the presence of a configurable set of required field values.
  • aic-operator (Go, 1 star)
  • v4l-video-test-app (C++, 1 star)