Model Zoo for AI Model Efficiency Toolkit

We provide a collection of popular neural network models and compare their floating-point and quantized performance. The results demonstrate that quantized models can achieve accuracy comparable to floating-point models. Along with the results, we also provide scripts and artifacts for users to quantize floating-point models using the AI Model Efficiency ToolKit (AIMET).

Introduction
PyTorch Models
TensorFlow Models
Installation and Usage
Team
License

Introduction

Quantized inference is significantly faster than floating-point inference and lets models run power-efficiently on mobile and edge devices. We use AIMET, a library of state-of-the-art quantization techniques, to quantize various models available in the PyTorch and TensorFlow frameworks.

An original FP32 source model is quantized using either the post-training quantization (PTQ) or the quantization-aware training (QAT) technique available in AIMET. Example evaluation scripts are provided for each model. When PTQ is needed, the evaluation script performs PTQ before evaluation. Wherever QAT is used, the fine-tuned model checkpoint is also provided.
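
To make the idea of quantization concrete, here is a minimal pure-Python sketch of simulated (quantize-then-dequantize) uniform quantization. It is not AIMET code and does not use the AIMET API: the function names, the sample weights, and the min/max range calibration are all assumptions chosen for illustration, and real quantizers typically also ensure that zero is exactly representable.

```python
def quantize_dequantize(values, bitwidth, lo, hi):
    """Snap each value to one of 2**bitwidth evenly spaced levels in [lo, hi]."""
    levels = 2 ** bitwidth - 1            # number of steps between lo and hi
    scale = (hi - lo) / levels            # width of one quantization step
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)     # clamp to the calibrated range
        q = round((clipped - lo) / scale) # integer code in [0, levels]
        out.append(lo + q * scale)        # map the code back to a float
    return out


def max_abs_error(a, b):
    """Largest element-wise difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical weights. As a stand-in for PTQ calibration, the range is
# simply the observed minimum and maximum.
weights = [-0.82, -0.31, -0.05, 0.0, 0.12, 0.47, 0.93]
lo, hi = min(weights), max(weights)

w8 = quantize_dequantize(weights, 8, lo, hi)
w4 = quantize_dequantize(weights, 4, lo, hi)
err8 = max_abs_error(weights, w8)
err4 = max_abs_error(weights, w4)

print(f"8-bit max error: {err8:.5f}")
print(f"4-bit max error: {err4:.5f}")
```

The rounding error is bounded by half a quantization step, so it grows as the bit width shrinks. This is one reason lower-precision configurations may need QAT fine-tuning to recover accuracy.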

PyTorch Models

Task	Network^[1]	Model Source^[2]	Floating Pt (FP32) Model ^[3]	Quantized Model ^[4]	Results ^[5]
					Metric	FP32	W8A8^[6]	W4A8^[7]
Image Classification	MobileNetV2	GitHub Repo	Pretrained Model	Quantized Model	(ImageNet) Top-1 Accuracy	71.67%	71.14%	TBD
	Resnet18	PyTorch Torchvision	PyTorch Torchvision	Quantized Model	(ImageNet) Top-1 Accuracy	69.75%	69.54%	69.1%
	Resnet50	PyTorch Torchvision	PyTorch Torchvision	Quantized Model	(ImageNet) Top-1 Accuracy	76.14%	75.81%	75.63%
	Resnet101	PyTorch Torchvision	PyTorch Torchvision	Quantized Model	(ImageNet) Top-1 Accuracy	77.34%	77.13%	TBD
	Regnet_x_3_2gf	PyTorch Torchvision	PyTorch Torchvision	Quantized Model	(ImageNet) Top-1 Accuracy	78.36%	78.10%	77.70%
	ResNeXt101	PyTorch Torchvision	PyTorch Torchvision	Quantized Model	(ImageNet) Top-1 Accuracy	79.23%	78.76%	TBD
	HRNet_W32	GitHub Repo	Pretrained Model	Quantized Model	(ImageNet) Top-1 Accuracy	78.50%	78.20%	TBD
	EfficientNet-lite0	GitHub Repo	Pretrained Model	Quantized Model	(ImageNet) Top-1 Accuracy	75.40%	75.36%	74.46%
	ViT	Repo	Prepared Models	Quantized Models	(ImageNet dataset) Accuracy	81.32	81.57	TBD
	MobileViT	Repo	Prepared Models	Quantized Models	(ImageNet dataset) Accuracy	78.46	77.59	TBD
	GPUNet	Repo	Prepared Models	Quantized Models	(ImageNet dataset) Accuracy	78.86	78.42	TBD
	Uniformer	Repo	Prepared Models	Quantized Models	(ImageNet dataset) Accuracy	82.9	81.9	TBD
Object Detection	MobileNetV2-SSD-Lite	GitHub Repo	Pretrained Model	Quantized Model	(PascalVOC) mAP	68.7%	68.6%	TBD
	SSD_Res50	GitHub Repo	Pretrained Model	Quantized Model	(COCO2017val) mAP	0.250	0.248	TBD
	YOLOX	GitHub Repo	Pretrained Models (2 in total)	Quantized Model	mAP Results			TBD
Pose Estimation	Pose Estimation	Based on Ref.	Based on Ref.	Quantized Model	(COCO) mAP	0.364	0.359	TBD
	Pose Estimation	Based on Ref.	Based on Ref.	Quantized Model	(COCO) mAR	0.436	0.432	TBD
	HRNET-Posenet	Based on Ref.	FP32 Model	Quantized Model	(COCO) mAP	0.765	0.763	0.762
	HRNET-Posenet	Based on Ref.	FP32 Model	Quantized Model	(COCO) mAR	0.793	0.792	0.791
Super Resolution	SRGAN	GitHub Repo	Pretrained Model (older version from here)	See Example	(BSD100) PSNR / SSIM Detailed Results	25.51 / 0.653	25.5 / 0.648	TBD
	Anchor-based Plain Net (ABPN)	Based on Ref.	See Tarballs	See Example	Average PSNR Results			TBD
	Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution (XLSR)	Based on Ref.	See Tarballs	See Example	Average PSNR Results			TBD
	Super-Efficient Super Resolution (SESR)	Based on Ref.	See Tarballs	See Example	Average PSNR Results			TBD
	QuickSRNet	-	See Tarballs	See Example	Average PSNR Results			TBD
Semantic Segmentation	DeepLabV3+	GitHub Repo	Pretrained Model	Quantized Model	(PascalVOC) mIOU	72.91%	72.44%	72.18%
	HRNet-W48	GitHub Repo	Original model weight not available	Quantized Model	(Cityscapes) mIOU	81.04%	80.65%	80.07%
	InverseForm (HRNet-16-Slim-IF)	GitHub Repo	Pretrained Model	See Example	(Cityscapes) mIOU	77.81%	77.17%	TBD
	InverseForm (OCRNet-48)	GitHub Repo	Pretrained Model	See Example	(Cityscapes) mIOU	86.31%	86.21%	TBD
	FFNets	GitHub Repo	Prepared Models (5 in total)	See Example	mIoU Results			TBD
	RangeNet++	GitHub Repo	Pretrained Model	Quantized Model	(SemanticKITTI) mIOU	47.2%	47.1%	46.8%
	SalsaNext	GitHub Repo	Pretrained Model	Quantized Model	(SemanticKITTI) mIOU	55.8%	54.9%	55.1%
	SegNet	GitHub Repo	Pretrained Model	Quantized Model	(CamVid dataset) mIOU	50.48%	50.59%	50.58%
Video Understanding	mmaction2 BMN	GitHub Repo	Pretrained Model	Quantized Model	(ActivityNet) auc	67.25	67.05	TBD
Speech Recognition	DeepSpeech2	GitHub Repo	Pretrained Model	See Example	(Librispeech Test Clean) WER	9.92%	10.22%	TBD
NLP / NLU	Bert	Repo	Prepared Models	Quantized Models	(GLUE dataset) GLUE score	83.11	82.44	TBD
					(SQuAD dataset) F1 score	88.48	87.47	TBD
					Detailed Results
	MobileBert	Repo	Prepared Models	Quantized Models	(GLUE dataset) GLUE score	81.24	81.17	TBD
					(SQuAD dataset) F1 score	89.45	88.66	TBD
					Detailed Results
	MiniLM	Repo	Prepared Models	Quantized Models	(GLUE dataset) GLUE score	82.23	82.63	TBD
					(SQuAD dataset) F1 score	90.47	89.70	TBD
					Detailed Results
	Roberta	Repo	Prepared Models	Quantized Models	(GLUE dataset) GLUE score	85.11	84.26	TBD
					Detailed Results
	DistilBert	Repo	Prepared Models	Quantized Models	(GLUE dataset) GLUE score	80.71	80.26	TBD
					(SQuAD dataset) F1 score	85.42	85.18	TBD
					Detailed Results
	GPT2	Repo	Prepared Models	Quantized Models	Perplexity	27.67	28.11	TBD

^[1] _{Model usage documentation}
^[2] _{Original FP32 model source}
^[3] _{FP32 model checkpoint}
^[4] _{Quantized Model: For models quantized with a post-training technique, refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations are used to further improve performance of post-training quantization.}
^[5] _{Results comparing float and quantized performance}
^[6] _{W8A8 indicates 8-bit weights, 8-bit activations}
^[7] _{W4A8 indicates 4-bit weights, 8-bit activations (Some models include a mix of W4A8 and W8A8 layers).}
_{TBD indicates that support is NOT yet available}
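
As a worked reading of the results columns, the short sketch below computes how many percentage points of Top-1 accuracy each quantized configuration loses relative to FP32. The three rows are copied from the PyTorch table above; the helper name `drop` is just an illustrative choice.

```python
# (network, FP32, W8A8, W4A8) Top-1 accuracy in percent, copied from the table above.
rows = [
    ("Resnet18", 69.75, 69.54, 69.1),
    ("Resnet50", 76.14, 75.81, 75.63),
    ("Regnet_x_3_2gf", 78.36, 78.10, 77.70),
]


def drop(fp32, quantized):
    """Accuracy lost relative to FP32, in percentage points."""
    return round(fp32 - quantized, 2)


drops = {name: (drop(fp32, w8), drop(fp32, w4)) for name, fp32, w8, w4 in rows}

for name, (d8, d4) in drops.items():
    print(f"{name}: W8A8 -{d8:.2f} pts, W4A8 -{d4:.2f} pts")
```

For these three networks the W8A8 drop stays below 0.4 points and the W4A8 drop below 0.7 points.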

TensorFlow Models

Task	Network ^[1]	Model Source ^[2]	Floating Pt (FP32) Model ^[3]	Quantized Model ^[4]	TensorFlow Version	Results ^[5]
						Metric	FP32	W8A8^[6]	W4A8^[7]
Image Classification	ResNet-50 (v1)	GitHub Repo	Pretrained Model	See Documentation	1.15	(ImageNet) Top-1 Accuracy	75.21%	74.96%	TBD
	ResNet-50-tf2	GitHub Repo	Pretrained Model	Quantized Model	2.4	(ImageNet) Top-1 Accuracy	74.9%	74.8%	TBD
	MobileNet-v2-1.4	GitHub Repo	Pretrained Model	Quantized Model	1.15	(ImageNet) Top-1 Accuracy	75%	74.21%	TBD
	MobileNet-v2-tf2	GitHub Repo	Pretrained Model	See Example	2.4	(ImageNet) Top-1 Accuracy	71.6%	71.0%	TBD
	EfficientNet Lite	GitHub Repo	Pretrained Model	Quantized Model	2.4	(ImageNet) Top-1 Accuracy	74.93%	74.99%	TBD
Object Detection	SSD MobileNet-v2	GitHub Repo	Pretrained Model	See Example	1.15	(COCO) Mean Avg. Precision (mAP)	0.2469	0.2456	TBD
	RetinaNet	GitHub Repo	Pretrained Model	See Example	1.15	(COCO) mAP Detailed Results	0.35	0.349	TBD
	MobileDet-EdgeTPU	GitHub Repo	Pretrained Model	See Example	2.4	(COCO) Mean Avg. Precision (mAP)	0.281	0.279	TBD
Pose Estimation	Pose Estimation	Based on Ref.	Based on Ref.	Quantized Model	2.4	(COCO) mAP	0.383	0.379	TBD
Pose Estimation	Pose Estimation	Based on Ref.	Based on Ref.	Quantized Model	2.4	(COCO) mAR	0.452	0.446	TBD
Super Resolution	SRGAN	GitHub Repo	Pretrained Model	See Example	2.4	(BSD100) PSNR / SSIM Detailed Results	25.45 / 0.668	24.78 / 0.628	25.41 / 0.666 (INT8W / INT16Act.)
Semantic Segmentation	DeeplabV3plus_mbnv2	GitHub Repo	Pretrained Model	See Example	2.4	(PascalVOC) mIOU	72.28	71.71	TBD
Semantic Segmentation	DeeplabV3plus_xception	GitHub Repo	Pretrained Model	See Example	2.4	(PascalVOC) mIOU	87.71	87.21	TBD

^[1] _{Model usage documentation}
^[2] _{Original FP32 model source}
^[3] _{FP32 model checkpoint}
^[4] _{Quantized Model: For models quantized with a post-training technique, refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations (INT8W/INT16Act.) are used to further improve performance of post-training quantization.}
^[5] _{Results comparing float and quantized performance}
^[6] _{W8A8 indicates 8-bit weights, 8-bit activations}
^[7] _{W4A8 indicates 4-bit weights, 8-bit activations (Some models include a mix of W4A8 and W8A8 layers).}
_{TBD indicates that support is NOT yet available}

Installation and Usage

Install AIMET

Before you can run the evaluation script for a specific model, you need to install the AI Model Efficiency ToolKit (AIMET) software. Please see this Getting Started page for an overview. Then install AIMET and its dependencies using these Installation instructions.

Install AIMET model zoo

Follow the instructions on this page to install the AIMET model zoo python package(s).

Run model evaluation

The evaluation scripts run floating-point and quantized evaluations that demonstrate improved quantized model performance through the use of AIMET techniques. They generate and display the final accuracy results (as documented in the tables above). To access the documentation and procedures for a specific model, refer to the relevant .md file in that model's subfolder under the TensorFlow or PyTorch folder.
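
The general shape of such an evaluation, scoring a floating-point model and a quantized model on the same labeled data and reporting both numbers, can be sketched as follows. This is a toy illustration only, not one of the model zoo scripts: `top1_accuracy`, both stand-in "models", and the sample data are hypothetical, and the coarse rounding merely mimics precision loss rather than performing real quantization.

```python
def top1_accuracy(predict, samples):
    """Percentage of samples whose predicted label equals the ground-truth label."""
    correct = sum(1 for inputs, label in samples if predict(inputs) == label)
    return 100.0 * correct / len(samples)


# Hypothetical stand-ins: each "model" maps a list of class scores to a label.
def fp32_model(scores):
    return max(range(len(scores)), key=lambda i: scores[i])


def quantized_model(scores):
    # Coarse rounding mimics precision loss; ties resolve to the lowest index.
    coarse = [round(s, 1) for s in scores]
    return max(range(len(coarse)), key=lambda i: coarse[i])


# Hypothetical (class scores, ground-truth label) pairs.
samples = [
    ([0.10, 0.70, 0.20], 1),
    ([0.62, 0.30, 0.08], 0),
    ([0.33, 0.33, 0.34], 2),
    ([0.44, 0.46, 0.10], 1),
]

fp32_acc = top1_accuracy(fp32_model, samples)
quant_acc = top1_accuracy(quantized_model, samples)

print(f"FP32 Top-1:      {fp32_acc:.2f}%")
print(f"Quantized Top-1: {quant_acc:.2f}%")
```

In this toy data the third sample has nearly tied scores, so the coarse stand-in mislabels it while the full-precision one does not. Real accuracy gaps arise in a similar way, from small numerical differences near decision boundaries.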

Team

AIMET Model Zoo is a project maintained by Qualcomm Innovation Center, Inc.

License

Please see the LICENSE file for details.
