APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
@inproceedings{Wang2020APQ,
title={APQ: Joint Search for Network Architecture, Pruning and Quantization Policy},
author={Tianzhe Wang and Kuan Wang and Han Cai and Ji Lin and Zhijian Liu and Song Han},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2020}
}
Overview
We release the PyTorch code for APQ. [Paper|Video|Competition]:
Jointly Search for Optimal Model
Save Orders of Magnitude of Search Cost
Better Performance than Sequential Design
How to Use
Prerequisites
- PyTorch version >= 1.0
- Python version >= 3.6
- Progress >= 1.5
- To search for or finetune new models, you'll need an NVIDIA GPU (a quick environment check is sketched below)
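If you want a quick sanity check before running anything, the snippet below verifies the version bounds above (a minimal sketch; the version parsing is ours, not part of the codebase):

# Minimal environment check; only the version bounds above are required.
import sys
import torch

assert sys.version_info >= (3, 6), "Python >= 3.6 is required"
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 0), "PyTorch >= 1.0 is required"
# An NVIDIA GPU is only needed when searching for or finetuning new models.
print("CUDA available:", torch.cuda.is_available())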
Dataset and Model Preparation
- Download the ImageNet dataset and put it into dataset/imagenet.
- Download the checkpoints for the quantization-aware predictor and the once-for-all network, and put them into the models folder (a quick path check is sketched below).
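A quick path check for the layout above (a minimal sketch; only the two directory names come from the steps above):

# Verify the expected layout before running anything.
from pathlib import Path

for path in (Path("dataset/imagenet"), Path("models")):
    if not path.is_dir():
        raise FileNotFoundError(f"missing directory: {path}")
print("dataset and checkpoint folders found")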
Codebase Structure
apq
- dataset (imagenet data path)
- elastic_nn (super network builder , w/ or w/o quantization)
- modules (define the layers, w/ or w/o quantization)
- networks (define the networks, w/ or w/o quantization)
utils.py (some utility functions for elastic_nn folder)
- models (quantzation-aware predictor and once-for-all network checkpoint path)
- imagenet_codebase (training codebase for imagenet)
- lut (latency lookup table path)
- methods (methods to find the mixed-precision network)
- evolution (evolution search code)
- utils (some utility functions, including converter)
accuracy_predictor.py (construction of accuracy predictor)
latency_predictor.py (construction of latency predictor)
converter.py (encode a subnetwork in to 1-hot vector)
quant-aware.py (code for quantization-aware training)
main.py
Readme.md
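As noted in the listing, converter.py turns a candidate subnetwork into a one-hot feature vector that the quantization-aware accuracy predictor can consume. A minimal sketch of that encoding idea (the candidate sets, helper names, and block format below are illustrative assumptions, not the repo's exact scheme):

import numpy as np

# Illustrative encoding: each per-block choice (kernel size, expand
# ratio, weight/activation bitwidth) is one-hot encoded over its
# candidate set, and the per-block vectors are concatenated.
CHOICES = {                      # hypothetical candidate sets
    "kernel_size": [3, 5, 7],
    "expand_ratio": [3, 4, 6],
    "w_bit": [4, 8],
    "a_bit": [4, 8],
}

def encode_block(block_cfg):
    """One-hot encode a single block's configuration dict."""
    parts = []
    for key, candidates in CHOICES.items():
        onehot = np.zeros(len(candidates))
        onehot[candidates.index(block_cfg[key])] = 1.0
        parts.append(onehot)
    return np.concatenate(parts)

def encode_subnet(blocks):
    """Concatenate per-block encodings into one feature vector."""
    return np.concatenate([encode_block(b) for b in blocks])

# Example: a 2-block subnet.
vec = encode_subnet([
    {"kernel_size": 3, "expand_ratio": 6, "w_bit": 4, "a_bit": 8},
    {"kernel_size": 5, "expand_ratio": 4, "w_bit": 8, "a_bit": 8},
])
print(vec.shape)  # (20,) with the candidate sets above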
Testing
For instance, if you want to test the model under the exps/test folder, run the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py \
--exp_dir=exps/test
You will get the measured latency/energy on the BitFusion platform and the ImageNet Top-1 accuracy.
Example
Evolution search
For instance, if you want to search for a model under a 12.80 ms latency constraint, run the following command:
CUDA_VISIBLE_DEVICES=0 python search.py \
--mode=evolution \
--acc_predictor_dir=models \
--exp_name=test \
--constraint=12.80 \
--type=latency
You will get a candidate that satisfies the resource constraint (latency or energy), stored in the exps/test folder.
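Conceptually, the evolution search keeps a population of candidate subnets, scores them with the accuracy predictor, rejects candidates whose predicted latency/energy exceeds the constraint, and refills the population by mutating the best survivors. A minimal sketch of that loop (all function names and hyperparameters here are placeholders, not the repo's API):

import random

def evolution_search(sample_fn, mutate_fn, acc_predictor, cost_fn,
                     constraint, population=100, generations=30,
                     parent_ratio=0.25):
    """Constrained evolutionary search (illustrative sketch)."""
    # Seed the population with random candidates that meet the constraint.
    pop = []
    while len(pop) < population:
        cand = sample_fn()
        if cost_fn(cand) <= constraint:
            pop.append(cand)

    for _ in range(generations):
        # Rank by predicted accuracy and keep the top fraction as parents.
        pop.sort(key=acc_predictor, reverse=True)
        parents = pop[: int(parent_ratio * population)]
        # Refill by mutating parents, rejecting infeasible candidates.
        children = []
        while len(children) < population - len(parents):
            child = mutate_fn(random.choice(parents))
            if cost_fn(child) <= constraint:
                children.append(child)
        pop = parents + children

    return max(pop, key=acc_predictor)

Because both scores come from predictors rather than real training or measurement, each candidate evaluation is nearly free, which is where the orders-of-magnitude saving in search cost comes from.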
Quantization-aware finetuning on ImageNet
For instance, if you want to run quantization-aware finetuning for the model under the exps/test folder, run the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python quant_aware.py \
--exp_name=test
You will get a mixed-precision model that satisfies the resource constraint (latency or energy) with competitive performance.
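Quantization-aware finetuning typically simulates low-precision arithmetic in the forward pass ("fake quantization") while letting gradients flow to the full-precision weights through a straight-through estimator. A minimal PyTorch sketch of that idea (not the repo's exact quantizer; the symmetric linear scheme below is an assumption):

import torch

def fake_quantize(x, n_bits):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses q, backward sees identity.
    return x + (q - x).detach()

# Usage inside a layer's forward pass, e.g. 4-bit weights, 8-bit activations:
w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(8, 16)
out = fake_quantize(x, 8) @ fake_quantize(w, 4).t()
out.sum().backward()   # gradients reach the full-precision weights
print(w.grad.shape)    # torch.Size([16, 16])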
Models
We provide checkpoints for the APQ models reported in the paper:
| Latency | Energy | BitOps | Accuracy | Model |
|---|---|---|---|---|
| 6.11ms | 9.14mJ | 12.7G | 72.8% | download |
| 8.45ms | 11.81mJ | 14.6G | 73.8% | download |
| 8.40ms | 12.18mJ | 16.5G | 74.1% | download |
| 12.17ms | 14.14mJ | 23.6G | 75.1% | download |
You can download the models and put them into the exps folder to test their performance. Note that in each row the bold entry marks the constraint under which that model was searched.
Related work on automated model compression and acceleration:
Once for All: Train One Network and Specialize it for Efficient Deployment (ICLR'20, code)
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR'19)
AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV'18)
HAQ: Hardware-Aware Automated Quantization (CVPR'19, oral)
Defensive Quantization: When Efficiency Meets Robustness (ICLR'19)