
Benchmark of TVM quantized model on CUDA

This repository contains the benchmark code measuring the int8 inference speed of TVM for the blog post Automating Optimization of Quantized Deep Learning Models on CUDA. Benchmarks of MXNet and TensorRT are provided as baselines.

How to Run

TVM

The benchmark was conducted using tvm@e22b58. (This is an outdated version; please check out this branch to run with a recent TVM version.) LLVM and CUDA need to be enabled. A CUDA device with compute capability 6.1 or higher is required to support the dp4a instruction.
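As a quick sanity check, you can ask TVM for the device's compute capability before running. This is a minimal sketch assuming a recent TVM built with CUDA enabled; older releases expose the device as `tvm.gpu(0)` instead of `tvm.cuda(0)`:

```python
# Minimal sketch: verify the GPU supports dp4a (compute capability >= 6.1).
# Assumes a recent TVM built with CUDA; older releases use tvm.gpu(0).
import tvm

dev = tvm.cuda(0)
assert dev.exist, "No CUDA device visible to TVM"
major, minor = (int(v) for v in dev.compute_version.split("."))
assert (major, minor) >= (6, 1), f"dp4a needs sm_61+, got {dev.compute_version}"
```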

We only provide auto-tuning logs for the NVIDIA GTX 1080. To run on other devices, you can follow the AutoTVM tutorial to generate your own tuning log, as sketched below.
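The tuning loop from that tutorial boils down to roughly the following. This is a sketch against a recent TVM's AutoTVM API; `mod` and `params` stand for your Relay module and weights, and the log file name is illustrative:

```python
# Sketch of the AutoTVM tuning loop from the official tutorial.
# `mod` and `params` (a Relay module and its weights) are assumed to
# exist already; file names are illustrative.
from tvm import autotvm

target = "cuda"
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=3, min_repeat_ms=100, timeout=4),
)

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=min(1000, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("my_device.log")],
    )
```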

python3 run_tvm.py --log_file logs/history_best_1080.log
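Internally, a script like run_tvm.py presumably applies the log during compilation, roughly as follows. The sketch targets a recent TVM (older releases use `relay.build_config` instead of `PassContext`); `mod` and `params` are illustrative placeholders:

```python
# Sketch: compile under apply_history_best so the tuned schedules from
# the log file are picked up. `mod` and `params` are placeholders.
import tvm
from tvm import autotvm, relay

with autotvm.apply_history_best("logs/history_best_1080.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="cuda", params=params)
```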

MXNet

MXNet 1.4 and cuDNN 7.3 or later are required.

python3 run_mxnet.py
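For reference, MXNet 1.4 ships an int8 path in `mxnet.contrib.quantization`, which a script like run_mxnet.py can build on. This is a sketch only; the checkpoint name and calibration settings are illustrative, not taken from the repo:

```python
# Sketch of MXNet's contrib int8 quantization (MXNet 1.4). The checkpoint
# name and settings are illustrative, not taken from run_mxnet.py.
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

sym, arg_params, aux_params = mx.model.load_checkpoint("resnet50_v1", 0)
qsym, qarg_params, aux_params = quantize_model(
    sym, arg_params, aux_params,
    ctx=mx.gpu(0),
    calib_mode="none",        # use "entropy" with calib_data for calibrated scales
    quantized_dtype="int8",
)
```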

TensorRT

TensorRT 5 is required. We use ONNX models as input; they are generated from MXNet when the benchmark script runs.

cd TensorRT; make; cd -;
python3 run_tensorrt.py
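The ONNX export from MXNet looks roughly like the following sketch, using `mxnet.contrib.onnx`; the model name and input shape are illustrative:

```python
# Sketch: export an MXNet checkpoint to ONNX, as the benchmark script
# does automatically. Model name and input shape are illustrative.
import numpy as np
import mxnet as mx
from mxnet.contrib import onnx as onnx_mxnet

sym, arg_params, aux_params = mx.model.load_checkpoint("resnet50_v1", 0)
params = {**arg_params, **aux_params}
onnx_mxnet.export_model(sym, params, [(1, 3, 224, 224)], np.float32, "resnet50_v1.onnx")
```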

Result

[Figure: benchmark results]