• Stars
    star
    169
  • Rank 224,453 (Top 5 %)
  • Language
    Python
  • License
    Other
  • Created over 5 years ago
  • Updated almost 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Graph Transforms to Quantize and Retrain Deep Neural Nets in TensorFlow

Graph Transforms to Quantize and Retrain Deep Neural Nets in TensorFlow

Graffitist is a flexible and scalable framework built on top of TensorFlow to process low-level graph descriptions of deep neural networks (DNNs) for accurate and efficient inference on fixed-point hardware. It comprises of a (growing) library of transforms to apply various neural network compression techniques such as quantization, pruning, and compression. Each transform consists of unique pattern matching and manipulation algorithms that when run sequentially produce an optimized output graph.

Graffitist uses a novel technique for training quantization thresholds (TQT) using standard backpropagation and gradient descent, which results in highly accurate and efficient 8-bit and 4-bit quantized networks amenable to most generic fixed-point hardware. For details, please refer to our paper:

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks,
Sambhav R. Jain, Albert Gural, Michael Wu, Chris H. Dick,
arXiv preprint arXiv:1903.08066, 2019.

Graffitist stands on the shoulders of giants, and the interface is inspired in part by earlier tools from TensorFlow. It is developed with tight integration to the static data-flow graph semantics of TensorFlow. This comes with many benefits of a mature ML framework, such as strong low-level graph processing API, accelerated kernels for bit-accurate emulation, extensive pretrained model-zoo, large-scale distributed training, production readiness, clean documentation and great support from TensorFlow developers and the open-source community.

Contents


Quantization

Graffitist allows for quantization in two modes:

  1. Static Mode. Quantization thresholds (hence scale factors) are determined based on statistics of activations derived from a calibration dataset^. This results in quantized performance (INT8) that is usually competitive with floating-point baselines (FP32) without retraining. Note that while calibration can run on CPU within tens of minutes, use of GPU is recommended due to its ~O(n^2) runtime complexity.

  2. Retrain Mode. Quantization thresholds and weights are simultaneously trained (TQT method) for improved accuracy and further reduced precision (e.g. INT4). This approach yields highly accurate and compact DNN implementations on a fixed-point target. In many cases, INT8 retrained networks match FP32 accuracy while INT4 retrained networks reach within 1-3% of it depending on network topology. Recovery is achieved within 5 epochs of TQT.

^small randomly chosen subset of the validation set with appropriate pre-processing applied

Quantization scheme

For simplicity and ease of mapping on generic fixed-point hardware, the quantization scheme is constrained to use linear (affine symmetric) mapping with:

  • no zero-points
  • strict powers-of-2 scale-factors
  • per-tensor quantization (for both weights and activations)
  • mid-tread quantization

Supported ops

  • compute: Conv2D, MatMul, DepthwiseConv2dNative
  • normalization: FusedBatchNorm
  • activation: Relu, Relu6, LeakyRelu, Maximum (for leaky-relu)
  • scale preserving: ConcatV2, BiasAdd, Add (eltwise-add), Maximum (for leaky-relu)
  • pool: MaxPool, AvgPool, Mean
  • classifier: Softmax

Supported layer topologies

  • compute -> normalization/bias-add
  • compute -> normalization/bias-add -> activation
  • compute -> normalization/bias-add -> eltwise-add
  • compute -> normalization/bias-add -> eltwise-add -> activation (relu/relu6)

Supported optimizations

  • normalization layers following compute layers are folded in
  • concat-of-concat layers are collapsed
  • identity nodes without control edges are spliced
  • reduced mean is transformed to global avgpool with stride equal to input spatial size
  • avgpool is transformed to depthwise conv with reciprocal multiplier as weights
  • input scale sharing is enforced for concat, bias-add, eltwise-add and max ops
  • relu/relu6 outputs are quantized to uint8 instead of int8

Supported precisions

  • compute layers are quantized as q8( q16( sum( q8/4(w)*q8(x) ) ) + q16(b) )
  • leaky-relu is quantized as q8( max( q16(x), q16( q16(a)*q16(x) ) ) )
  • eltwise-add is quantized as q8( q8(x) + q8(y) )
  • avgpool is quantized as q8( sum( q8(r)*q8(x) ) )

Graffitist is in experimental stages as we continue to add support for more operation types, layer topologies, network styles, graph optimizations, and compression techniques. To request support for options not mentioned above, please submit an issue with details.

Imagenet performance

Network FP32 INT8 (static) FP32 (retrain wt) INT8 (retrain wt,th) INT4^ (retrain wt,th)
top1 / top5 top1 / top5 top1 / top5 [#ep] top1 / top5 [#ep] top1 / top5 [#ep]
vgg16 70.9 / 89.8 70.4 / 89.7 71.9 / 90.5 [1.0] 71.7 / 90.4 [0.9] 71.5 / 90.3 [4.0]
vgg19 71.0 / 89.8 70.4 / 89.7 71.8 / 90.4 [1.0] 71.7 / 90.4 [1.0] 71.2 / 90.1 [2.0]
resnet_v1_50 75.2 / 92.2 74.3 / 91.7 75.4 / 92.5 [3.7] 75.4 / 92.3 [1.9] 74.4 / 91.7 [2.0]
resnet_v1_101 76.4 / 92.9 74.8 / 92.0 76.6 / 93.2 [1.2] 76.4 / 93.1 [0.9] 75.7 / 92.5 [2.0]
resnet_v1_152 76.8 / 93.2 76.2 / 93.0 76.8 / 93.3 [1.0] 76.7 / 93.3 [1.4] 76.0 / 93.0 [1.9]
resnet_v1p5_50 76.5 / 93.1 75.6 / 92.7 76.5 / 93.1 [n/a] 76.4 / 93.1 [0.7] 75.1 / 92.6 [2.8]
inception_v1 69.8 / 89.6 68.6 / 88.9 70.3 / 90.0 [2.8] 70.7 / 90.2 [2.4] 67.2 / 88.2 [4.0]
inception_v2 74.0 / 91.8 73.1 / 91.3 74.3 / 92.2 [3.3] 74.4 / 92.4 [2.5] 71.9 / 90.8 [4.8]
inception_v3 78.0 / 93.9 76.8 / 93.3 78.3 / 94.2 [2.1] 78.3 / 94.3 [1.2] 76.4 / 93.1 [4.4]
inception_v4 80.2 / 95.2 79.4 / 94.6 80.2 / 95.2 [n/a] 80.1 / 95.2 [1.5] 78.9 / 94.7 [4.2]
mobilenet_v1 71.0 / 90.0 0.6 / 3.6 71.1 / 90.0 [3.4] 71.1 / 90.0 [2.1] n/a
mobilenet_v2 70.1 / 89.5 0.3 / 1.2 71.7 / 90.7 [3.2] 71.8 / 90.6 [2.2] n/a
darknet19 73.0 / 91.4 68.7 / 89.7 74.4 / 92.3 [3.1] 74.5 / 92.3 [1.8] 73.2 / 91.6 [2.8]

^INT4 weights, INT8 activations. First/last layer weights are kept as INT8.

Changelog

The following modifications were done before exporting the graph definitions to serialized TensorFlow protocol buffers:

  1. Removed dropout
  2. Removed auxiliary logits layers

Reproducibility

The top-1/top-5 accuracy metrics are evaluated on Imagenet validation set (50k images) downloaded from here and processed into TFRecord format using this script. We expect to see some degree of randomness (+-0.1%) between runs on:

  • different platforms
    • when dataset is processed using different image library versions
    • due to non-deterministic thread scheduling
    • due to driver version differences
  • same platform
    • when more than one valid topological orders are feasible
    • due to inexact floating point math

Citations

Please consider citing our work if you find it useful for your research.

Paper

@article{tqt2019,
  title={Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks},
  author={Jain, Sambhav R and Gural, Albert and Wu, Michael and Dick, Chris H},
  journal={arXiv preprint arXiv:1903.08066},
  year={2019}
}

Framework

@misc{graffitist2019,
  author = {Xilinx},
  title = {Graffitist: Graph Transforms to Quantize and Retrain Deep Neural Nets in TensorFlow},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Xilinx/graffitist}}
}

[back to ToC]


Python API

Graffitist uses a Python interface and is invoked as follows:

python graffitist/graffitize.pyc \
          --in_graph        <path_to_in_graph.pb> \
          --out_graph       <path_to_out_graph.pb> \
          --inputs          <input_node_name> \
          --outputs         <output_node_name> \
          --input_shape     <H,W,C> \
          --transforms      <list_of_transforms_to_apply>

For a full list of arguments and available transforms, use the help option: python graffitist/graffitize.pyc -h.

We also provide utility scripts for end-to-end training and validation of Graffitist quantized networks on ImageNet.

Training

python scripts/train_imagenet_tf.py \
          --data_dir        <path_to_tfrecords_dir> \
          --ckpt_dir        <path_to_ckpt_meta_dir> \
          --image_size      <size> \
          --batch_size_t    <N>

Validation

python scripts/validate_imagenet_tf.py \
          --data_dir        <path_to_tfrecords_dir> \
          --model_dir       <path_to_model_dir> \
          --image_size      <size> \
          --batch_size      <N>

Calibration set generation

python scripts/validate_imagenet_tf.py \
          --data_dir        <path_to_tfrecords_dir> \
          --model_dir       <path_to_model_dir> \
          --image_size      <size> \
          --calib_set_size  <N> \
          --gen_calib_set

[back to ToC]


Requirements

Graffitist is packaged with custom quantization kernels (C++/CUDA) that are pre-compiled on the following configuration:

  • Ubuntu 16.04
  • Python 3.6
  • TensorFlow 1.14 (CPU or GPU)
  • CUDA 10.0, cuDNN 7 (if GPU)

Kernels

To load the pre-compiled kernels, prepend the following code (update paths as necessary) to your Python scripts. For example, see the provided validation script.

import tensorflow as tf

cpu_kernel_path = './kernels/quantize_ops.so'
gpu_kernel_path = './kernels/quantize_ops_cuda.so'

if tf.test.is_built_with_cuda() and tf.test.is_gpu_available(cuda_only=True):
  tf.load_op_library(gpu_kernel_path)
else:
  tf.load_op_library(cpu_kernel_path)

Installation

Install Anaconda3

Download and install Anaconda3.

wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh
bash ./Anaconda3-2019.07-Linux-x86_64.sh

When prompted, allow the installer to initialize Anaconda3 and setup your .bashrc. Then close and open a new bash shell to source the installation correctly.

Create a new conda environment tf1.14 with Python 3.6 interpreter and activate it.

conda create -n tf1.14 pip python=3.6
conda activate tf1.14

Install TensorFlow 1.14

CPU-only:

pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.14.0-cp36-cp36m-linux_x86_64.whl

GPU:

pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.14.0-cp36-cp36m-linux_x86_64.whl

(Optional) Install CUDA 10.0, cuDNN 7

The following NVIDIA® software is required to use TensorFlow with GPU support:

Install Graffitist

Clone Graffitist and install locally.

git clone https://github.com/Xilinx/graffitist.git
cd graffitist
pip install -e .

[back to ToC]


How to run

Prepare models

To get started, we provide a set of standard networks including graph descriptions (.pb/.meta), pre-trained FP32 weights (.ckpt) and calibration datasets (.npy) with applied preprocessing. Both eval and training graphs are included (for static and retrain mode respectively). This is because training specific layers (batch_normalization, dropout, etc) behave differently in the two modes.

Network Download
vgg16 static / retrain
vgg19 static / retrain
resnet_v1_50 static / retrain
resnet_v1_101 static / retrain
resnet_v1_152 static / retrain
resnet_v1p5_50 static / retrain
inception_v1 static / retrain
inception_v2 static / retrain
inception_v3 static / retrain
inception_v4 static / retrain
mobilenet_v1 static / retrain
mobilenet_v2 static / retrain
darknet19 static / retrain
yolo_v2 static
yolo_v2_tiny static

If using one of the provided models, simply download and extract to ./models/ dir and proceed to next step.

If bringing your own model (BYOM), ensure the following files are saved out to ./models/my_model_dir/:

my_model.pb
checkpoint
my_model.ckpt.data-00000-of-00001
my_model.ckpt.index
my_model.ckpt.meta (only needed in retrain mode)
calibration_set.npy

The graph .pb is a serialized TensorFlow protocol buffer containing the nodes and edges. Refer here for details on TensorFlow protocol buffers. Note that the input graph to Graffitist can contain frozen weights if static mode quantization is desired. For retrain mode, weights should NOT be frozen into the graph definition.

The checkpoint file points to the specific .ckpt to use. For example:

model_checkpoint_path: "my_model.ckpt"
all_model_checkpoint_paths: "my_model.ckpt"

Refer here for details on saving TensorFlow checkpoints.

The metagraph .meta contains the graph and training metadata (e.g. variable collections, weight regularization). It is only required in retrain mode. Refer here for details on exporting TensorFlow metagraph.

The calibration set .npy contains N randomly sampled images with applied data pre-processing, stored as a numpy array of shape [N, H, W, C]. See the included validation script for more details on generating a calibration set.

Set paths

Activate conda env and set paths:

conda activate tf1.14

groot=`find -name 'graffitize.pyc' -printf '%h\n'`
mroot=`find -name 'models'`

To retrain for INT4, set INT4_MODE as follows (ignored in static mode):

INT4_MODE="1"

Configure options

Choose a network and configure for either static or retrain mode:

Network Config
vgg16 static / retrain
vgg19 static / retrain
resnet_v1_50 static / retrain
resnet_v1_101 static / retrain
resnet_v1_152 static / retrain
resnet_v1p5_50 static / retrain
inception_v1 static / retrain
inception_v2 static / retrain
inception_v3 static / retrain
inception_v4 static / retrain
mobilenet_v1 static / retrain
mobilenet_v2 static / retrain
darknet19 static / retrain
yolo_v2 static
yolo_v2_tiny static

For a full list of default config options see examples.

If BYOM, set config options based on your model (similar to the examples provided).

Pick a recipe and run

Recipe 1: Optimized inference graph

This recipe applies various layer optimizations and pre-processing to generate a simplified inference graph (not quantized yet).

python $groot/graffitize.pyc \
    --in_graph $in_graph \
    --out_graph $opt_graph \
    --inputs $input_node \
    --outputs $output_node \
    --input_shape $input_shape \
    --transforms 'fix_input_shape' \
                 'fold_batch_norms' \
                 'remove_training_nodes' \
                 'strip_unused_nodes' \
                 'preprocess_layers'

Recipe 2: Quantized inference graph (static mode)

This recipe quantizes the inference graph using static calibration for efficient fixed-point implementation.

python $groot/graffitize.pyc \
    --in_graph $in_graph \
    --out_graph $quant_graph \
    --inputs $input_node \
    --outputs $output_node \
    --input_shape $input_shape \
    --transforms 'fix_input_shape' \
                 'fold_batch_norms' \
                 'remove_training_nodes' \
                 'strip_unused_nodes' \
                 'preprocess_layers' \
                 'quantize(weight_bits='$wb', activation_bits='$ab', layer_bits='$lb', relu_bits='$rb', avgpool_bits='$pb', avgpool_reciprocal_bits='$prb')'

Recipe 3: Quantized training graph (retrain mode)

Follow these steps to retrain quantized networks:

Step 1: Generate the quantized training graph.

python $groot/graffitize.pyc \
    --in_graph $in_metagraph \
    --out_graph $trainquant_graph \
    --inputs $input_node \
    --outputs $output_node \
    --input_shape $input_shape \
    --transforms 'fix_input_shape' \
                 'fold_batch_norms(is_training=True)' \
                 'preprocess_layers' \
                 'quantize(is_training=True, weight_bits='$wb', activation_bits='$ab', layer_bits='$lb', relu_bits='$rb', avgpool_bits='$pb', avgpool_reciprocal_bits='$prb', first_layer='$first_layer', last_layer='$last_layer')'

Step 2: Retrain. For the example networks, use the included training script. Once converged, ensure checkpoint points to the correct retrained ckpt.

Step 3: Generate the equivalent quantized inference graph using retrained variables from previous step.

python $groot/graffitize.pyc \
    --in_graph $in_graph \
    --out_graph $infquant_graph \
    --inputs $input_node \
    --outputs $output_node \
    --input_shape $input_shape \
    --transforms 'fix_input_shape' \
                 'fold_batch_norms' \
                 'remove_training_nodes' \
                 'strip_unused_nodes' \
                 'preprocess_layers' \
                 'quantize(calibrate_quant_layers=False, weight_bits='$wb', activation_bits='$ab', layer_bits='$lb', relu_bits='$rb', avgpool_bits='$pb', avgpool_reciprocal_bits='$prb', first_layer='$first_layer', last_layer='$last_layer')'

[back to ToC]


Common errors

Error Message Possible Fix
RuntimeError: Bad magic number in .pyc file Use correct Python version; see Requirements
tensorflow.python.framework.errors_impl.NotFoundError: graffitist/kernels/quantize_ops.so: undefined symbol: _ZN10tensorflow22CheckNotInComputeAsyncEPNS_15OpKernelContextEPKc Use correct TensorFlow version; see Requirements
tensorflow.python.framework.errors_impl.NotFoundError: graffitist/kernels/quantize_ops_cuda.so: undefined symbol: _ZN10tensorflow22CheckNotInComputeAsyncEPNS_15OpKernelContextEPKc Use correct TensorFlow version; see Requirements
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory Use correct CUDA version with TF for GPU; see Requirements
RecursionError: maximum recursion depth exceeded in comparison Stack limit set to 3k (default: 1k) to avoid stack overflows for lack of tail-call optimization in Python; get in touch

[back to ToC]


Default model config

vgg16 (static)

mdir=$mroot/vgg16_slim_pretrained
in_graph=$mdir/vgg16_slim_pretrained.pb
opt_graph=$mdir/vgg16_slim_pretrained_opt.pb
quant_graph=$mdir/vgg16_slim_pretrained_quant.pb
input_node=input
output_node=vgg_16/fc8/squeezed
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

vgg16 (retrain)

mdir=$mroot/vgg16_slim_pretrained_train
in_metagraph=$mdir/vgg16_slim_pretrained.ckpt.meta
in_graph=$mdir/vgg16_slim_pretrained.pb
opt_graph=$mdir/vgg16_slim_pretrained_opt.pb
trainquant_graph=$mdir/vgg16_slim_pretrained_trainquant.pb
infquant_graph=$mdir/vgg16_slim_pretrained_infquant.pb
input_node=input
output_node=vgg_16/fc8/squeezed
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=vgg_16/conv1/conv1_1/Conv2D
last_layer=vgg_16/fc8/Conv2D

[continue]

vgg19 (static)

mdir=$mroot/vgg19_slim_pretrained
in_graph=$mdir/vgg19_slim_pretrained.pb
opt_graph=$mdir/vgg19_slim_pretrained_opt.pb
quant_graph=$mdir/vgg19_slim_pretrained_quant.pb
input_node=input
output_node=vgg_19/fc8/squeezed
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

vgg19 (retrain)

mdir=$mroot/vgg19_slim_pretrained_train
in_metagraph=$mdir/vgg19_slim_pretrained.ckpt.meta
in_graph=$mdir/vgg19_slim_pretrained.pb
opt_graph=$mdir/vgg19_slim_pretrained_opt.pb
trainquant_graph=$mdir/vgg19_slim_pretrained_trainquant.pb
infquant_graph=$mdir/vgg19_slim_pretrained_infquant.pb
input_node=input
output_node=vgg_19/fc8/squeezed
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=vgg_19/conv1/conv1_1/Conv2D
last_layer=vgg_19/fc8/Conv2D

[continue]

resnet_v1_50 (static)

mdir=$mroot/resnet_v1_50_slim_pretrained
in_graph=$mdir/resnet_v1_50_slim_pretrained.pb
opt_graph=$mdir/resnet_v1_50_slim_pretrained_opt.pb
quant_graph=$mdir/resnet_v1_50_slim_pretrained_quant.pb
input_node=input
output_node=resnet_v1_50/predictions/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

resnet_v1_50 (retrain)

mdir=$mroot/resnet_v1_50_slim_pretrained_train
in_metagraph=$mdir/resnet_v1_50_slim_pretrained.ckpt.meta
in_graph=$mdir/resnet_v1_50_slim_pretrained.pb
opt_graph=$mdir/resnet_v1_50_slim_pretrained_opt.pb
trainquant_graph=$mdir/resnet_v1_50_slim_pretrained_trainquant.pb
infquant_graph=$mdir/resnet_v1_50_slim_pretrained_infquant.pb
input_node=input
output_node=resnet_v1_50/predictions/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=resnet_v1_50/conv1/Conv2D
last_layer=resnet_v1_50/logits/Conv2D

[continue]

resnet_v1_101 (static)

mdir=$mroot/resnet_v1_101_slim_pretrained
in_graph=$mdir/resnet_v1_101_slim_pretrained.pb
opt_graph=$mdir/resnet_v1_101_slim_pretrained_opt.pb
quant_graph=$mdir/resnet_v1_101_slim_pretrained_quant.pb
input_node=input
output_node=resnet_v1_101/predictions/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

resnet_v1_101 (retrain)

mdir=$mroot/resnet_v1_101_slim_pretrained_train
in_metagraph=$mdir/resnet_v1_101_slim_pretrained.ckpt.meta
in_graph=$mdir/resnet_v1_101_slim_pretrained.pb
opt_graph=$mdir/resnet_v1_101_slim_pretrained_opt.pb
trainquant_graph=$mdir/resnet_v1_101_slim_pretrained_trainquant.pb
infquant_graph=$mdir/resnet_v1_101_slim_pretrained_infquant.pb
input_node=input
output_node=resnet_v1_101/predictions/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=resnet_v1_101/conv1/Conv2D
last_layer=resnet_v1_101/logits/Conv2D

[continue]

resnet_v1_152 (static)

mdir=$mroot/resnet_v1_152_slim_pretrained
in_graph=$mdir/resnet_v1_152_slim_pretrained.pb
opt_graph=$mdir/resnet_v1_152_slim_pretrained_opt.pb
quant_graph=$mdir/resnet_v1_152_slim_pretrained_quant.pb
input_node=input
output_node=resnet_v1_152/predictions/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

resnet_v1_152 (retrain)

mdir=$mroot/resnet_v1_152_slim_pretrained_train
in_metagraph=$mdir/resnet_v1_152_slim_pretrained.ckpt.meta
in_graph=$mdir/resnet_v1_152_slim_pretrained.pb
opt_graph=$mdir/resnet_v1_152_slim_pretrained_opt.pb
trainquant_graph=$mdir/resnet_v1_152_slim_pretrained_trainquant.pb
infquant_graph=$mdir/resnet_v1_152_slim_pretrained_infquant.pb
input_node=input
output_node=resnet_v1_152/predictions/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=resnet_v1_152/conv1/Conv2D
last_layer=resnet_v1_152/logits/Conv2D

[continue]

resnet_v1p5_50 (static)

mdir=$mroot/resnet_v1p5_50_estimator_pretrained
in_graph=$mdir/resnet_v1p5_50_estimator_pretrained.pb
opt_graph=$mdir/resnet_v1p5_50_estimator_pretrained_opt.pb
quant_graph=$mdir/resnet_v1p5_50_estimator_pretrained_quant.pb
input_node=input
output_node=resnet_model/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

resnet_v1p5_50 (retrain)

mdir=$mroot/resnet_v1p5_50_estimator_pretrained_train
in_metagraph=$mdir/resnet_v1p5_50_estimator_pretrained.ckpt.meta
in_graph=$mdir/resnet_v1p5_50_estimator_pretrained.pb
opt_graph=$mdir/resnet_v1p5_50_estimator_pretrained_opt.pb
trainquant_graph=$mdir/resnet_v1p5_50_estimator_pretrained_trainquant.pb
infquant_graph=$mdir/resnet_v1p5_50_estimator_pretrained_infquant.pb
input_node=input
output_node=resnet_model/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=resnet_model/conv2d/Conv2D
last_layer=resnet_model/dense/MatMul

[continue]

inception_v1 (static)

mdir=$mroot/inception_v1_bn_slim_pretrained
in_graph=$mdir/inception_v1_bn_slim_pretrained.pb
opt_graph=$mdir/inception_v1_bn_slim_pretrained_opt.pb
quant_graph=$mdir/inception_v1_bn_slim_pretrained_quant.pb
input_node=input
output_node=InceptionV1/Logits/Predictions/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

inception_v1 (retrain)

mdir=$mroot/inception_v1_bn_slim_pretrained_train
in_metagraph=$mdir/inception_v1_bn_slim_pretrained.ckpt.meta
in_graph=$mdir/inception_v1_bn_slim_pretrained.pb
opt_graph=$mdir/inception_v1_bn_slim_pretrained_opt.pb
trainquant_graph=$mdir/inception_v1_bn_slim_pretrained_trainquant.pb
infquant_graph=$mdir/inception_v1_bn_slim_pretrained_infquant.pb
input_node=input
output_node=InceptionV1/Logits/Predictions/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=InceptionV1/InceptionV1/Conv2d_1a_7x7/Conv2D
last_layer=InceptionV1/Logits/Conv2d_0c_1x1/Conv2D

[continue]

inception_v2 (static)

mdir=$mroot/inception_v2_slim_pretrained
in_graph=$mdir/inception_v2_slim_pretrained.pb
opt_graph=$mdir/inception_v2_slim_pretrained_opt.pb
quant_graph=$mdir/inception_v2_slim_pretrained_quant.pb
input_node=input
output_node=InceptionV2/Predictions/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

inception_v2 (retrain)

mdir=$mroot/inception_v2_slim_pretrained_train
in_metagraph=$mdir/inception_v2_slim_pretrained.ckpt.meta
in_graph=$mdir/inception_v2_slim_pretrained.pb
opt_graph=$mdir/inception_v2_slim_pretrained_opt.pb
trainquant_graph=$mdir/inception_v2_slim_pretrained_trainquant.pb
infquant_graph=$mdir/inception_v2_slim_pretrained_infquant.pb
input_node=input
output_node=InceptionV2/Predictions/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=InceptionV2/InceptionV2/Conv2d_1a_7x7/separable_conv2d/depthwise
last_layer=InceptionV2/Logits/Conv2d_1c_1x1/Conv2D

[continue]

inception_v3 (static)

mdir=$mroot/inception_v3_slim_pretrained
in_graph=$mdir/inception_v3_slim_pretrained.pb
opt_graph=$mdir/inception_v3_slim_pretrained_opt.pb
quant_graph=$mdir/inception_v3_slim_pretrained_quant.pb
input_node=input
output_node=InceptionV3/Predictions/Softmax
input_shape=299,299,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

inception_v3 (retrain)

mdir=$mroot/inception_v3_slim_pretrained_train
in_metagraph=$mdir/inception_v3_slim_pretrained.ckpt.meta
in_graph=$mdir/inception_v3_slim_pretrained.pb
opt_graph=$mdir/inception_v3_slim_pretrained_opt.pb
trainquant_graph=$mdir/inception_v3_slim_pretrained_trainquant.pb
infquant_graph=$mdir/inception_v3_slim_pretrained_infquant.pb
input_node=input
output_node=InceptionV3/Predictions/Softmax
input_shape=299,299,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D
last_layer=InceptionV3/Logits/Conv2d_1c_1x1/Conv2D

[continue]

inception_v4 (static)

mdir=$mroot/inception_v4_slim_pretrained
in_graph=$mdir/inception_v4_slim_pretrained.pb
opt_graph=$mdir/inception_v4_slim_pretrained_opt.pb
quant_graph=$mdir/inception_v4_slim_pretrained_quant.pb
input_node=input
output_node=InceptionV4/Logits/Predictions
input_shape=299,299,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

inception_v4 (retrain)

mdir=$mroot/inception_v4_slim_pretrained_train
in_metagraph=$mdir/inception_v4_slim_pretrained.ckpt.meta
in_graph=$mdir/inception_v4_slim_pretrained.pb
opt_graph=$mdir/inception_v4_slim_pretrained_opt.pb
trainquant_graph=$mdir/inception_v4_slim_pretrained_trainquant.pb
infquant_graph=$mdir/inception_v4_slim_pretrained_infquant.pb
input_node=input
output_node=InceptionV4/Logits/Predictions
input_shape=299,299,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D
last_layer=InceptionV4/Logits/Logits/MatMul

[continue]

mobilenet_v1 (static)

mdir=$mroot/mobilenet_v1_slim_pretrained
in_graph=$mdir/mobilenet_v1_slim_pretrained.pb
opt_graph=$mdir/mobilenet_v1_slim_pretrained_opt.pb
quant_graph=$mdir/mobilenet_v1_slim_pretrained_quant.pb
input_node=input
output_node=MobilenetV1/Predictions/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

mobilenet_v1 (retrain)

mdir=$mroot/mobilenet_v1_slim_pretrained_train
in_metagraph=$mdir/mobilenet_v1_slim_pretrained.ckpt.meta
in_graph=$mdir/mobilenet_v1_slim_pretrained.pb
opt_graph=$mdir/mobilenet_v1_slim_pretrained_opt.pb
trainquant_graph=$mdir/mobilenet_v1_slim_pretrained_trainquant.pb
infquant_graph=$mdir/mobilenet_v1_slim_pretrained_infquant.pb
input_node=input
output_node=MobilenetV1/Predictions/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=MobilenetV1/MobilenetV1/Conv2d_0/Conv2D
last_layer=MobilenetV1/Logits/Conv2d_1c_1x1/Conv2D

[continue]

mobilenet_v2 (static)

mdir=$mroot/mobilenet_v2_slim_pretrained
in_graph=$mdir/mobilenet_v2_slim_pretrained.pb
opt_graph=$mdir/mobilenet_v2_slim_pretrained_opt.pb
quant_graph=$mdir/mobilenet_v2_slim_pretrained_quant.pb
input_node=input
output_node=MobilenetV2/Predictions/Softmax
input_shape=224,224,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

mobilenet_v2 (retrain)

mdir=$mroot/mobilenet_v2_slim_pretrained_train
in_metagraph=$mdir/mobilenet_v2_slim_pretrained.ckpt.meta
in_graph=$mdir/mobilenet_v2_slim_pretrained.pb
opt_graph=$mdir/mobilenet_v2_slim_pretrained_opt.pb
trainquant_graph=$mdir/mobilenet_v2_slim_pretrained_trainquant.pb
infquant_graph=$mdir/mobilenet_v2_slim_pretrained_infquant.pb
input_node=input
output_node=MobilenetV2/Predictions/Softmax
input_shape=224,224,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;
first_layer=MobilenetV2/Conv/Conv2D
last_layer=MobilenetV2/Logits/Conv2d_1c_1x1/Conv2D

[continue]

darknet19 (static)

mdir=$mroot/darknet19_dw2tf_pretrained
in_graph=$mdir/darknet19.pb
opt_graph=$mdir/darknet19_opt.pb
quant_graph=$mdir/darknet19_quant.pb
input_node=darknet19/net1
output_node=darknet19/softmax1/Softmax
input_shape=256,256,3
wb=-8; ab=-8; lb=-16; rb=8; pb=-8; prb=8;

[continue]

darknet19 (retrain)

mdir=$mroot/darknet19_dw2tf_pretrained_train
in_metagraph=$mdir/darknet19.ckpt.meta
in_graph=$mdir/darknet19.pb
opt_graph=$mdir/darknet19_opt.pb
trainquant_graph=$mdir/darknet19_trainquant.pb
infquant_graph=$mdir/darknet19_infquant.pb
input_node=darknet19/net1
output_node=darknet19/softmax1/Softmax
input_shape=256,256,3
[ "$INT4_MODE" = 1 ] && wb=-4 || wb=-8; ab=-8; lb=-16; rb=8; pb=-8; prb=8;
first_layer=darknet19/convolutional1/Conv2D
last_layer=darknet19/convolutional19/Conv2D

[continue]

yolo_v2 (static)

mdir=$mroot/yolo_v2_dw2tf_pretrained
in_graph=$mdir/yolov2.pb
opt_graph=$mdir/yolov2_opt.pb
quant_graph=$mdir/yolov2_quant.pb
input_node=yolov2/net1
output_node=yolov2/convolutional23/BiasAdd
input_shape=608,608,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

yolo_v2_tiny (static)

mdir=$mroot/yolo_v2_tiny_dw2tf_pretrained
in_graph=$mdir/yolov2-tiny.pb
opt_graph=$mdir/yolov2-tiny_opt.pb
quant_graph=$mdir/yolov2-tiny_quant.pb
input_node=yolov2-tiny/net1
output_node=yolov2-tiny/convolutional9/BiasAdd
input_shape=416,416,3
wb=-8; ab=-8; lb=-16; rb=8; pb=8; prb=8;

[continue]

[back to ToC]


More Repositories

1

PYNQ

Python Productivity for ZYNQ
Jupyter Notebook
1,975
star
2

Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
Python
1,475
star
3

linux-xlnx

The official Linux kernel from Xilinx
C
1,205
star
4

brevitas

Brevitas: neural network quantization in PyTorch
Python
1,158
star
5

Vitis-Tutorials

Vitis In-Depth Tutorials
C
891
star
6

Vitis_Libraries

Vitis Libraries
C++
853
star
7

embeddedsw

Xilinx Embedded Software (embeddedsw) Development
HTML
766
star
8

finn

Dataflow compiler for QNN inference on FPGAs
Python
715
star
9

BNN-PYNQ

Quantized Neural Networks (QNNs) on PYNQ
Jupyter Notebook
661
star
10

XRT

Run Time for AIE and FPGA based platforms
C++
553
star
11

u-boot-xlnx

The official Xilinx u-boot repository
C
531
star
12

Vitis_Accel_Examples

Vitis_Accel_Examples
Makefile
503
star
13

Vitis-HLS-Introductory-Examples

C++
420
star
14

dma_ip_drivers

Xilinx QDMA IP Drivers
C
400
star
15

HLS

Vitis HLS LLVM source code and examples
378
star
16

Vitis-AI-Tutorials

362
star
17

PYNQ_Workshop

Jupyter Notebook
354
star
18

SDAccel_Examples

SDAccel Examples
C++
350
star
19

ml-suite

Getting Started with Xilinx ML Suite
Jupyter Notebook
334
star
20

CHaiDNN

HLS based Deep Neural Network Accelerator Library for Xilinx Ultrascale+ MPSoCs
C++
322
star
21

xfopencv

C++
313
star
22

XilinxTclStore

Xilinx Tcl Store
Tcl
310
star
23

mlir-aie

An MLIR-based toolchain for AMD AI Engine-enabled devices.
MLIR
300
star
24

XilinxBoardStore

Python
251
star
25

RapidWright

Build Customized FPGA Implementations for Vivado
Java
248
star
26

QNN-MO-PYNQ

Jupyter Notebook
236
star
27

libsystemctlm-soc

SystemC/TLM-2.0 Co-simulation framework
Verilog
210
star
28

qemu

Xilinx's fork of Quick EMUlator (QEMU) with improved support and modelling for the Xilinx platforms.
C
200
star
29

DPU-PYNQ

DPU on PYNQ
Tcl
198
star
30

device-tree-xlnx

Linux device tree generator for the Xilinx SDK (Vivado > 2014.1)
Tcl
181
star
31

finn-examples

Dataflow QNN inference accelerator examples on FPGAs
Python
174
star
32

PYNQ-ComputerVision

Computer Vision Overlays on Pynq
Jupyter Notebook
173
star
33

XilinxVirtualCable

Xilinx Virtual Cable (XVC) is a TCP/IP-based protocol that acts like a JTAG cable and provides a means to access and debug your FPGA or SoC design without using a physical cable.
C
172
star
34

finn-hlslib

Vitis HLS Library for FINN
C++
172
star
35

open-nic

AMD OpenNIC Project Overview
Shell
166
star
36

SDSoC-Tutorials

SDSoC™ (Software-Defined System-On-Chip) Environment Tutorials
C++
142
star
37

xilinx-tiny-cnn

C++
141
star
38

FPGA_as_a_Service

Go
136
star
39

xup_vitis_network_example

VNx: Vitis Network Examples
Jupyter Notebook
133
star
40

meta-xilinx

Collection of Yocto Project layers to enable AMD Xilinx products
C
123
star
41

systemctlm-cosim-demo

QEMU libsystemctlm-soc co-simulation demos.
C++
114
star
42

Vitis-In-Depth-Tutorial

C++
113
star
43

Vitis_Embedded_Platform_Source

Tcl
105
star
44

llvm-aie

Fork of LLVM to support AMD AIEngine processors
LLVM
104
star
45

SDAccel-Tutorials

SDAccel Development Environment Tutorials
C++
101
star
46

nanotube

LLVM
101
star
47

Embedded-Design-Tutorials

100
star
48

RecoNIC

RecoNIC is a software/hardware shell used to enable network-attached processing within an RDMA-featured SmartNIC for scale-out computing.
SystemVerilog
94
star
49

RFNoC-HLS-NeuralNet

CMake
92
star
50

PYNQ-DL

Xilinx Deep Learning IP
VHDL
91
star
51

Kria-PYNQ

PYNQ support and examples for Kria SOMs
Jupyter Notebook
90
star
52

PYNQ-HelloWorld

This repository contains a "Hello World" introduction application to the Xilinx PYNQ framework.
Jupyter Notebook
90
star
53

LSTM-PYNQ

C++
86
star
54

kria-vitis-platforms

Kria Vitis platforms and overlays
SystemVerilog
86
star
55

meta-petalinux

meta-petalinux distro layer supporting Xilinx Tools
BitBake
84
star
56

Vivado-Design-Tutorials

Tcl
83
star
57

SDSoC_Examples

C++
82
star
58

ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
C++
81
star
59

logicnets

Python
80
star
60

IIoT-EDDP

The repository contains the design database and documentation for Electric Drives Demonstration Platform
VHDL
79
star
61

mlir-air

C++
77
star
62

AI-Model-Zoo

75
star
63

Applications

C
71
star
64

open-nic-shell

AMD OpenNIC Shell includes the HDL source files
SystemVerilog
70
star
65

XilinxUnisimLibrary

Xilinx Unisim Library in Verilog
Verilog
66
star
66

PYNQ_Composable_Pipeline

PYNQ Composabe Overlays
Tcl
64
star
67

merlin-compiler

C++
57
star
68

PYNQ_RFSOC_Workshop

Open-sourcing the PYNQ & RFSoC workshop materials
Jupyter Notebook
56
star
69

gemx

Matrix Operation Library for FPGA https://xilinx.github.io/gemx/
C++
56
star
70

xup_high_level_synthesis_design_flow

AMD University Program HLS tutorial
Jupyter Notebook
54
star
71

meta-xilinx-tools

Yocto Project layer enables AMD Xilinx tools related metadata for MicroBlaze, Zynq, ZynqMP and Versal devices.
BitBake
54
star
72

RFSoC-PYNQ

Python productivity for RFSoC platforms
Jupyter Notebook
52
star
73

Alveo-PYNQ

Introductory examples for using PYNQ with Alveo
Jupyter Notebook
48
star
74

ResNet50-PYNQ

Quantized ResNet50 Dataflow Acceleration on Alveo, with PYNQ
C++
48
star
75

xup_compute_acceleration

Hands-on experience using the Vitis unified software platform with Xilinx FPGA hardware
C++
46
star
76

Vitis_Model_Composer

Vitis Model Composer Examples and Tutorials
C++
46
star
77

KRS

The Kria Robotics Stack (KRS) is a ROS 2 superset for industry, an integrated set of robot libraries and utilities to accelerate the development, maintenance and commercialization of industrial-grade robotic solutions while using adaptive computing.
HTML
46
star
78

Vitis-AWS-F1-Developer-Labs

C++
45
star
79

PYNQ_Bootcamp

PYNQ Bootcamp 2019-2022 teaching materials.
Jupyter Notebook
44
star
80

PYNQ-Networking

Networking Overlay on PYNQ
Tcl
44
star
81

HLS_packet_processing

C++
43
star
82

blockchainacceleration

Tcl
43
star
83

Get_Moving_With_Alveo

For publishing the source for UG1352 "Get Moving with Alveo"
C++
42
star
84

DSRL

42
star
85

pcie-model

PCI Express controller model
C
41
star
86

XilinxCEDStore

This store contains Configurable Example Designs.
Tcl
41
star
87

inference-server

C++
41
star
88

HLS_arbitrary_Precision_Types

C++
40
star
89

Xilinx_Kria_KV260_Workshop

40
star
90

vcu-ctrl-sw

C
38
star
91

VVAS

Vitis Video Analytics SDK
C
38
star
92

chipscopy

ChipScoPy (ChipScope Python API) is an open source Python API to the various ChipScope services provided by the TCF-based (Target Communication Framework) ChipScope Server (cs_server).
Jupyter Notebook
38
star
93

pyxir

Python
36
star
94

xup_aie_training

Hands-on experience programming AI Engines using Vitis Unified Software Platform
Jupyter Notebook
36
star
95

DSP-PYNQ

A PYNQ overlay demonstrating Pythonic DSP running on Zynq UltraScale+
Tcl
36
star
96

hdmi-modules

Xilinx Soft-IP HDMI Rx/Tx core Linux drivers
C
35
star
97

pytorch-ocr

Python
35
star
98

open-nic-driver

AMD OpenNIC driver includes the Linux kernel driver
C
33
star
99

bootgen

bootgen source code
C++
33
star
100

qemu-devicetrees

Device trees used by QEMU to describe the hardware
Makefile
32
star