
Repository Details

HLS based Deep Neural Network Accelerator Library for Xilinx Ultrascale+ MPSoCs

CHaiDNN-v2

Analysis and Eval

  • Supported Layers
  • Performance/Resource Utilization
  • Performance Eval

Design and Development

  • API Reference
  • Quantization User Guide for CHaiDNN
  • Model Zoo
  • Running Inference on new Network
  • Creating SDx GUI Project
  • Configurable Parameters
  • Custom Platform Generation
  • Software Layer Plugin
  • SDSoC Environment User Guide
  • Hardware-Software Partitioning for Performance

Introduction

CHaiDNN is a Xilinx Deep Neural Network library for acceleration of deep neural networks on Xilinx Zynq UltraScale+ MPSoCs. It is designed for maximum compute efficiency with the 6-bit integer data type, and also supports the 8-bit integer data type.

The design goal of CHaiDNN is to achieve the best accuracy at maximum performance. Inference in CHaiDNN runs in the fixed-point domain for better performance. All feature maps and trained parameters are converted from single precision to fixed point based on precision parameters specified by the user. These precision parameters can vary considerably across networks, datasets, and even across layers within the same network. The accuracy of a network depends on the precision parameters used to represent its feature maps and trained parameters; well-chosen precision parameters are expected to give accuracy close to that of the single-precision model.
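As a hypothetical illustration of the fixed-point conversion described above (the value and fractional bit-width here are invented for the example, not CHaiDNN defaults), a single-precision value is quantized by scaling with 2^f and rounding to the nearest integer, and dequantized by dividing back:

```shell
# Illustrative dynamic fixed-point conversion: quantize a single-precision
# value with 4 fractional bits (scale = 2^4), then dequantize it back.
# The fractional bit-width is an example choice, not a CHaiDNN default.
awk -v v=0.8125 -v f=4 'BEGIN {
    q = int(v * 2^f + 0.5);   # round to nearest integer (non-negative input)
    printf "quantized: %d, dequantized: %.4f\n", q, q / 2^f
}'
# → quantized: 13, dequantized: 0.8125
```

When the chosen fractional bit-width is too small for a layer's dynamic range, the rounding error grows, which is why the precision parameters matter for accuracy.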

What's new in CHaiDNN-v2

  • 4x GOPS compared to CHaiDNN-v1 (2017.4) (Performance numbers)

  • 2x MAC on DSPs at int6

  • Double-pumped DSPs, allowing the DSPs to be clocked at twice the core clock (some configs can go up to 350/700 MHz)

  • Introducing DietChai - a miniature version of CHai for smaller MPSoC/Zynq devices

  • 128, 256, 512, 1024 DSP design configs verified for ZU9

  • Support for URAM

  • 128, 256, 512 DSP configs verified for ZU7

  • ModelZoo of 6 networks at int8 and int6 precision

  • Support for two quantization modes - Dynamic fixed point and Xilinx Quantizer

  • Enhanced API to enable better hardware-software partitioning for users

  • Support for software custom layer plug-ins

  • Fully Connected layers on CPU

  • More documentation

Performance Benchmarks (fps)

Network                  Xilinx CHai w/ 1024 DSP @ 250/500 MHz (measured on ZU9)   Nvidia Jetson TX2 @ 1.3 GHz*
GoogleNet-6bit w/o FC    220                                                        GoogleNet-16FP: 201
GoogleNet-6bit w/ FC     207
GoogleNet-8bit w/o FC    151
GoogleNet-8bit w/ FC     145
AlexNet-6bit w/o FC      606                                                        AlexNet-16FP: 250
AlexNet-6bit w/ FC       10
AlexNet-8bit w/o FC      390
AlexNet-8bit w/ FC       10

* Source: https://devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/
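Taken at face value, the w/o-FC rows of the table work out to the following throughput ratios (plain arithmetic on the numbers above, not an official benchmark comparison):

```shell
# Throughput ratio of CHai (ZU9, 6-bit, w/o FC) vs. Jetson TX2 (FP16),
# using the fps numbers from the table above.
awk 'BEGIN { printf "GoogleNet: %.2fx, AlexNet: %.2fx\n", 220/201, 606/250 }'
# → GoogleNet: 1.09x, AlexNet: 2.42x
```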

Hardware and Software Requirements

The CHaiDNN library is designed to work with Zynq UltraScale+ MPSoCs. The library has been verified on zcu102 and zcu104 boards. Xilinx SDSoC 2018.2 Development Environment is required to work with the library.

How to Download the Repository

To get a local copy of the CHaiDNN repository, configure git-lfs and then, clone this repository to the local system with the following command:

git clone https://github.com/Xilinx/CHaiDNN.git CHaiDNN

Here CHaiDNN is the name of the directory where the repository will be stored on the local system. This command needs to be executed only once to retrieve the latest version of the CHaiDNN library.

GitHub Repository Structure
CHaiDNN/
|
|-- CONTRIBUTING.md
|-- LICENSE
|-- README.md
|-- SD_Card
|   |-- lib
|   |-- cblas
|   |-- images
|   |-- opencv
|   |-- protobuf
|   |-- zcu102
|   `-- zcu104
|-- design
|   |-- build
|   |-- conv
|   |-- deconv
|   |-- pool
|   `-- wrapper
|-- docs
|   |-- API.md
|   |-- BUILD_USING_SDX_GUI.md
|   |-- CONFIGURABLE_PARAMS.md
|   |-- CUSTOM_PLATFORM_GEN.md
|   |-- HW_SW_PARTITIONING.md
|   |-- MODELZOO.md
|   |-- PERFORMANCE_SNAPSHOT.md
|   |-- QUANTIZATION.md
|   |-- RUN_NEW_NETWORK.md
|   |-- SOFTWARE_LAYER_PLUGIN.md
|   |-- SUPPORTED_LAYERS.md
|   `-- images
|-- software
|   |-- bufmgmt
|   |-- checkers
|   |-- common
|   |-- custom
|   |-- example
|   |-- imageread
|   |-- include
|   |-- init
|   |-- interface
|   |-- scheduler
|   |-- scripts
|   |-- swkernels
|   `-- xtract
`-- tools
    |-- SETUP_TOOLS.md
    `-- tools.zip

Run Inference

Using Pre-built binaries

To run inference on example networks, follow these steps:

  1. Download the example network 6-bit GoogleNet with Xilinx Quantization scheme. More networks are available as part of the ModelZoo.

  2. Place the downloaded and unzipped contents in the "SD_Card/models" directory. Create the SD_Card/models directory if it does not already exist.

  3. Copy the required contents of the "SD_Card" folder onto an SD card.

    • opencv
    • protobuf
    • cblas
    • images
    • bit-stream, boot loader, lib & executables (either from SD_Card/zcu102 or SD_Card/zcu104)
  4. Insert the SD-Card and power ON the board.

    πŸ“Œ NOTE: A serial port emulator (Teraterm/Minicom) is required to interface the user commands to the board

  5. Attach a USB-UART cable from the board to the host PC. Set the UART serial port to

    Baud rate: 115200
    Data: 8 bit
    Parity: none
    Stop: 1 bit
    Flow control: none
    
  6. After the boot sequence completes, set the environment variables:

    export OPENBLAS_NUM_THREADS=2
    export LD_LIBRARY_PATH=lib/:opencv/arm64/lib/:protobuf/arm64/lib:cblas/arm64/lib
  7. Create a folder "out" inside the network directory to save the outputs.

    cd /mnt
    mkdir models/<network>/out

  8. Execute the "*.elf" file to run inference.

    • The format for running these example networks is described below:
      ./<example network>.elf <quantization scheme> <bit width> <img1_path> <img2_path>
    • For GoogleNet 6-bit inference with Xilinx quantization scheme execute the following
      ./googlenet.elf Xilinx 6 images/camel.jpg images/goldfish.JPEG
  9. Sync after execution

    cd /
    sync
    umount /mnt
  10. Output will be written into a text file inside the respective output folder.

    Ex: models/<network>/out
    

πŸ“Œ NOTE: Failing to run sync might corrupt the file system and cause crash on subsequent runs.

πŸ“Œ NOTE: For running inference on a new network, please follow the instructions in Run new Network using CHaiDNN.

Build from Source

CHaiDNN can be built using Makefiles or using the SDx IDE. The steps below describe how to build CHaiDNN using Makefiles. For steps to build using the SDx IDE, see the instructions in Build using SDx IDE.

Build CHaiDNN Hardware

Follow these steps to build the design for zcu102 (a ZU9 device based board):

  1. Generate a custom platform with 1x and 2x clocks using the steps described here. With Chai-v2, the DSPs now operate at twice the frequency of the rest of the core.

  2. Go to CHaiDNN/design/build folder.

  3. Set SDx tool environment

    • For BASH:
      source <SDx Installation Dir>/installs/lin64/SDx/2018.2/settings64.sh
    • For CSH
      source <SDx Installation Dir>/installs/lin64/SDx/2018.2/settings64.csh
  4. To build the design, run Makefile. (By default this will build 1024 DSP design @ 200/400 MHz)

    make ultraclean
    make

    πŸ“Œ NOTE:

    • To build DietChai, run make DIET_CHAI_Z=1. This builds a design with 128 compute DSPs and 64-bit AXI interface. Run make DIET_CHAI_ZUPLUS=1 to build a design with 128 compute DSPs and 128-bit AXI interface.
    • To exclude deconv Kernel, set DECONV_ENABLE=0 in Makefile. Default is DECONV_ENABLE=1.
    • To exclude Pool Kernel, set POOL_ENABLE=0 in Makefile. With this setting, Pooling functionality embedded in Convolution accelerator is used. Default is POOL_ENABLE=1.
    • When building DietChai, do not change POOL_ENABLE, DECONV_ENABLE values in Makefile.
  5. After the build is completed, copy the libxlnxdnn.so file and other build files (BOOT.BIN, image.ub and _sds directory) inside build/sd_card to SD_Card directory.

    make copy
  6. The hardware setup is now ready.

πŸ“Œ NOTE:

  • The 1024 DSP config was timing-closed at 250/500 MHz with an iterative synthesis and P&R strategy. In the first iteration, the design was taken through the full SDx flow (up to bitstream generation) at 200/400 MHz. In the second iteration, the post-routed design from the first iteration was re-routed at 250/500 MHz. We believe this strategy generalizes to other configs as well, and we would like to hear from you if it lets you push the frequency further on other configs.
  • When building some of the configs mentioned in the performance table, the tools may report negative slack; we still encourage you to try the generated bitstreams on hardware for functionality. These timing-closure issues can be cleaned up with special synthesis and P&R strategies. (You are welcome to try timing-closure strategies that have worked for you on other designs.)
Build CHaiDNN Software

Follow the steps to compile the software stack.

  1. Copy libxlnxdnn.so to the SD_Card/lib directory. The libxlnxdnn.so file can be found in the design/build/sd_card directory once the HW build has finished. You can skip this step if you have already copied the libxlnxdnn.so file to the suggested directory.

  2. Set the SDx tool environment.

    • CSH
      source <SDx Installation Dir>/installs/lin64/SDx/2018.2/settings64.csh
    • BASH
      source <SDx Installation Dir>/installs/lin64/SDx/2018.2/settings64.sh
  3. Go to the software directory. This contains all the files to generate software libraries (.so).

    cd <path to CHaiDNN>/software
  4. Go to scripts directory, open Makefile and update the SDx_BUILD_PATH variable. See example below.

    SDx_BUILD_PATH = <SDx Installation Dir>/installs/lin64/SDx/2018.2
    
  5. Now run the following commands.

    make ultraclean
    make

    πŸ“Œ NOTE:

    • To build DietChai, run make DIET_CHAI_Z=1. This builds a design with 128 compute DSPs and 64-bit AXI interface. Run make DIET_CHAI_ZUPLUS=1 to build a design with 128 compute DSPs and 128-bit AXI interface.
    • To exclude deconv Kernel, set DECONV_ENABLE=0 in Makefile. Default is DECONV_ENABLE=1.
    • To exclude Pool Kernel, set POOL_ENABLE=0 in Makefile. With this setting, Pooling functionality embedded in Convolution accelerator is used. Default is POOL_ENABLE=1.
    • When building DietChai, do not change POOL_ENABLE, DECONV_ENABLE values in Makefile.

    πŸ“Œ NOTE: Ensure that the software and the hardware are build with the same settings.

  6. Make will copy all executables to SD_Card directory and all .so files to SD_Card/lib directory.

  7. Now, we are set to run inference. Follow the steps mentioned in "Run inference using pre-built binaries".

Additional Resources and Support

For questions and to get help on this project or your own projects, visit the CHaiDNN Github Issues.

License and Contributing to the Repository

The source for this project is licensed under the Apache License 2.0.

To contribute to this project, follow the guidelines in the Repository Contribution README.

Acknowledgements
Revision History
Date        Readme Version   Release Notes            Tool Version
Feb 2018    1.0              Initial Xilinx release   SDx-2017.4
June 2018   2.0              CHaiDNN-v2               SDx-2018.2
Deprecated Features
  • 16-bit activations

CopyrightΒ© 2018 Xilinx
