  • Language
    Python
  • License
    MIT License
  • Created almost 4 years ago
  • Updated almost 2 years ago

Repository Details

[FPGA 2022, Best Paper Award] Parallel placement and routing of Vivado HLS dataflow designs.

About

  • RapidStream takes in a Vivado HLS dataflow design, then generates a fully placed and routed checkpoint.

  • RapidStream adopts a divide-and-conquer approach at the behavior level that achieves a 5-7X speedup over the vanilla Vivado flow.

  • The key insight of RapidStream is that the FIFO connections can be additionally pipelined, which creates extra flexibility for split placement and routing without timing degradation.

  • More details can be found in our FPGA 2022 paper, which is also included in the repo:

@inproceedings{guo22rapidstream,
  title={RapidStream: Parallel Physical Implementation of FPGA HLS Designs},
  author={Licheng Guo and Pongstorn Maidee and Yun Zhou and Chris Lavin and Jie Wang and Yuze Chi and Weikang Qiao and Alireza Kaviani and Zhiru Zhang and Jason Cong},
  year={2022},
  booktitle={Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '22)}
}

Highlights

  • This figure shows (1) the number of active cores in a vanilla Vivado flow and (2) Vivado runtime vs. the number of threads.

[Figure: active cores in the vanilla Vivado flow; Vivado runtime vs. number of threads]

  • In comparison, this figure shows the statistics of RapidStream:

[Figure: the same statistics for RapidStream]

  • This figure compares the runtime and frequency of RapidStream and Vivado:

[Figure: runtime and frequency comparison, RapidStream vs. Vivado]

  • To achieve this improvement, RapidStream goes through the following steps.
    • The key is to make good use of the pipelining flexibility of dataflow designs.

  • This figure shows the input and output of each phase:
    • Phase 1 floorplans and rebuilds the hierarchy of the HLS-generated RTL, then adds pipelining.
    • Phase 2 places and routes each island in parallel while ensuring that the interfaces of neighboring islands align.
    • Phase 3 stitches the islands together and routes the inter-island nets.
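
The three phases above can be pictured as a small orchestration script. The sketch below is illustrative only and is not RapidStream's implementation: the function names (`floorplan_and_pipeline`, `place_and_route_island`, `stitch_islands`), the island naming scheme, and the grid size are all hypothetical stand-ins, and the real flow invokes Vivado and RapidWright rather than returning strings. It only shows why Phase 2 parallelizes well: once the islands are defined, each one can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor


def floorplan_and_pipeline(design, rows, cols):
    """Phase 1 stand-in (hypothetical): split the design into a grid of islands."""
    return [f"{design}_island_x{c}y{r}" for r in range(rows) for c in range(cols)]


def place_and_route_island(island):
    """Phase 2 stand-in (hypothetical): implement one island independently."""
    # A real flow would launch a place-and-route job here.
    return f"{island}.dcp"


def stitch_islands(checkpoints):
    """Phase 3 stand-in (hypothetical): merge island checkpoints into one result."""
    return {"islands": sorted(checkpoints), "inter_island_nets_routed": True}


def run_flow(design, rows=2, cols=2, max_workers=4):
    islands = floorplan_and_pipeline(design, rows, cols)
    # Islands do not depend on each other, so Phase 2 can run them concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        checkpoints = list(pool.map(place_and_route_island, islands))
    return stitch_islands(checkpoints)


if __name__ == "__main__":
    result = run_flow("demo")
    print(len(result["islands"]), "islands stitched")
```

A thread pool is used here only because the stand-in jobs are trivial; the actual flow distributes the work across servers.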

Install

  • The script has been tested on Ubuntu 18.04.

  • Python 3.6+ is required.

  • Please install on EVERY server that will be used for the distributed execution.

bash install.sh

Examples

There are 6 examples to demonstrate the flow. Each one includes:

  • The HLS source code and the HLS synthesis project.

  • A reference configuration file.

  • A one-click script to run the whole flow.

  • Note that you need to update the environment variables in the script.

  • Edit the SERVER_LIST variable to include the IP addresses or host aliases of your server fleet. You can use any number of servers.

    • If you provide only 1 server, all parallel tasks will run on it, which uses a lot of memory. We recommend at least 256 GB of memory to run everything on 1 server.

    • In our experiments, we distribute the workloads across 4 servers, each with a 56-core Intel Xeon E5-2680 v4 CPU at 2.40 GHz and 128 GB of memory.

  • To help you reproduce the results as in the paper, we include a reference floorplanning result as this step is non-deterministic.

    • To re-run the Phase 1 floorplanning process from scratch, delete the "ResultReuse" field in the JSON configuration file.
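
Deleting the "ResultReuse" field can be done by hand or with a few lines of Python. The helper below is a minimal sketch, assuming that "ResultReuse" is a top-level key of the JSON configuration; the function name `drop_result_reuse` and the file name in the demo are hypothetical and are not part of the repo.

```python
import json
import tempfile
from pathlib import Path


def drop_result_reuse(config_path):
    """Remove a top-level "ResultReuse" field; return True if it was present."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    was_present = "ResultReuse" in config
    config.pop("ResultReuse", None)
    # Rewrite the file so the flow no longer sees the cached floorplan entry.
    path.write_text(json.dumps(config, indent=2) + "\n")
    return was_present


if __name__ == "__main__":
    # Demo on a throwaway file with made-up contents (not a real RapidStream config).
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_config.json"
        demo.write_text(json.dumps({"ResultReuse": {}, "Other": 1}))
        print(drop_result_reuse(demo))
        print(json.loads(demo.read_text()))
```

Keep a copy of the reference configuration before editing it, since the reference floorplan is what makes the paper's results reproducible.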

File Organization

  • examples/ provides six benchmarks and one-click scripts to run RapidStream.

  • python/rapidstream contains the main implementation of RapidStream.

    • python/rapidstream/FE corresponds to the front-end transformation of the HLS-generated RTL (Phase 1 in the paper).
    • python/rapidstream/BE corresponds to the back-end parallel implementation (Phases 2 and 3 in the paper).
  • java/ contains tools implemented in RapidWright, including the checkpoint stitcher for Phase 3 (java/mergeDCP.java).

  • bash/ includes scripts that glue together the various parts of the flow.

    • bash/run_back_end.sh is the main flow of Phases 2 and 3.

Next Step

  • Currently, RapidStream results cannot run on a board because the flow is not compatible with the Vitis workflow.

  • The next step is to develop a customized IO shell so that the RapidStream bitstream can communicate with the host.
