  • Stars: 119
  • Rank: 297,930 (Top 6 %)
  • Language: C++
  • License: MIT License
  • Created over 4 years ago
  • Updated almost 2 years ago

Repository Details

[FPGA 2021, Best Paper Award] An automated floorplanning and pipelining tool for Vivado HLS.

Latest

  • [04/29/2022] We are holding a tutorial at FCCM 2022.

  • [04/29/2022] Check our latest documentation for the workflow here.

  • [02/20/2022] We have decided to maintain AutoBridge only as a plug-in of the TAPA workflow. The TAPA framework provides a stable and robust environment for AutoBridge across different HLS versions. TAPA is easy and natural to use if you are familiar with the HLS dataflow coding style.

  • [01/06/2022] We are integrating AutoBridge and TAPA to create a robust workflow. Currently AutoBridge relies on hacking the RTL generated by Vivado HLS, which makes it fragile. Using the open-source TAPA compiler as the frontend instead will make the floorplanning-pipelining flow much more robust. The integration of AutoBridge and TAPA is still in progress, but feel free to contact me if you want to try it out. We will provide as much help as needed to make your design work!

  • [01/06/2022] With the help of AutoBridge and TAPA, Serpens achieves 270 MHz on Alveo U280 while using 24 HBM channels, whereas a normal Vitis flow fails in routing. Serpens is an HBM-based accelerator for sparse matrix-vector multiplication (SpMV). With the high frequency, Serpens gets a 3.79X performance improvement over the previous state-of-the-art GraphLily.

  • [01/06/2022] With the help of AutoBridge and TAPA, Sextans achieves 260 MHz on Alveo U250 while using 4 DDR channels, whereas a normal Vitis flow achieves only 190 MHz.

  • [12/20/2021] We just open-sourced RapidStream, a follow-up to AutoBridge. This time we parallelize the placement and routing of each slot based on the floorplan produced by AutoBridge. Check out how we achieve a 5-7X speedup over Vivado!

  • A new implementation is ready! Check the example in AutoBridge/in-develop/test/autosa_cnn_13x8/.

  • The user interface has been significantly simplified. To invoke the new AutoBridge, just write a simple config file like this:

{
  "Board" : "U250",
  "HLSProjectPath" : "./kernel3",
  "HLSSolutionName" : "solution",
  "TopName" : "kernel3",

  "FloorplanMethod": "IterativeDivisionToHalfSLR",
  "AreaUtilizationRatio" : 0.7,

  "BundleToDDRMapping" : {
    "gmem_A": 0,
    "gmem_B": 1,
    "gmem_C": 2
  },

  "LoggingLevel" : "DEBUG"
}
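Before handing such a file to the tool, it can help to sanity-check it. The following is a minimal sketch in Python, not part of AutoBridge: the function name `check_config` is hypothetical, and treating every key from the example as required, the ratio as lying in (0, 1], and the DDR indices as non-negative integers are all assumptions made for illustration.

```python
import json

# Keys and value types copied from the example config above. Treating all of
# them as required is an assumption of this sketch, not documented behavior.
EXPECTED_TYPES = {
    "Board": str,
    "HLSProjectPath": str,
    "HLSSolutionName": str,
    "TopName": str,
    "FloorplanMethod": str,
    "AreaUtilizationRatio": (int, float),
    "BundleToDDRMapping": dict,
    "LoggingLevel": str,
}


def check_config(text):
    """Parse a JSON config string and return a list of problems (empty if none)."""
    config = json.loads(text)
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            problems.append(f"unexpected type for key: {key}")

    # Assumed range: a utilization ratio only makes sense in (0, 1].
    ratio = config.get("AreaUtilizationRatio")
    if isinstance(ratio, (int, float)) and not 0 < ratio <= 1:
        problems.append("AreaUtilizationRatio should be in (0, 1]")

    # Assumed rule: each bundle maps to a non-negative integer channel index.
    mapping = config.get("BundleToDDRMapping")
    if isinstance(mapping, dict):
        for bundle, channel in mapping.items():
            if not isinstance(channel, int) or channel < 0:
                problems.append(f"bad DDR channel for bundle: {bundle}")
    return problems


EXAMPLE = """
{
  "Board": "U250",
  "HLSProjectPath": "./kernel3",
  "HLSSolutionName": "solution",
  "TopName": "kernel3",
  "FloorplanMethod": "IterativeDivisionToHalfSLR",
  "AreaUtilizationRatio": 0.7,
  "BundleToDDRMapping": {"gmem_A": 0, "gmem_B": 1, "gmem_C": 2},
  "LoggingLevel": "DEBUG"
}
"""

if __name__ == "__main__":
    print(check_config(EXAMPLE))
```

Returning a list of problems rather than raising on the first one lets a user fix several mistakes in a single pass.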

About

  • What: AutoBridge is a floorplanning tool for Vivado HLS dataflow designs.

  • Why: Co-optimizing HLS compilation and placement brings new opportunities to improve the final achievable frequency.

  • How: Pre-determine the rough location of each module during HLS compilation, so that:

    • long interconnects can be adequately pipelined by the HLS scheduler.

    • the Vivado placer is prevented from packing the logic too densely.

  • In our experiments with a total of 43 design configurations, we improve the average frequency from 147 MHz to 297 MHz.

    • Notably, in 16 experiments we make the originally unroutable designs achieve 274 MHz on average.
  • The pre-print manuscript of our paper can be found at https://vast.cs.ucla.edu/sites/default/files/publications/AutoBridge_FPGA2021.pdf

  • Projects using AutoBridge:

  • Motivating Examples:

    • Comparison of a stencil accelerator on Xilinx U280. From routing failure to 297 MHz.

      • Each color represents a module.
      • AutoBridge ensures a clean separation of logic in different regions to minimize unnecessary die crossing.
    • Comparison of a systolic array on Xilinx U250. From 158 MHz to 316 MHz.

      • Note that Vivado will try to pack things together to avoid die crossing as much as possible.
      • Instead, we ensure a balanced resource utilization across the whole device to reduce local congestion.
      • Meanwhile, the global connections will be adequately pipelined.
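The two ideas in this section (pipeline connections according to how far apart their endpoints are placed, and keep every region below a utilization limit) can be illustrated with a toy sketch. This is not the AutoBridge algorithm: the grid of (row, column) slots, the one-register-stage-per-slot-crossing rule, the area fractions, and all function names are assumptions chosen for illustration. The 0.7 limit simply echoes the `AreaUtilizationRatio` value in the example config.

```python
def slot_distance(a, b):
    """Manhattan distance between two (row, column) slots."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def plan_pipelining(placement, connections, stages_per_crossing=1):
    """Return the number of extra register stages for each connection.

    placement:   module name -> (row, column) slot
    connections: list of (source module, destination module) pairs
    Assumption: one register stage per slot boundary crossed.
    """
    return {
        (src, dst): stages_per_crossing * slot_distance(placement[src], placement[dst])
        for src, dst in connections
    }


def overfull_slots(placement, area, max_ratio):
    """Return the slots whose summed module area exceeds max_ratio.

    area: module name -> fraction of one slot's resources (assumed unit).
    """
    usage = {}
    for module, slot in placement.items():
        usage[slot] = usage.get(slot, 0.0) + area[module]
    return sorted(slot for slot, used in usage.items() if used > max_ratio)


# Hypothetical three-module dataflow design spread over a 4x2 grid of slots.
placement = {"load": (0, 0), "compute": (1, 0), "store": (3, 1)}
connections = [("load", "compute"), ("compute", "store")]
area = {"load": 0.2, "compute": 0.5, "store": 0.2}

stages = plan_pipelining(placement, connections)
crowded = overfull_slots(placement, area, max_ratio=0.7)

if __name__ == "__main__":
    print(stages)
    print(crowded)
```

In this toy setup the neighbouring modules get one extra stage and the distant pair gets three, while no slot exceeds the limit. A real flow would also have to choose the placement itself, which this sketch takes as given.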

Successful Cases

  • Serpens, to appear in DAC'22, achieves 270 MHz on the Xilinx Alveo U280 HBM board when using 24 HBM channels. The Vivado baseline failed in routing.
  • Sextans, FPGA'22, achieves 260 MHz on the Xilinx Alveo U250 board when using 4 DDR channels. The Vivado baseline achieves only 189 MHz.
  • SPLAG, FPGA'22, achieves up to a 4.9× speedup over state-of-the-art FPGA accelerators, up to a 2.6× speedup over a 32-thread CPU running at 4.4 GHz, and up to a 0.9× speedup over an A100 GPU (which has a 4.1× power budget and 3.4× HBM bandwidth).
  • AutoSA Systolic-Array Compiler, FPGA'21: AutoSA Frequency Figure
  • KNN, FPT'20, achieves 252 MHz on the Xilinx Alveo U280 board. The Vivado baseline achieves only 165 MHz.

Getting Started

Related Publications

FPGA'21 Artifact Review

The experiment results for all benchmarks in our submission to FPGA'21 are available at: https://ucla.box.com/s/5hpgduqrx93t2j4kx6fflw6z15oylfhu

Currently only a subset of the source code of the benchmarks is open-sourced here, as some designs are not published yet; they will be added later.

More Repositories

  1. AutoSA (C++, 191 stars): AutoSA: Polyhedral-Based Systolic Array Compiler
  2. RapidStream (Python, 116 stars): [FPGA 2022, Best Paper Award] Parallel placement and routing of Vivado HLS dataflow designs.
  3. FlexCNN (C++, 65 stars)
  4. GNN-DSE (LLVM, 36 stars): DAC'22 paper: "Automated Accelerator Optimization Aided by Graph Neural Networks"
  5. minimap2-acceleration (C, 30 stars): Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: Open Source Repository
  6. hbmbench (C++, 23 stars)
  7. blaze (C++, 23 stars): Blaze runtime system that support efficient accelerator integration for big data.
  8. AutoDSE (Python, 22 stars): ACM TODAES Best Paper Award, 2022
  9. splag (C++, 21 stars): Accelerating SSSP for power-law graphs using an FPGA.
  10. recut (C++, 18 stars): Large-scale medical image processing and reconstruction toolbox
  11. DPQA (Python, 17 stars)
  12. CLINK (C++, 17 stars): Compact LSTM inference kernel (CLINK) designed in C/HLS for FPGA implementation.
  13. soda-compiler (Python, 16 stars): Stencil with Optimized Dataflow Architecture Compiler
  14. heterohalide (C++, 13 stars): HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration
  15. HT-Deflate-FPGA (Verilog, 13 stars)
  16. HARP (LLVM, 9 stars): ICCAD'23 Best Paper Award candidate: Robust GNN-based Representation Learning for HLS
  17. TAPA-CS (Ada, 8 stars)
  18. Merlin-UCLA (C++, 7 stars)
  19. cs-259-21f (C, 2 stars)
  20. Enola (Python, 2 stars)
  21. adds (Cuda, 2 stars): A Fast Work-Efficient GPU Algorithm for SSSP
  22. Bonsai (Verilog, 2 stars)
  23. DSE-for-HLS (2 stars)
  24. tapa-fast-cosim (VHDL, 2 stars): Use a customized RTL testbench for cosim, avoid generating .xclbin
  25. EBMF (Python, 2 stars)
  26. cs-259-graph500-sssp (C, 1 star)
  27. cs-133-21w (C++, 1 star): Starter Kits for CS 133 (2021 Winter)
  28. cs-259-w24 (C++, 1 star)