Latest

[04/29/2022] We are holding a tutorial at FCCM 2022.
[04/29/2022] Check our latest documentation for the workflow here.
[02/20/2022] We have decided to maintain AutoBridge only as a plug-in of the TAPA workflow. The TAPA framework provides a stable and robust environment for AutoBridge across different HLS versions. TAPA is easy and natural to use if you are familiar with the HLS dataflow coding style.

[01/06/2022] We are integrating AutoBridge and TAPA to create a robust workflow. Currently AutoBridge relies on hacking the RTL generated by Vivado HLS, which makes it fragile. Using the open-source TAPA compiler as the frontend instead will make the floorplanning-pipelining flow much more robust. While the integration of AutoBridge and TAPA is still in progress, feel free to contact me if you want to try it out. We will provide as much help as needed to make your design work!
[01/06/2022] With the help of AutoBridge and TAPA, Serpens achieves 270 MHz on Alveo U280 while using 24 HBM channels, whereas a normal Vitis flow fails in routing. Serpens is an HBM-based accelerator for sparse matrix-vector multiplication (SpMV). With the high frequency, Serpens gets a 3.79X performance improvement over the previous state-of-the-art GraphLily.
[01/06/2022] With the help of AutoBridge and TAPA, Sextans achieves 260 MHz on Alveo U250 while using 4 DDR channels, whereas a normal Vitis flow achieves only 190 MHz.
[12/20/2021] We just open-sourced RapidStream, a follow-up to AutoBridge. This time we parallelize the placement and routing of each slot based on the floorplan produced by AutoBridge. Check out how we achieve a 5-7X speedup over Vivado!
A new implementation is ready! Check the example in AutoBridge/in-develop/test/autosa_cnn_13x8/.
The user interface has been significantly simplified. To invoke the new AutoBridge, just write a simple config file like this:

{
  "Board" : "U250",
  "HLSProjectPath" : "./kernel3",
  "HLSSolutionName" : "solution",
  "TopName" : "kernel3",

  "FloorplanMethod": "IterativeDivisionToHalfSLR",
  "AreaUtilizationRatio" : 0.7,

  "BundleToDDRMapping" : {
    "gmem_A": 0,
    "gmem_B": 1,
    "gmem_C": 2
  },

  "LoggingLevel" : "DEBUG"
}
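Before invoking the tool, it can help to sanity-check a config file like the one above. The following is a minimal Python sketch, not part of AutoBridge: the function name check_config is hypothetical, and treating every key from the example as required, or restricting AreaUtilizationRatio to (0, 1], are assumptions made only for illustration.

```python
import json

# Keys copied from the example config above. Treating all of them as
# required is an assumption of this sketch, not a documented rule.
REQUIRED_KEYS = {
    "Board", "HLSProjectPath", "HLSSolutionName", "TopName",
    "FloorplanMethod", "AreaUtilizationRatio", "BundleToDDRMapping",
}


def check_config(text):
    """Parse a config string and return a list of problems (empty if none)."""
    cfg = json.loads(text)
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(cfg))]

    # Assumed range: a utilization ratio is a fraction of available area.
    ratio = cfg.get("AreaUtilizationRatio")
    if isinstance(ratio, (int, float)) and not 0 < ratio <= 1:
        problems.append("AreaUtilizationRatio should be in (0, 1]")

    # Each memory bundle is expected to map to a non-negative DDR index.
    for bundle, channel in cfg.get("BundleToDDRMapping", {}).items():
        if not isinstance(channel, int) or channel < 0:
            problems.append(f"{bundle}: DDR index must be a non-negative integer")
    return problems


EXAMPLE = """
{
  "Board": "U250",
  "HLSProjectPath": "./kernel3",
  "HLSSolutionName": "solution",
  "TopName": "kernel3",
  "FloorplanMethod": "IterativeDivisionToHalfSLR",
  "AreaUtilizationRatio": 0.7,
  "BundleToDDRMapping": {"gmem_A": 0, "gmem_B": 1, "gmem_C": 2},
  "LoggingLevel": "DEBUG"
}
"""

if __name__ == "__main__":
    print(check_config(EXAMPLE))
```

An empty list means the file has the same shape as the example; it does not guarantee that AutoBridge itself will accept it.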

For the old implementation (AutoBridge/src), here is a useful example of integrating AutoBridge: https://autosa.readthedocs.io/en/latest/tutorials/auto_bridge.html

About

What: AutoBridge is a floorplanning tool for Vivado HLS dataflow designs.
Why: Co-optimizing HLS compilation and placement brings new opportunities to improve the final achievable frequency.
How: Pre-determine the rough location of each module during HLS compilation, so that:
- the long interconnects can be adequately pipelined by the HLS scheduler.
- the Vivado placer is prevented from placing the logic too densely.
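To make the idea concrete, here is a toy Python sketch of how a known module placement can drive pipelining decisions. It is not AutoBridge's algorithm: the module names, the slot coordinates, and the rule of one register stage per slot boundary crossed are all assumptions chosen for illustration.

```python
# Hypothetical floorplan: each module is assigned to a slot, given as
# (column, row) on a coarse grid of device regions.
placement = {
    "load": (0, 0),
    "compute": (0, 2),
    "store": (1, 3),
}

# Hypothetical dataflow streams between modules (producer, consumer).
streams = [("load", "compute"), ("compute", "store")]


def pipeline_stages(src, dst, placement, regs_per_crossing=1):
    """Extra register stages for a stream, assuming one per slot boundary.

    The number of boundaries crossed is taken as the Manhattan distance
    between the two slots; this cost model is an assumption of the sketch.
    """
    (x0, y0), (x1, y1) = placement[src], placement[dst]
    return regs_per_crossing * (abs(x0 - x1) + abs(y0 - y1))


if __name__ == "__main__":
    for src, dst in streams:
        print(f"{src} -> {dst}: {pipeline_stages(src, dst, placement)} stage(s)")
```

The point of the sketch is only that once the rough location of each module is fixed early, the distance between connected modules is known at compile time, so long connections can be given extra registers before placement and routing begin.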
In our experiments with a total of 43 design configurations, we improve the average frequency from 147 MHz to 297 MHz.
- Notably, in 16 experiments we make originally unroutable designs achieve 274 MHz on average.
The pre-print manuscript of our paper can be found at https://vast.cs.ucla.edu/sites/default/files/publications/AutoBridge_FPGA2021.pdf
Projects using AutoBridge:
- AutoSA Systolic Array Compiler (https://github.com/UCLA-VAST/AutoSA)
- TAPA Compiler (https://github.com/UCLA-VAST/tapa)
Motivating Examples:
- Comparison of a stencil accelerator on Xilinx U280. From routing failure to 297 MHz.
  - Each color represents a module.
  - AutoBridge ensures a clean separation of logic in different regions to minimize unnecessary die crossing.
- Comparison of a systolic array on Xilinx U250. From 158 MHz to 316 MHz.
  - Note that Vivado will try to pack things together to avoid die crossing as much as possible.
  - Instead, we ensure a balanced resource utilization across the whole device to reduce local congestion.
  - Meanwhile, the global connections will be adequately pipelined.

Successful Cases

Serpens, to appear in DAC'22, achieves 270 MHz on the Xilinx Alveo U280 HBM board when using 24 HBM channels. The Vivado baseline failed in routing.
Sextans, FPGA'22, achieves 260 MHz on the Xilinx Alveo U250 board when using 4 DDR channels. The Vivado baseline achieves only 189 MHz.
SPLAG, FPGA'22, achieves up to a 4.9× speedup over state-of-the-art FPGA accelerators, up to a 2.6× speedup over a 32-thread CPU running at 4.4 GHz, and up to a 0.9× speedup over an A100 GPU (which has a 4.1× power budget and 3.4× HBM bandwidth).
AutoSA Systolic-Array Compiler, FPGA'21:
KNN, FPT'20, achieves 252 MHz on the Xilinx Alveo U280 board. The Vivado baseline achieves only 165 MHz.

Getting Started

Related Publications

Yuze Chi, Licheng Guo, Jason Lau, Young-kyu Choi, Jie Wang, Jason Cong. Extending High-Level Synthesis for Task-Parallel Programs. In FCCM, 2021. [PDF] [Code] [Slides] [Video]
Licheng Guo, Yuze Chi, Jie Wang, Jason Lau, Weikang Qiao, Ecenur Ustun, Zhiru Zhang, Jason Cong. AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs. In FPGA, 2021. (Best Paper Award) [PDF] [Code] [Slides] [Video]

FPGA'21 Artifact Review

The experiment results for all benchmarks in our submission to FPGA'21 are available at: https://ucla.box.com/s/5hpgduqrx93t2j4kx6fflw6z15oylfhu

Currently only a subset of the benchmark source code is open-sourced here. Some designs are not published yet, and their code will be added later.
