  • Stars: 144
  • Rank: 254,082 (Top 6 %)
  • Language: C++
  • License: MIT License
  • Created about 4 years ago
  • Updated 2 months ago

Repository Details

TAPA is a dataflow HLS framework that features fast compilation, expressive programming model and generates high-frequency FPGA accelerators.

TAPA

TAPA Framework

High-Frequency

  • TAPA explicitly decouples communication and computation for better QoR.

  • TAPA integrates the AutoBridge floorplanner to optimize the RTL generation process.

  • TAPA achieves 2× higher frequency on average compared to Vivado. 1
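
The decoupling of communication and computation mentioned in the first bullet of this list can be pictured with a small software model. The sketch below is plain C++ and does not use the TAPA API; the names `Fifo`, `Load`, `Add`, `Store`, and `VecAdd` are hypothetical and chosen only for illustration. It assumes that memory-facing tasks and compute tasks exchange data only through FIFO channels, so that each side can be pipelined and placed independently.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a hardware FIFO channel between two tasks.
template <typename T>
using Fifo = std::queue<T>;

// Communication-only task: moves data from "external memory" into a FIFO.
void Load(const std::vector<float>& mem, Fifo<float>& out) {
  for (float v : mem) out.push(v);
}

// Computation-only task: reads and writes FIFOs, never external memory.
void Add(Fifo<float>& a, Fifo<float>& b, Fifo<float>& sum) {
  assert(a.size() == b.size());  // both inputs must carry the same count
  while (!a.empty()) {
    sum.push(a.front() + b.front());
    a.pop();
    b.pop();
  }
}

// Communication-only task: drains a FIFO back into "external memory".
void Store(Fifo<float>& in, std::vector<float>& mem) {
  mem.clear();
  while (!in.empty()) {
    mem.push_back(in.front());
    in.pop();
  }
}

// Top level: wires the tasks together through FIFOs. In this software model
// the tasks run one after another; in hardware they would run concurrently.
std::vector<float> VecAdd(const std::vector<float>& a,
                          const std::vector<float>& b) {
  Fifo<float> a_q, b_q, c_q;
  Load(a, a_q);
  Load(b, b_q);
  Add(a_q, b_q, c_q);
  std::vector<float> c;
  Store(c_q, c);
  return c;
}
```

Because `Add` never touches memory directly, the memory access pattern can be changed inside `Load` and `Store` without modifying the compute logic, which is the property the bullet refers to.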

Speed

  • TAPA compiles 7× faster than Vitis HLS. 2

  • TAPA provides 3× faster software simulation than Vitis HLS.2

  • TAPA provides 8× faster RTL simulation than Vitis.

  • [in-progress] TAPA is integrating RapidStream, which is up to 10× faster than Vivado.3

Expressiveness

  • TAPA extends the Vitis HLS syntax for richer expressiveness at the C++ level.

  • TAPA provides dedicated APIs for arbitrary external memory access patterns.

  • TAPA allows users to explicitly specify parallelism.

  • In addition to static burst analysis, TAPA supports runtime burst detection by transparently merging small memory transactions into large bursts.
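
The idea behind runtime burst detection in the last bullet can be sketched as a software model that coalesces consecutive addresses. This is an illustration only, not TAPA's hardware implementation, which the text here does not describe; the `Burst` struct, the `MergeBursts` function, and the `max_burst_len` cap are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One merged memory transaction: `length` consecutive elements from `base`.
struct Burst {
  std::uint64_t base;
  std::uint32_t length;
};

// Merges a stream of single-element addresses into bursts. An address joins
// the current burst only if it directly follows the previous address and the
// burst has not yet reached max_burst_len; otherwise a new burst starts.
std::vector<Burst> MergeBursts(const std::vector<std::uint64_t>& addrs,
                               std::uint32_t max_burst_len) {
  assert(max_burst_len > 0);
  std::vector<Burst> bursts;
  for (std::uint64_t addr : addrs) {
    if (!bursts.empty()) {
      Burst& last = bursts.back();
      if (addr == last.base + last.length && last.length < max_burst_len) {
        ++last.length;  // extend the current burst by one element
        continue;
      }
    }
    bursts.push_back({addr, 1});  // start a new burst at this address
  }
  return bursts;
}
```

For the address stream 0, 1, 2, 3, 8, 9, 20 and a cap of 256, this model produces three bursts: four elements starting at 0, two starting at 8, and one at 20. Non-consecutive accesses simply fall back to single-element transactions, so the merging stays transparent to the code that issues the accesses.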

HBM-Specific Optimizations

  • TAPA significantly reduces the area overhead of HBM interface IPs compared to Vitis HLS.

  • TAPA includes an automated design space exploration tool to balance the resource pressure and the wire pressure for HBM FPGAs.

  • TAPA automatically selects the physical channel for each top-level argument of your accelerator.

Successful Cases

  • Serpens, DAC'22, achieves 270 MHz on the Xilinx Alveo U280 HBM board when using 24 HBM channels. The Vitis HLS baseline failed in routing.
  • Sextans, FPGA'22, achieves 260 MHz on the Xilinx Alveo U250 board when using 4 DDR channels. The Vivado baseline achieves only 189 MHz.
  • SPLAG, FPGA'22, achieves up to a 4.9× speedup over state-of-the-art FPGA accelerators, up to a 2.6× speedup over a 32-thread CPU running at 4.4 GHz, and up to a 0.9× speedup over an A100 GPU (which has a 4.1× power budget and 3.4× the HBM bandwidth).
  • AutoSA Systolic-Array Compiler, FPGA'21: AutoSA Frequency Figure
  • KNN, FPT'20, achieves 252 MHz on the Xilinx Alveo U280 board. The Vivado baseline achieves only 165 MHz.

Getting Started

TAPA Publications

More Repositories

 1. AutoSA (C++, 191 stars)
    AutoSA: Polyhedral-Based Systolic Array Compiler
 2. AutoBridge (C++, 117 stars)
    [FPGA 2021, Best Paper Award] An automated floorplanning and pipelining tool for Vivado HLS.
 3. RapidStream (Python, 116 stars)
    [FPGA 2022, Best Paper Award] Parallel placement and routing of Vivado HLS dataflow designs.
 4. FlexCNN (C++, 65 stars)
 5. GNN-DSE (LLVM, 36 stars)
    DAC'22 paper: "Automated Accelerator Optimization Aided by Graph Neural Networks"
 6. minimap2-acceleration (C, 30 stars)
    Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: Open Source Repository
 7. hbmbench (C++, 23 stars)
 8. blaze (C++, 23 stars)
    Blaze runtime system that supports efficient accelerator integration for big data.
 9. AutoDSE (Python, 22 stars)
    ACM TODAES Best Paper Award, 2022
10. splag (C++, 21 stars)
    Accelerating SSSP for power-law graphs using an FPGA.
11. recut (C++, 18 stars)
    Large-scale medical image processing and reconstruction toolbox
12. DPQA (Python, 17 stars)
13. CLINK (C++, 17 stars)
    Compact LSTM inference kernel (CLINK) designed in C/HLS for FPGA implementation.
14. soda-compiler (Python, 16 stars)
    Stencil with Optimized Dataflow Architecture Compiler
15. heterohalide (C++, 13 stars)
    HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration
16. HT-Deflate-FPGA (Verilog, 13 stars)
17. HARP (LLVM, 9 stars)
    ICCAD'23 Best Paper Award candidate: Robust GNN-based Representation Learning for HLS
18. TAPA-CS (Ada, 8 stars)
19. Merlin-UCLA (C++, 7 stars)
20. cs-259-21f (C, 2 stars)
21. Enola (Python, 2 stars)
22. adds (Cuda, 2 stars)
    A Fast Work-Efficient GPU Algorithm for SSSP
23. Bonsai (Verilog, 2 stars)
24. DSE-for-HLS (2 stars)
25. tapa-fast-cosim (VHDL, 2 stars)
    Use a customized RTL testbench for cosim, avoid generating .xclbin
26. EBMF (Python, 2 stars)
27. cs-259-graph500-sssp (C, 1 star)
28. cs-133-21w (C++, 1 star)
    Starter Kits for CS 133 (2021 Winter)
29. cs-259-w24 (C++, 1 star)