  • Stars: 2
  • Language: C
  • Created over 1 year ago
  • Updated over 1 year ago

More Repositories

1

graph-of-thoughts

Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"
Python
2,059
star
2

dace

DaCe - Data Centric Parallel Programming
Python
487
star
3

gemm_hls

Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs.
C++
297
star
4

QuaRot

Code for QuaRot, an end-to-end 4-bit inference of large language models.
Python
247
star
5

pymlir

Python interface for MLIR - the Multi-Level Intermediate Representation
Python
210
star
6

ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics
Python
206
star
7

hls_tutorial_examples

Examples shown as part of the tutorial "Productive parallel programming on FPGA with high-level synthesis".
C++
188
star
8

MRAG

Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"
Python
151
star
9

serverless-benchmarks

SeBS: serverless benchmarking suite for automatic performance analysis of FaaS platforms.
Python
142
star
10

substation

Research and development for optimizing transformers
Python
121
star
11

pspin

PsPIN: A RISC-V in-network accelerator for flexible high-performance low-power packet processing
SystemVerilog
95
star
12

deep-weather

Deep Learning for Post-Processing Ensemble Weather Forecasts
Jupyter Notebook
85
star
13

daceml

A Data-Centric Compiler for Machine Learning
Python
81
star
14

FBLAS

BLAS implementation for Intel FPGA
C++
75
star
15

open-earth-compiler

Development repository for the Open Earth Compiler
MLIR
74
star
16

npbench

NPBench - A Benchmarking Suite for High-Performance NumPy
Python
73
star
17

ucudnn

Accelerating DNN Convolutional Layers with Micro-batches
C++
64
star
18

rFaaS

rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.
C++
48
star
19

haystack

Haystack is an analytical cache model that, given a program, computes the number of cache misses.
C++
42
star
20

sparsity-in-deep-learning

Bibtex for Sparsity in Deep Learning paper (https://arxiv.org/abs/2102.00554) - open for pull requests
TeX
40
star
21

mlir-dace

Data-Centric MLIR dialect
C++
37
star
22

redmark

ReDMArk: Bypassing RDMA Security Mechanisms.
C++
37
star
23

apfp

FPGA acceleration of arbitrary precision floating point computations.
C++
34
star
24

NoPFS

Near-optimal Prefetching System
32
star
25

sten

Sparsity support for PyTorch
Python
31
star
26

rapidchiplet

A toolchain for rapid design space exploration of chiplet architectures
C++
27
star
27

ens10

Scripts and examples for the ENS-10 Ensemble Prediction System machine learning dataset
Python
25
star
28

gms

GraphMineSuite (GMS): a benchmarking suite for graph mining algorithms such as graph pattern matching or graph learning
C++
25
star
29

sage

Python
24
star
30

liblsb

Rebol
23
star
31

smoe

Spatial Mixture-of-Experts
Python
19
star
32

CoRM

CoRM: Compactable Remote Memory over RDMA
C++
19
star
33

dace-vscode

Rich editor for SDFGs with included profiling and debugging, static analysis, and interactive optimization.
TypeScript
18
star
34

kafkadirect

RDMA-enabled Apache Kafka
Java
17
star
35

faaskeeper

A fully serverless implementation of the ZooKeeper coordination protocol.
Python
17
star
36

fmi

Function Message Interface (FMI): library for message-passing and collective communication for serverless functions.
C++
15
star
37

SMI

Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware
C++
15
star
38

stencilflow

Python
15
star
39

naos

Naos: Serialization-free RDMA networking in Java
Java
15
star
40

absinthe

Absinthe is an optimization framework to fuse and tile stencil codes in one shot
Python
14
star
41

NNCompression

Compressing weather and climate data into neural networks
Python
13
star
42

DNN-cpp-proxies

C++/MPI proxies for distributed training of deep neural networks.
C++
13
star
43

arrow-matrix

Arrow Matrix Decomposition - Communication-Efficient Distributed Sparse Matrix Multiplication
Python
13
star
44

CheckEmbed

Official Implementation of "CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks"
Python
12
star
45

.github

10
star
46

LogGOPSim

A LogGOPS (LogP, LogGP, LogGPS) Simulator and Simulation Framework
C
10
star
47

vldb19-distributed-locking

This repository hosts the code used for the following paper: Claude Barthels, Ingo Müller, Konstantin Taranov, Torsten Hoefler, Gustavo Alonso. "Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores." In: PVLDB, 2020.
C++
10
star
48

SimFS

SimFS: A Virtualizing Simulation Data File System Interface
C++
8
star
49

CLaMPI

Caching Layer for MPI
C
8
star
50

FBACode

Python
8
star
51

nbody_hls

Implementation of the N^2-formulation of N-body simulation with Vivado HLS for SDAccel platforms.
C++
8
star
52

GDI-RMA

Official Implementation of "The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores"
C
8
star
53

DiffDA

Python
7
star
54

stencil_hls

Implementation of time and space-tiled stencil in Vivado HLS.
C++
7
star
55

open-earth-benchmarks

Open repository for climate and weather benchmark kernels
C++
7
star
56

cppless

C++
6
star
57

polybench-comparator

Regression and comparison tools for the Polybench benchmark
Shell
6
star
58

nevermore

The source code for the Nevermore paper at ACM CCS'22
C++
6
star
59

foMPI-NA

C
6
star
60

perf-taint

Taint-based program analysis framework for empirical performance modeling.
LLVM
5
star
61

streamingsched

Streaming Task Scheduling
Python
5
star
62

faaskeeper-python

Python client library for FaaSKeeper, the serverless ZooKeeper.
Python
5
star
63

muliticast-based-allgather

C
4
star
64

libNBC

Shell
3
star
65

climetlab-maelstrom-ens10

MAELSTROM ENS10 dataset plugin for CliMetLab
Jupyter Notebook
3
star
66

dace-webclient

Web-based SDFG viewer for DaCe
JavaScript
3
star
67

libhear

C++
3
star
68

TCPunch

C++
3
star
69

LGSxNS3

Python
2
star
70

cppless-clang

2
star
71

c2dace

C
2
star
72

probgraph

Emacs Lisp
2
star
73

LogGOPSim2

C++
2
star
74

fflib

C
2
star
75

serverless-benchmarks-data

TeX
2
star
76

spatial-collectives

Optimized communication collectives for the Cerebras waferscale engine
Python
2
star
77

conflux

C++
1
star
78

fuzzyflow-artifact

Computational artifacts for the FuzzyFlow publication
Shell
1
star
79

SAILOR

Python
1
star
80

praas-benchmarks

Jupyter Notebook
1
star
81

HTSIM-old

C++
1
star
82

faas-profiler

Python
1
star
83

UPM

User-guided Page Merging: Memory Deduplication for Serverless
C
1
star
84

f2dace-artifact

Fortran
1
star
85

smat

Code for High Performance Unstructured SpMM Computation Using Tensor Cores
Emacs Lisp
1
star