graph-of-thoughts
Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"
dace
DaCe - Data Centric Parallel Programming
gemm_hls
Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs.
QuaRot
Code for QuaRot, an end-to-end 4-bit inference of large language models.
pymlir
Python interface for MLIR - the Multi-Level Intermediate Representation
ncc
Neural Code Comprehension: A Learnable Representation of Code Semantics
hls_tutorial_examples
Examples shown as part of the tutorial "Productive parallel programming on FPGA with high-level synthesis".
MRAG
Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"
serverless-benchmarks
SeBS: serverless benchmarking suite for automatic performance analysis of FaaS platforms.
substation
Research and development for optimizing transformers
pspin
PsPIN: A RISC-V in-network accelerator for flexible high-performance low-power packet processing
deep-weather
Deep Learning for Post-Processing Ensemble Weather Forecasts
daceml
A Data-Centric Compiler for Machine Learning
FBLAS
BLAS implementation for Intel FPGA
open-earth-compiler
development repository for the open earth compiler
npbench
NPBench - A Benchmarking Suite for High-Performance NumPy
ucudnn
Accelerating DNN Convolutional Layers with Micro-batches
rFaaS
rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.
haystack
Haystack is an analytical cache model that given a program computes the number of cache misses.
sparsity-in-deep-learning
Bibtex for Sparsity in Deep Learning paper (https://arxiv.org/abs/2102.00554) - open for pull requests
mlir-dace
Data-Centric MLIR dialect
redmark
ReDMArk: Bypassing RDMA Security Mechanisms.
apfp
FPGA acceleration of arbitrary precision floating point computations.
NoPFS
Near-optimal Prefetching System
sten
Sparsity support for PyTorch
rapidchiplet
A toolchain for rapid design space exploration of chiplet architectures
ens10
Scripts and examples for the ENS-10 Ensemble Prediction System machine learning dataset
gms
GraphMineSuite (GMS): a benchmarking suite for graph mining algorithms such as graph pattern matching or graph learning
sage
liblsb
smoe
Spatial Mixture-of-Experts
CoRM
CoRM: Compactable Remote Memory over RDMA
dace-vscode
Rich editor for SDFGs with included profiling and debugging, static analysis, and interactive optimization.
kafkadirect
RDMA-enabled Apache Kafka
faaskeeper
A fully serverless implementation of the ZooKeeper coordination protocol.
fmi
Function Message Interface (FMI): library for message-passing and collective communication for serverless functions.
SMI
Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware
stencilflow
naos
Naos: Serialization-free RDMA networking in Java
absinthe
Absinthe is an optimization framework to fuse and tile stencil codes in one shot
NNCompression
Compressing weather and climate data into neural networks
DNN-cpp-proxies
C++/MPI proxies for distributed training of deep neural networks.
arrow-matrix
Arrow Matrix Decomposition - Communication-Efficient Distributed Sparse Matrix Multiplication
CheckEmbed
Official Implementation of "CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks"
.github
LogGOPSim
A LogGOPS (LogP, LogGP, LogGPS) Simulator and Simulation Framework
vldb19-distributed-locking
This repository hosts the code used for the following paper: Claude Barthels, Ingo Müller, Konstantin Taranov, Torsten Hoefler, Gustavo Alonso. "Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores." In: PVLDB, 2020.
SimFS
SimFS: A Virtualizing Simulation Data File System Interface
CLaMPI
Caching Layer for MPI
FBACode
nbody_hls
Implementation of the N^2-formulation of N-body simulation with Vivado HLS for SDAccel platforms.
GDI-RMA
Official Implementation of "The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores"
DiffDA
stencil_hls
Implementation of time and space-tiled stencil in Vivado HLS.
open-earth-benchmarks
Open repository for climate and weather benchmark kernels
cppless
polybench-comparator
Regression and comparison tools for the Polybench benchmark
nevermore
The source code for the Nevermore paper at ACM CCS'22
foMPI-NA
perf-taint
Taint-based program analysis framework for empirical performance modeling.
streamingsched
Streaming Task Scheduling
faaskeeper-python
Python client library for FaaSKeeper, the serverless ZooKeeper.
muliticast-based-allgather
smat
Code for High Performance Unstructured SpMM Computation Using Tensor Cores
libNBC
climetlab-maelstrom-ens10
MAELSTROM ENS10 dataset plugin for CliMetLab
spatial-collectives
Optimized communication collectives for the Cerebras waferscale engine
libhear
TCPunch
LGSxNS3
cppless-clang
c2dace
probgraph
LogGOPSim2
fflib
serverless-benchmarks-data
rivets
conflux
fuzzyflow-artifact
Computational artifacts for the FuzzyFlow publication
SAILOR
praas-benchmarks
HTSIM-old
faas-profiler
UPM
User-guided Page Merging: Memory Deduplication for Serverless
f2dace-artifact