  • License: MIT License
  • Created over 5 years ago
  • Updated about 1 year ago

Awesome machine learning model compression research papers, tools, and learning material.

Awesome ML Model Compression

An awesome-style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools, and more. PRs are welcome!

Contents


Papers

General

Architecture

Quantization

Binarization

Pruning

Distillation

Low Rank Approximation

Offloading

Recent years have seen the emergence of systems specialized for LLM inference, such as FasterTransformer (NVIDIA, 2022), PaLM inference (Pope et al., 2022), DeepSpeed-Inference (Aminabadi et al., 2022), Accelerate (HuggingFace, 2022), LightSeq (Wang et al., 2021), and TurboTransformers (Fang et al., 2021).

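To enable LLM inference on easily accessible hardware, offloading (keeping model weights and other tensors in CPU memory or on disk, and moving them to the GPU only when they are needed) is an essential technique. To our knowledge, among current systems, only DeepSpeed-Inference and Hugging Face Accelerate include such functionality.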
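The core idea of offloading can be illustrated with a toy, pure-Python simulation. All per-layer weights live in a "host" store (standing in for CPU RAM or disk), and only a bounded number of layers are copied into a "device" cache (standing in for GPU memory) while they run. This is a hypothetical sketch, not the API of DeepSpeed-Inference or Accelerate. The class name `OffloadedModel`, the LRU eviction policy, and the elementwise "layer" are assumptions made for illustration.

```python
from collections import OrderedDict


class OffloadedModel:
    """Hypothetical toy model: layer weights live in host memory and are
    copied into a fixed-capacity "device" cache only when a layer runs."""

    def __init__(self, host_weights, device_capacity):
        if device_capacity < 1:
            raise ValueError("device_capacity must be at least 1")
        self.host = host_weights           # per-layer weights; stands in for CPU RAM/disk
        self.capacity = device_capacity    # max number of layers resident on the "device"
        self.device = OrderedDict()        # layer index -> weights; stands in for GPU memory
        self.transfers = 0                 # count of simulated host-to-device copies
        self.peak_resident = 0             # most layers ever resident at the same time

    def _fetch(self, i):
        """Return layer i's weights, copying them to the device cache if needed."""
        if i in self.device:
            self.device.move_to_end(i)       # mark as most recently used
            return self.device[i]
        if len(self.device) >= self.capacity:
            self.device.popitem(last=False)  # evict the least recently used layer
        self.device[i] = list(self.host[i])  # simulated host-to-device copy
        self.transfers += 1
        self.peak_resident = max(self.peak_resident, len(self.device))
        return self.device[i]

    def forward(self, x):
        """Run every layer in order; each toy layer scales x elementwise."""
        for i in range(len(self.host)):
            w = self._fetch(i)
            x = [xi * wi for xi, wi in zip(x, w)]
        return x


model = OffloadedModel([[2.0, 2.0], [3.0, 3.0], [1.0, 1.0], [0.5, 0.5]], device_capacity=2)
print(model.forward([1.0, 1.0]), model.transfers, model.peak_resident)
```

With four layers and a capacity of two, no more than two layers are ever resident. The cost is one simulated transfer per layer per forward pass. Practical systems typically try to hide this transfer cost by overlapping I/O with computation, which this sketch does not model.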

Parallelism

Papers on compression methods for accelerating model parallelism:

  • Does compressing activations help model parallel training? (2023) - The authors present the first empirical study of how well compression algorithms (pruning-based, learning-based, and quantization-based, evaluated on a Transformer architecture) improve the communication speed of model parallelism. Summary: 1) activation compression is not the same as gradient compression; 2) training setups matter a lot; 3) don't compress the activations of early layers.
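The sketch below gives a rough illustration of the quantization-based family the study evaluates. Before an activation vector would be sent to the next model-parallel worker, it is compressed to 8-bit codes with per-tensor min-max quantization. The first few layers are left uncompressed, echoing takeaway 3. The function names, the `skip_first=2` threshold, and the byte accounting (1 byte per code plus two 4-byte floats for offset and scale) are hypothetical choices, not the paper's exact setup.

```python
def quantize_uint8(values):
    """Per-tensor min-max quantization to integer codes in [0, 255]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0   # avoid division by zero for constant tensors
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize_uint8(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def transmit_activation(values, layer_idx, skip_first=2):
    """Simulate sending an activation across a model-parallel boundary.

    Returns (values as seen by the receiver, payload size in bytes).
    Early layers (layer_idx < skip_first) are sent uncompressed as fp32.
    """
    if layer_idx < skip_first:
        return list(values), 4 * len(values)
    codes, lo, scale = quantize_uint8(values)
    payload = len(codes) + 8        # 1 byte per code + fp32 offset and scale
    return dequantize_uint8(codes, lo, scale), payload


acts = [((i * 37) % 101) / 50.0 - 1.0 for i in range(1000)]
received, nbytes = transmit_activation(acts, layer_idx=5)
print(nbytes, max(abs(a - b) for a, b in zip(acts, received)))
```

For compressed layers, the payload shrinks to roughly a quarter of the fp32 size. In exchange, each received value may differ from the original by up to half a quantization step.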

Articles

Content published on the Web.

Howtos

Assorted

Reference

Blogs

Tools

Libraries

  • TensorFlow Model Optimization Toolkit. Accompanying blog post: TensorFlow Model Optimization Toolkit — Pruning API
  • XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 (SSE2 level) platforms. It is based on the QNNPACK library; however, unlike QNNPACK, XNNPACK focuses entirely on floating-point operators.
  • Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions.
  • NNCP - An experiment to build a practical lossless data compressor with neural networks. The latest version uses a Transformer model (slower, but with the best compression ratio); an LSTM model (faster) is also available.
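The Pruning API of the TensorFlow Model Optimization Toolkit is built around magnitude-based weight pruning. The framework-free sketch below illustrates the idea on a flat list of weights. It zeroes out the smallest-magnitude fraction and raises that target fraction gradually over training with a polynomial schedule. The toolkit ships its own `PolynomialDecay` schedule, but this code neither uses nor reproduces the toolkit's API. The function names and the default `power=3` are assumptions for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest absolute value.

    Returns (pruned weights, binary mask) where mask[i] == 0 marks a pruned weight.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(round(sparsity * len(weights)))   # how many weights to remove
    # Rank indices by magnitude (smallest first); ties are broken by index.
    order = sorted(range(len(weights)), key=lambda i: (abs(weights[i]), i))
    pruned = set(order[:k])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [0.0 if m == 0 else w for w, m in zip(weights, mask)], mask


def polynomial_sparsity(step, initial, final, begin_step, end_step, power=3):
    """Target sparsity that moves from `initial` to `final` between the two steps."""
    if step <= begin_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - begin_step) / (end_step - begin_step)
    return final + (initial - final) * (1.0 - progress) ** power


w = [0.1, -0.5, 0.05, 2.0, -0.3]
for step in (0, 50, 100):
    target = polynomial_sparsity(step, 0.0, 0.8, begin_step=0, end_step=100)
    pruned, mask = magnitude_prune(w, target)
    print(step, round(target, 3), mask)
```

The cubic schedule raises sparsity quickly at first and then levels off, so most weights are removed early while the network still has many training steps left to recover.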

Frameworks

Paper Implementations

  • facebookresearch/kill-the-bits - code and compressed models for the paper, "And the bit goes down: Revisiting the quantization of neural networks" by Facebook AI Research.
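The paper compresses networks with product quantization (PQ). Weights are split into small subvectors, and each subvector is replaced by the index of its nearest codeword in a learned codebook. The sketch below shows only plain PQ of a flat weight list, fitting the codebook with a few iterations of k-means on the weights themselves. The paper goes further and learns codebooks that preserve the layer's outputs on in-domain inputs rather than the weights. All function names, the deterministic initialization, and the iteration count are illustrative assumptions, and this is not code from the kill-the-bits repository.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_compress(weights, sub_dim, k, iters=10):
    """Product-quantize a flat weight list.

    Splits `weights` into subvectors of length `sub_dim`, fits a codebook of
    `k` codewords with k-means, and returns (codebook, codes).
    """
    if len(weights) % sub_dim != 0:
        raise ValueError("len(weights) must be a multiple of sub_dim")
    subs = [weights[i:i + sub_dim] for i in range(0, len(weights), sub_dim)]
    if k > len(subs):
        raise ValueError("k cannot exceed the number of subvectors")
    # Deterministic init: k subvectors spread evenly through the list.
    codebook = [list(subs[j * len(subs) // k]) for j in range(k)]
    codes = [0] * len(subs)
    for _ in range(iters):
        # Assignment step: nearest codeword for every subvector.
        codes = [min(range(k), key=lambda c: _sq_dist(s, codebook[c])) for s in subs]
        # Update step: move each codeword to the mean of its members.
        for c in range(k):
            members = [s for s, a in zip(subs, codes) if a == c]
            if members:  # keep the old codeword if its cluster is empty
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook, codes


def pq_decompress(codebook, codes):
    """Rebuild the flat weight list by concatenating the selected codewords."""
    return [x for c in codes for x in codebook[c]]


w = [0.9, 1.1, 1.0, 1.0, -2.0, -2.1, -1.9, -2.0]
codebook, codes = pq_compress(w, sub_dim=2, k=2)
print(codes, codebook)
print(pq_decompress(codebook, codes))
```

Storage drops from one float per weight to one small integer index per subvector, plus the shared codebook. In this example, eight floats become four indices and two 2-element codewords.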

Videos

Talks

Training & tutorials

License

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.
