
awesome-model-compression

This repository collects papers (mainly from arxiv.org) about model compression, covering the following topics (a brief illustrative sketch of two of them follows the list):
Structure;
Distillation;
Binarization;
Quantization;
Pruning;
Low Rank.
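
As a rough illustration (a minimal sketch, not taken from any particular paper; the function names, shapes, and thresholds below are assumptions), the Python snippet shows what two of these topics amount to in practice: magnitude pruning zeroes the smallest weights, and uniform quantization snaps weights onto a small grid of levels.

import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the smallest-magnitude entries until roughly `sparsity` of them are gone.
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)

def uniform_quantize(weights, num_bits=8):
    # Map weights linearly onto 2**num_bits levels and back ("fake" quantization).
    levels = 2 ** num_bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    if scale == 0:
        scale = 1.0  # all weights identical; avoid dividing by zero
    q = np.clip(np.round((weights - w_min) / scale), 0, levels)
    return q * scale + w_min

w = np.random.randn(4, 4).astype(np.float32)
print(magnitude_prune(w, sparsity=0.75))   # 75% of entries set to zero
print(uniform_quantize(w, num_bits=4))     # weights snapped to 16 levels

Distillation, binarization, low-rank factorization, and structural changes follow the same spirit: replace the original weights or architecture with a cheaper approximation while keeping accuracy close to the original model.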


CONTENT

PAPERS
BOOKS
BLOGS & ARTICLES
LIBRARIES
PROJECTS
OTHERS
REFERENCE


PAPERS

1990

1993

  • Hassibi, Babak, and David G. Stork. Second order derivatives for network pruning: Optimal brain surgeon. [C] Advances in Neural Information Processing Systems, 1993.
  • J. L. Holi and J. N. Hwang. Finite precision error analysis of neural network hardware implementations. In IJCNN-91 Seattle International Joint Conference on Neural Networks, pages 519–525, vol. 1, 1993.

1995

1997

1998

2000

2001

2006

2011

2012

  • D. Hammerstrom. A VLSI architecture for high-performance, low-cost, on-chip learning. In IJCNN International Joint Conference on Neural Networks, pages 537–544, vol. 2, 2012.

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023


BOOKS


BLOGS & ARTICLES


LIBRARIES

  • OpenBMB/BMCook: Model Compression for Big Models;
  • NVIDIA TensorRT: Programmable Inference Accelerator;
  • Tencent/PocketFlow: An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications;
  • dmlc/tvm: Open deep learning compiler stack for CPU, GPU, and specialized accelerators;
  • Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform;

PROJECTS

  • pytorch/glow:  Compiler for Neural Network hardware accelerators;
  • NervanaSystems/neon:  Intel® Nervana™ reference deep learning framework committed to best performance on all hardware;
  • NervanaSystems/distiller:  Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research;
  • MUSCO: framework for model compression using tensor decompositions (PyTorch);
  • OAID/Tengine: Tengine is a lightweight, high-performance, modular inference engine for embedded devices;
  • fpeder/espresso: Efficient forward propagation for BCNNs;
  • TensorFlow Lite: an open-source deep learning framework for on-device inference;
  • Core ML:  Reduce the storage used by the Core ML model inside your app bundle;
  • pytorch-tensor-decompositions:  PyTorch implementation of [1412.6553] and [1511.06530] tensor decomposition methods for convolutional layers;
  • tensorflow/quantize:  
  • mxnet/quantization:  This folder contains examples of quantizing an FP32 model with Intel® MKL-DNN or cuDNN.
  • TensoRT4-Example:  
  • NAF-tensorflow:  "Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow;
  • Mayo: deep learning framework with fine- and coarse-grained pruning, network slimming, and quantization methods;
  • Keras compressor: compression using low-rank approximations (SVD for matrices, Tucker decomposition for tensors);
  • Caffe compressor: K-means-based quantization.

OTHERS


REFERENCE

Some of these papers and links were collected from the sources below; they are all awesome resources:

Keywords (truncated stems, for substring matching): compress; prun; accelera; distill; binarization; "low rank"; quantization; "efficient comput";
Compression and acceleration of large NLP models: distillation; quantization; low-rank decomposition; pruning; weight/layer sharing (parameter sharing); operator fusion; retrieval augmentation; external memory/storage offloading; MoE; adaptive-depth decoding;

# Case-insensitively keep arxiv listing lines that match any keyword pattern in key.txt
grep -i -f key.txt arxiv-20* > compress20.txt
# Of those, keep only lines that also match a whitelist pattern
grep -i -f whitekey_limit.txt compress20.txt > compress_good.txt
# Finally, drop lines that match any blacklist pattern
grep -i -v -f blackkey.txt compress_good.txt > compress_good_bad.txt
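
For readers who prefer Python, the sketch below mirrors the grep pipeline above in a single pass (the keyword file names come from the commands; everything else, including treating each line of a keyword file as a case-insensitive regular expression, is an assumption):

import glob
import re

def load_patterns(path):
    # One pattern per line, matched case-insensitively (like grep -i -f FILE).
    with open(path) as f:
        return [re.compile(line.strip(), re.IGNORECASE) for line in f if line.strip()]

keywords = load_patterns("key.txt")
whitelist = load_patterns("whitekey_limit.txt")
blacklist = load_patterns("blackkey.txt")

kept = []
for path in glob.glob("arxiv-20*"):              # the yearly arxiv listing files
    with open(path, errors="ignore") as f:
        for line in f:
            if (any(p.search(line) for p in keywords)
                    and any(p.search(line) for p in whitelist)
                    and not any(p.search(line) for p in blacklist)):
                kept.append(line)

with open("compress_good_bad.txt", "w") as out:  # same output name as the last grep
    out.writelines(kept)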