gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

PanzaMail

QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.

OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".

WoodFisher
Code accompanying the NeurIPS 2020 paper WoodFisher (Singh & Alistarh, 2020).

Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 sparsity.

SparseFinetuning
Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry.

RoSA

QIGen
Repository for CPU kernel generation for LLM inference.

ACDC
Code for reproducing "AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks" (NeurIPS 2021).

spdy
Code for the ICML 2022 paper "SPDY: Accurate Pruning with Speedup Guarantees".

M-FAC
Efficient reference implementations of the static and dynamic M-FAC algorithms (for pruning and optimization).

torch_cgx
PyTorch distributed backend extension with compression support.

sparseprop

peft-rosa
A fork of the PEFT library, supporting Robust Adaptation (RoSA).

MicroAdam
Code for the MicroAdam paper.

CrAM
Code for reproducing the results from "CrAM: A Compression-Aware Minimizer", accepted at ICLR 2023.

spops

ISTA-DASLab-Optimizers

EFCP
Code to reproduce the experiments from the paper "Error Feedback Can Accurately Compress Preconditioners".

pruned-vision-model-bias
Code for reproducing the paper "Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures".

Mathador-LM
Code for the paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".

CAP
Repository for Correlation-Aware Pruning (NeurIPS 2023) source and experimental code.

evolution-strategies

TACO4NLP
Task-aware compression for various NLP tasks.

smart-quantizer
Repository for Vitaly's implementation of the distribution-adaptive quantizer.

ZipLM
Code for the NeurIPS 2023 paper "ZipLM: Inference-Aware Structured Pruning of Language Models".

QRGD
Repository for the implementation of "Distributed Principal Component Analysis with Limited Communication" (Alimisis et al., NeurIPS 2021). Parts of this code were originally based on code from "Communication-Efficient Distributed PCA by Riemannian Optimization" (Huang and Pan, ICML 2020).

KDVR
Code for the experiments in "Knowledge Distillation Performs Partial Variance Reduction" (NeurIPS 2023).

GridSearcher
GridSearcher simplifies running grid searches for machine learning projects in Python, emphasizing parallel execution and GPU scheduling without dependencies on SLURM or other workload managers.
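Several of the repositories above (gptq, marlin, QUIK, Sparse-Marlin) revolve around 4-bit weight quantization for LLM inference. As a generic illustration of what "INT4 weights" means, here is a minimal round-to-nearest (RTN) quantization sketch in NumPy. This is a baseline only and an assumption for illustration, not the GPTQ algorithm (which minimizes layer-wise reconstruction error) and not the Marlin kernel (which runs fused FP16xINT4 matmuls on GPU):

```python
import numpy as np

def quantize_rtn(w, bits=4, group_size=32):
    # Symmetric per-group round-to-nearest quantization (generic baseline,
    # NOT the GPTQ method): each group of `group_size` weights shares one
    # FP scale; weights become small signed integers.
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit symmetric
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                    # guard against all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    # Recover an FP32 approximation of the original weights.
    return (q.astype(np.float32) * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_rtn(w)
w_hat = dequantize(q, s, w.shape)
err = float(np.abs(w - w_hat).max())
print(f"max abs reconstruction error: {err:.4f}")
```

RTN is exactly the baseline that methods like GPTQ and OBC improve on: rounding each weight independently ignores the layer's input statistics, whereas those methods pick quantized values that minimize the layer's output error.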