• This repository has been archived on 02/May/2023
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0

Repository Details

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

⚠️ DISCONTINUATION OF PROJECT - This project will no longer be maintained by Intel. This project has been identified as having known security escapes. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project.

Distiller is an open-source Python package for neural network compression research.

Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.

Highlighted features

  • Automatic Compression
  • Weight pruning
    • Element-wise pruning using magnitude thresholding, sensitivity thresholding, target sparsity level, and activation statistics
  • Structured pruning
    • Convolution: 2D (kernel-wise), 3D (filter-wise), 4D (layer-wise), and channel-wise structured pruning.
    • Fully-connected: column-wise and row-wise structured pruning.
    • Structure groups (e.g. structures of 4 filters).
    • Structure ranking using weight or activation criteria (Lp-norm, APoZ, gradients, random, etc.).
    • Support for new structures (e.g. block pruning)
  • Control
    • Soft (mask on forward-pass only) and hard pruning (permanently disconnect neurons)
    • Dual weight copies (compute loss on masked weights, but update unmasked weights)
    • Model thinning (AKA "network garbage removal") to permanently remove pruned neurons and connections.
  • Schedule
    • Flexible scheduling of pruning, regularization, and learning rate decay (compression scheduling)
    • One-shot and iterative pruning (and fine-tuning) are supported.
    • Easily control what is performed each training step (e.g. greedy layer by layer pruning to full model pruning).
    • Automatic gradual schedule (AGP) for pruning individual connections and complete structures.
    • The compression schedule is expressed in a YAML file so that a single file captures the details of experiments. This dependency injection design decouples the Distiller scheduler and library from future extensions of algorithms.
  • Element-wise and filter-wise pruning sensitivity analysis (using L1-norm thresholding). Examine the data from some of the networks we analyzed, using this notebook.
  • Regularization
    • L1-norm element-wise regularization
    • Group Lasso and group variance regularization
  • Quantization
    • Automatic mechanism to transform existing models to quantized versions, with customizable bit-width configuration for different layers. No need to re-write the model for different quantization methods.
    • Post-training quantization of trained full-precision models, dynamic and static (statistics-based)
    • Support for quantization-aware training in the loop
  • Knowledge distillation
    • Training with knowledge distillation, in conjunction with the other available pruning / regularization / quantization methods.
  • Conditional computation
    • Sample implementation of Early Exit
  • Low rank decomposition
  • Lottery Ticket Hypothesis training
  • Export statistics summaries using Pandas dataframes, which makes it easy to slice, query, display and graph the data.
  • A set of Jupyter notebooks to plan experiments and analyze compression results. The graphs and visualizations you see on this page originate from the included Jupyter notebooks.
    • Take a look at this notebook, which compares visual aspects of dense and sparse Alexnet models.
    • This notebook creates performance indicator graphs from model data.
  • Sample implementations of published research papers, using library-provided building blocks. See the research papers discussions in our model-zoo.
  • Logging to the console, text file and TensorBoard-formatted file.
  • Export to ONNX (export of quantized models pending ONNX standardization)
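
The list above mentions pruning to a target sparsity level and the automatic gradual schedule (AGP). As a rough illustration of how the two fit together, the following is a minimal pure-Python sketch, not Distiller's implementation. It uses the cubic sparsity ramp published by Zhu and Gupta (2017) and a simple magnitude ranking over a flat list of weights. The function names `agp_target_sparsity` and `magnitude_mask`, and the example weights, are hypothetical and chosen only for this sketch.

```python
def agp_target_sparsity(step, start_step, end_step, initial_sparsity, final_sparsity):
    """Cubic sparsity ramp (Zhu & Gupta, 2017), clamped outside the schedule window."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights, target_sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude fraction of `weights`."""
    num_to_prune = int(round(target_sparsity * len(weights)))
    # Indices sorted by |w|, smallest first; ties are broken by position.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


# Illustrative weights only; a real layer would be a tensor, not a list.
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.15]
mask = [1] * len(weights)

# Iterative pruning: raise the sparsity target a little at every epoch.
for epoch in range(5):
    sparsity = agp_target_sparsity(epoch, 0, 4, 0.0, 0.5)
    masked = [w * m for w, m in zip(weights, mask)]
    mask = magnitude_mask(masked, sparsity)
    print(epoch, round(sparsity, 4), mask.count(0))
```

The ramp prunes aggressively at first, while the network still has many redundant weights, and slows down as it approaches the final sparsity. Because already-masked weights have magnitude zero, they rank first at every step and stay pruned.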

Installation

These instructions will help get Distiller up and running on your local machine.

1. Clone Distiller

Clone the Distiller code repository from GitHub:

$ git clone https://github.com/IntelLabs/distiller.git

The rest of this documentation assumes that you have cloned the repository into a directory called distiller.

2. Create a Python virtual environment

We recommend using a Python virtual environment, but that, of course, is up to you. There's nothing special about using Distiller in a virtual environment, but we provide some instructions for completeness.
Before creating the virtual environment, make sure you are in the distiller directory. After creating the environment, you should see a directory called distiller/env.

Using virtualenv

If you don't have virtualenv installed, you can find the installation instructions here.

To create the environment, execute:

$ python3 -m virtualenv env

This creates a subdirectory named env where the Python virtual environment is stored. The current shell does not use it as the default Python environment until you activate it, as described in "Activate the environment".

Using venv

If you prefer to use venv, then begin by installing it:

$ sudo apt-get install python3-venv

Then create the environment:

$ python3 -m venv env

As with virtualenv, this creates a directory called distiller/env.

Activate the environment

The environment activation and deactivation commands for venv and virtualenv are the same.
NOTE: Make sure to activate the environment before proceeding with the installation of the dependency packages:

$ source env/bin/activate

3. Install the Distiller package

Finally, install the Distiller package and its dependencies using pip3:

$ cd distiller
$ pip3 install -e .

This installs Distiller in "development mode", meaning any changes made in the code are reflected in the environment without re-running the install command (so no need to re-install after pulling changes from the Git repository).

Notes:

  • Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.
  • If you are not using a GPU, you might need to make small adjustments to the code.

Required PyTorch Version

Distiller is tested using the default installation of PyTorch 1.3.1, which uses CUDA 10.1. We use TorchVision version 0.4.2. These are included in Distiller's requirements.txt and will be automatically installed when installing the Distiller package as listed above.

If you do not use CUDA 10.1 in your environment, please refer to PyTorch website to install the compatible build of PyTorch 1.3.1 and torchvision 0.4.2.
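
Since Distiller is tested against specific PyTorch and torchvision releases, it can help to confirm what is actually installed before running the samples. The following is a small sketch under stated assumptions: the helper names `release_of` and `find_mismatches` are hypothetical, and the installed versions are passed in as plain strings so that the snippet runs even without PyTorch present. Only the two pinned versions come from the text above.

```python
# Versions pinned by Distiller, as stated above.
REQUIRED = {"torch": "1.3.1", "torchvision": "0.4.2"}


def release_of(version):
    """Drop a local build suffix such as '+cu92', keeping the release part."""
    return version.split("+", 1)[0]


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (found, wanted)} for every package that does not match."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)  # None means "not installed"
        if found is None or release_of(found) != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


# In a real environment, collect the versions from the imported packages:
#   import torch, torchvision
#   installed = {"torch": torch.__version__,
#                "torchvision": torchvision.__version__}
# The values below are made up for demonstration.
installed = {"torch": "1.3.1+cu92", "torchvision": "0.5.0"}
print(find_mismatches(installed))
```

An empty dictionary means both packages match the tested versions. A build suffix such as `+cu92` is ignored on purpose, because a build for a different CUDA version is still the same release.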

Getting Started

Distiller comes with sample applications and tutorials covering a range of model types:

| Model Type | Sparsity | Post-training quantization | Quantization-aware training | Auto Compression (AMC) | Knowledge Distillation |
|---|---|---|---|---|---|
| Image classification | ✅ | ✅ | ✅ | ✅ | ✅ |
| Word-level language model | ✅ | ✅ | | | |
| Translation (GNMT) | | ✅ | | | |
| Recommendation System (NCF) | | ✅ | | | |
| Object Detection | ✅ | | | | |

Head to the examples directory for more details.

Other resources to refer to, beyond the examples:

Basic Usage Examples

The following are simple examples using Distiller's image classification sample, showing some of Distiller's capabilities.

Example: Simple training-only session (no compression)

The following invokes training-only (no compression) of a network named 'simplenet' on the CIFAR10 dataset. This is roughly based on TorchVision's sample ImageNet training application, so it should look familiar if you've used that application. In this example we don't invoke any compression mechanisms: we just train, because training is an essential part of fine-tuning after pruning.

Note that the first time you execute this command, the CIFAR10 dataset will be downloaded to your machine, which may take a bit of time. Please let the download process proceed to completion.

The path to the CIFAR10 dataset is arbitrary, but in our examples we place the datasets in the same directory level as distiller (i.e. ../../../data.cifar10).

First, change to the sample directory, then invoke the application:

$ cd distiller/examples/classifier_compression
$ python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01

You can use a TensorBoard backend to view the training progress (in the diagram below we show a couple of training sessions with different LR values). For compression sessions, we've added tracing of activation and parameter sparsity levels, and regularization loss.

Example: Getting parameter statistics of a sparsified model

We've included in the git repository a few checkpoints of a ResNet20 model that we've trained with 32-bit floats. Let's load the checkpoint of a model that we've trained with channel-wise Group Lasso regularization.
With the following command-line arguments, the sample application loads the model (--resume) and prints statistics about the model weights (--summary=sparsity). This is useful if you want to load a previously pruned model, to examine the weights sparsity statistics, for example. Note that when you resume a stored checkpoint, you still need to tell the application which network architecture the checkpoint uses (-a=resnet20_cifar):

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=sparsity

You should see a text table detailing the various sparsities of the parameter tensors. The first column is the parameter name, followed by its shape, the number of non-zero elements (NNZ) in the dense model, and in the sparse model. The next set of columns show the column-wise, row-wise, channel-wise, kernel-wise, filter-wise and element-wise sparsities.
Wrapping it up are the standard-deviation, mean, and mean of absolute values of the elements.
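
To make the sparsity columns more concrete, here is a minimal pure-Python sketch of how such fractions could be computed for one convolution weight. It is not Distiller's code. It assumes the weight is laid out as (filters, channels, height, width) and is stored as nested lists, and the names `is_zero`, `flatten` and `sparsity_summary` are hypothetical.

```python
def is_zero(block):
    """True if every number in an arbitrarily nested list is zero."""
    if isinstance(block, list):
        return all(is_zero(item) for item in block)
    return block == 0


def flatten(block):
    """Yield the individual numbers of an arbitrarily nested list."""
    if isinstance(block, list):
        for item in block:
            yield from flatten(item)
    else:
        yield block


def sparsity_summary(weight):
    """weight[f][c] is the 2D kernel connecting input channel c to filter f."""
    num_filters, num_channels = len(weight), len(weight[0])
    elements = list(flatten(weight))
    kernels = [weight[f][c] for f in range(num_filters) for c in range(num_channels)]
    # An input channel counts as empty only if its kernel is zero in every filter.
    channels = [[weight[f][c] for f in range(num_filters)] for c in range(num_channels)]

    def fraction_zero(groups):
        return sum(1 for g in groups if is_zero(g)) / len(groups)

    return {
        "nnz": sum(1 for e in elements if e != 0),
        "element": fraction_zero(elements),
        "kernel": fraction_zero(kernels),
        "channel": fraction_zero(channels),
        "filter": fraction_zero(weight),
    }


# Made-up weight: 2 filters, 2 input channels, 1x2 kernels.
weight = [
    [[[0.5, 0.0]], [[0.0, 0.0]]],  # filter 0: channel 0 partly dense, channel 1 pruned
    [[[0.0, 0.0]], [[0.0, 0.0]]],  # filter 1: entirely pruned
]
summary = sparsity_summary(weight)
print(summary)
```

In this toy weight, 7 of 8 elements are zero, yet only half of the filters and half of the channels are entirely zero. High element-wise sparsity therefore does not by itself imply that whole structures can be removed.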

In the Compression Insights notebook we use matplotlib to plot a bar chart of this summary, which indeed shows unimpressive footprint compression.

Although the memory footprint compression is very low, this model actually saves 26.6% of the MACs compute.

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute
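
A large compute saving can coexist with a small footprint saving when pruning removes structures from layers that have few parameters but large feature maps. The sketch below works through the arithmetic for two made-up convolution layers. The layer shapes are illustrative assumptions, not the actual layers of the ResNet20 checkpoint, and the helper names are hypothetical. For simplicity it ignores the knock-on effect that removing filters has on the input channels of the following layer.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Number of weights in a dense 2D convolution (bias ignored)."""
    k_h, k_w = kernel_size
    return in_channels * out_channels * k_h * k_w


def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Multiply-accumulates: every weight is used once per output position."""
    return conv2d_params(in_channels, out_channels, kernel_size) * out_height * out_width


def totals(layers):
    """Sum parameters and MACs over (in, out, kernel, out_h, out_w) tuples."""
    params = sum(conv2d_params(i, o, k) for i, o, k, _, _ in layers)
    macs = sum(conv2d_macs(i, o, k, h, w) for i, o, k, h, w in layers)
    return params, macs


# Illustrative layers only: (in_channels, out_channels, kernel, out_h, out_w).
early = (16, 16, (3, 3), 32, 32)         # few weights, large feature map
late = (64, 64, (3, 3), 8, 8)            # many weights, small feature map
early_thinned = (16, 8, (3, 3), 32, 32)  # half of the early layer's filters removed

dense_params, dense_macs = totals([early, late])
thin_params, thin_macs = totals([early_thinned, late])

param_saving = 1 - thin_params / dense_params
mac_saving = 1 - thin_macs / dense_macs
print("parameters saved: {:.1%}".format(param_saving))
print("MACs saved:       {:.1%}".format(mac_saving))
```

In this toy case, removing half of the early layer's filters saves about 3% of the parameters but 25% of the MACs, because each early-layer weight is reused at 1,024 output positions while each late-layer weight is reused at only 64.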

Example: Post-training quantization

This example performs 8-bit quantization of ResNet20 for CIFAR10. We've included in the git repository the checkpoint of a ResNet20 model that we've trained with 32-bit floats, so we'll take this model and quantize it:

$ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10 --resume ../ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize-eval --evaluate

The command-line above will save a checkpoint named quantized_checkpoint.pth.tar containing the quantized model parameters. See more examples here.
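
As background on what 8-bit quantization does to the numbers, the following is a minimal sketch of one common scheme: symmetric linear quantization with a single scale factor per tensor. It is an illustration in plain Python, not a description of the exact scheme that --quantize-eval applies. The function names and example weights are hypothetical.

```python
def symmetric_quantize(values, num_bits=8):
    """Map floats to signed integers with a single scale factor and no zero-point."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = q_max / max_abs if max_abs > 0 else 1.0
    # Round to the nearest integer, then clamp to the signed range.
    quantized = [max(-q_max - 1, min(q_max, round(v * scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer representation."""
    return [q / scale for q in quantized]


# Made-up full-precision weights.
weights = [0.4, -1.0, 0.25, 0.1]
q, scale = symmetric_quantize(weights)
restored = dequantize(q, scale)

print(q)
print([round(r, 4) for r in restored])
print(max(abs(r - w) for r, w in zip(restored, weights)))
```

The largest-magnitude weight maps to the edge of the integer range, and every other value is rounded to the nearest step of size 1/scale. The round-trip error of any in-range value is therefore at most half a step. Statistics-based (static) approaches differ mainly in how they choose the range that determines the scale.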

Explore the sample Jupyter notebooks

The set of notebooks that come with Distiller is described here, which also explains the steps to install the Jupyter notebook server.
After installing and running the server, take a look at the notebook covering pruning sensitivity analysis.

Sensitivity analysis is a long process and this notebook loads CSV files that are the output of several sessions of sensitivity analysis.

Running the tests

We are currently light on tests, and this is an area where contributions will be much appreciated.
There are two types of tests: system tests and unit-tests. To invoke the unit tests:

$ cd distiller/tests
$ pytest

We use CIFAR10 for the system tests, because its size makes for quicker tests. To invoke the system tests, you need to provide a path to the CIFAR10 dataset which you've already downloaded. Alternatively, you may invoke full_flow_tests.py without specifying the location of the CIFAR10 dataset and let the test download the dataset (for the first invocation only). Note that --cifar10-path defaults to the current directory.
The system tests are not short, and are even longer if the test needs to download the dataset.

$ cd distiller/tests
$ python full_flow_tests.py --cifar10-path=<some_path>

The script exits with status 0 if all tests are successful, or status 1 otherwise.

Generating the HTML documentation site

Install mkdocs and the required packages by executing:

$ pip3 install -r doc-requirements.txt

To build the project documentation run:

$ cd distiller/docs-src
$ mkdocs build --clean

This will create a folder named 'site' which contains the documentation website. Open distiller/docs/site/index.html to view the documentation home page.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is licensed under the Apache License 2.0. See the LICENSE.md file for details.

Community

GitHub projects using Distiller

  • DeGirum Pruned Models - a repository containing pruned models and related information.

  • TorchFI - TorchFI is a fault injection framework built on top of PyTorch for research purposes.

  • hsi-toolbox - Hyperspectral CNN compression and band selection

Research papers citing Distiller

If you used Distiller for your work, please use the following citation:

@article{nzmora2019distiller,
  author       = {Neta Zmora and
                  Guy Jacob and
                  Lev Zlotnik and
                  Bar Elharar and
                  Gal Novik},
  title        = {Neural Network Distiller: A Python Package For DNN Compression Research},
  month        = {October},
  year         = {2019},
  url          = {https://arxiv.org/abs/1910.12232}
}

Acknowledgments

Any published work is built on top of the work of many other people, and the credit belongs to too many people to list here.

  • The Python and PyTorch developer communities have shared many invaluable insights, examples and ideas on the Web.
  • The authors of the research papers implemented in the Distiller model-zoo have shared their research ideas, theoretical background and results.

Built With

  • PyTorch - The tensor and neural network framework used by Distiller.
  • Jupyter - Notebook serving.
  • TensorBoard - Used to view training graphs.
  • Cadene - Pretrained PyTorch models.

Disclaimer

Distiller is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. Additional algorithms and features are planned to be added to the library. Feedback and contributions from the open source and research communities are more than welcome.
