[ICLR 2020] Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks

Early-Bird-Tickets

ICLR2020: spotlight License: MIT

This is the PyTorch implementation of Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks

ICLR 2020 spotlight oral paper

Introduction

  • Lottery Ticket Hypothesis: (Frankle & Carbin, 2019) shows that there exist winning tickets (small but critical subnetworks) for dense, randomly initialized networks, that can be trained alone to achieve comparable accuracies to the latter in a similar number of iterations.

  • Limitation: However, the identification of these winning tickets still requires the costly train-prune-retrain process, limiting their practical benefits.

  • Our Contributions:

    • We discover for the first time that the winning tickets can be identified at the very early training stage, which we term as early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early.
    • Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low computational overhead, without needing to know the true winning tickets that emerge after the full training.
    • Finally, we leverage the existence of EB tickets and the proposed mask distance to develop efficient training methods, which are achieved by first identifying EB tickets via low-cost schemes, and then continuing to train merely the EB tickets towards the target accuracy.

Experiments based on various deep networks and datasets validate: 1) the existence of EB tickets, and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 4.7x energy savings while maintaining comparable or even better accuracy, demonstrating a promising and easily adopted method for tackling cost-prohibitive deep network training.

Early-Bird Tickets

Existence of Early-Bird Tickets

To articulate the Early-Bird (EB) ticket phenomenon, namely that winning tickets can be drawn very early in training, we perform ablation simulations using two representative deep models (VGG16 and PreResNet101) on two popular datasets (CIFAR10 and CIFAR100). Specifically, we follow the main idea of (Frankle & Carbin, 2019) but instead prune networks trained at earlier points to see if reliable tickets can be drawn. We adopt the same channel pruning as in (Liu et al., 2017) as the pruning technique for all experiments since it aligns with our end goal of efficient training. The figure below demonstrates the existence of EB tickets (p = 30% means 30% of the weights are pruned; a hollow star marks the retraining accuracy of the subnetwork drawn from the checkpoint with the best accuracy in the search stage).
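As a rough illustration of how a ticket can be drawn at pruning ratio p with channel pruning, the sketch below ranks channels by the magnitude of a per-channel scaling factor and masks out the smallest fraction globally. It assumes the scaling factors have already been read out of the network (for example from batch-normalization layers, as in Liu et al., 2017). The function name `draw_channel_mask` and the plain-list representation are hypothetical and are not taken from this repository.

```python
def draw_channel_mask(scales_per_layer, prune_ratio):
    """Return per-layer 0/1 channel masks (hypothetical helper, not the repo's API).

    scales_per_layer: one list of per-channel scaling factors per layer.
    prune_ratio: fraction of all channels to prune, e.g. 0.3 for p = 30%.
    """
    # Rank every channel in the network by |scale| (global, not per layer).
    magnitudes = sorted(abs(s) for layer in scales_per_layer for s in layer)
    num_pruned = int(len(magnitudes) * prune_ratio)
    if num_pruned == 0:
        return [[1] * len(layer) for layer in scales_per_layer]

    # Channels at or below this magnitude are pruned; ties are pruned too.
    threshold = magnitudes[num_pruned - 1]
    return [[1 if abs(s) > threshold else 0 for s in layer]
            for layer in scales_per_layer]


# Toy example: 10 channels spread over three layers, p = 30%.
toy_scales = [[0.9, 0.05, 0.4], [0.01, 0.7, 0.3, 0.2], [0.6, 0.02, 0.8]]
toy_masks = draw_channel_mask(toy_scales, prune_ratio=0.3)
```

Because the threshold is global, layers with many small scaling factors lose more channels than others.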

Identify Early-Bird Tickets

We visualize how the distance among the tickets drawn at each epoch evolves. The figure below plots the pairwise mask distance matrices (160 x 160) of the VGG16 and PreResNet101 experiments on CIFAR100 at different pruning ratios p, where the (i, j)-th element of a matrix denotes the mask distance between epochs i and j in the corresponding experiment. A lower value (close to 0) means the two masks are more similar and is colored warmer.

[Figure: pairwise mask distance matrices (overlap)]
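The pairwise matrix described above can be sketched in a few lines. This is a minimal illustration that assumes the mask distance is a normalized Hamming distance between two flattened 0/1 masks (the fraction of positions at which they disagree). The helper names `mask_distance` and `pairwise_mask_distances` are hypothetical and do not come from this repository.

```python
def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two equal-length 0/1 masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    differing = sum(1 for a, b in zip(mask_a, mask_b) if a != b)
    return differing / len(mask_a)


def pairwise_mask_distances(masks):
    """Symmetric matrix whose (i, j) entry is the distance between masks i and j."""
    n = len(masks)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = mask_distance(masks[i], masks[j])
    return matrix


# Toy example: masks drawn at three consecutive epochs.
epoch_masks = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 1]]
distances = pairwise_mask_distances(epoch_masks)
```

With one mask per epoch over 160 epochs, the same routine would yield a 160 x 160 matrix.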

Our observation that the ticket masks quickly become stable and hardly change in the early training stages supports drawing EB tickets. We therefore measure the mask distance between consecutive epochs and draw EB tickets when this distance is smaller than a threshold. In practice, to improve the reliability of EB tickets, we stop and draw EB tickets only when the last five recorded mask distances are all smaller than the given threshold.
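The stopping rule above (the last five recorded distances all below a threshold) maps naturally onto a fixed-length queue. The sketch below is one possible way to express it. The class name `EarlyBirdDetector` is hypothetical, and the default threshold of 0.1 is only a placeholder assumption, since the value is not stated here.

```python
from collections import deque


class EarlyBirdDetector:
    """Signals an EB ticket once the last `window` distances are all below `threshold`."""

    def __init__(self, threshold=0.1, window=5):
        self.threshold = threshold          # placeholder value, tune per experiment
        self.recent = deque(maxlen=window)  # oldest distance is dropped automatically

    def update(self, distance):
        """Record one consecutive-epoch mask distance; return True if EB is detected."""
        self.recent.append(distance)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and max(self.recent) < self.threshold


# Toy example: distances between consecutive epochs shrink as the mask stabilizes.
detector = EarlyBirdDetector(threshold=0.1, window=5)
toy_distances = [0.4, 0.2, 0.09, 0.08, 0.05, 0.04, 0.03, 0.02]
first_hit = next(i for i, d in enumerate(toy_distances) if detector.update(d))
```

Checking the maximum of the window is equivalent to requiring every entry to be below the threshold.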

Efficient Training via Early-Bird Tickets

The conventional routine has three steps, which can be iterated: 1) training a dense model, 2) pruning it, and 3) retraining the pruned model to restore performance. Instead, we leverage the existence of EB tickets to develop the EB Train scheme, which replaces steps 1 and 2 with a lower-cost step of detecting the EB tickets.

[Figure: EB Train scheme (eb-train)]
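To make the control flow of the EB Train scheme concrete, the following sketch wires the search stage and the retraining stage together. It is an assumption-laden outline rather than the repository's training script: the three callbacks (`train_one_epoch`, `draw_mask`, `prune_and_retrain`) are hypothetical stand-ins for real training, mask drawing, and retraining code, the distance is assumed to be a normalized Hamming distance, and the default threshold is a placeholder.

```python
from collections import deque


def eb_train(train_one_epoch, draw_mask, prune_and_retrain,
             max_epochs=160, threshold=0.1, window=5):
    """Search for an EB ticket, then hand it to the retraining stage.

    All three callbacks are hypothetical placeholders supplied by the caller.
    """
    recent = deque(maxlen=window)
    previous_mask = None
    for epoch in range(max_epochs):
        train_one_epoch(epoch)            # low-cost search training
        mask = draw_mask()                # flattened 0/1 mask at this epoch
        if previous_mask is not None:
            differing = sum(a != b for a, b in zip(mask, previous_mask))
            recent.append(differing / len(mask))
        previous_mask = mask
        # Stop the dense training as soon as the mask has stabilized.
        if len(recent) == window and max(recent) < threshold:
            return prune_and_retrain(mask, epoch)
    # No EB ticket detected: fall back to the final mask.
    return prune_and_retrain(previous_mask, max_epochs - 1)


# Toy run with stub callbacks: the mask stops changing after epoch 2.
toy_masks = [[1, 0, 1, 0], [1, 1, 1, 0]] + [[1, 1, 0, 0]] * 11
state = {"epoch": -1}
result = eb_train(
    train_one_epoch=lambda e: state.update(epoch=e),
    draw_mask=lambda: toy_masks[state["epoch"]],
    prune_and_retrain=lambda mask, epoch: (mask, epoch),
    max_epochs=len(toy_masks),
)
```

In the toy run the search stops well before `max_epochs`, which is the point of replacing full dense training with EB detection.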

Basic Usage

Prerequisites

The code has the following dependencies:

  • python 3.7
  • pytorch 1.1.0
  • torchvision 0.3.0
  • Pillow (PIL) 5.4.1
  • scipy 1.2.1
  • qtorch 0.1.1 (for low precision)
  • GCC >= 4.9 on linux (for low precision)

Core Training Options

  • dataset: which dataset to use, CIFAR10/100 by default
  • data: path to the raw data, required if you want to use ImageNet
  • batch-size: batch size; all experiments in the paper use 256 by default
  • epochs: total number of training epochs (160 in our experiments)
  • schedule: epochs at which the learning rate is decayed, [80, 120] by default
  • lr: initial learning rate, 0.1 by default
  • save: directory in which to save checkpoints
  • arch: which model to use; vgg and resnet are currently supported
  • depth: model depth
  • filter: filter to apply to the dataset, none by default
  • sparsify_gt: sparsify the dataset by the given percentage
  • gpu_ids: GPU ids to use; multiple GPUs are supported
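To show how the lr and schedule options above interact, here is a small sketch of a step schedule. The decay factor `gamma=0.1` is an assumption for illustration, since the option list gives only the milestones and the initial rate. The function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, base_lr=0.1, schedule=(80, 120), gamma=0.1):
    """Learning rate at `epoch` under a step schedule.

    `gamma` (the multiplicative decay applied at each milestone) is assumed.
    """
    milestones_passed = sum(1 for milestone in schedule if epoch >= milestone)
    return base_lr * (gamma ** milestones_passed)


# Rates at the start, just before and at each milestone, and at the last epoch.
sampled = {epoch: learning_rate(epoch) for epoch in (0, 79, 80, 119, 120, 159)}
```

Under these assumptions the rate stays at 0.1 for epochs 0 to 79, and is multiplied by `gamma` at epoch 80 and again at epoch 120.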

Standard Train for Identifying Early-Bird Tickets

Example: Reproduce early-bird (EB) tickets on CIFAR-100

  • Step 1: Run standard training to find EB tickets at different pruning ratios. Note that one can stop training as soon as the EB tickets emerge; we keep training here in order to compare the subnetworks drawn at different training stages.
bash ./scripts/standard-train/search.sh
  • Step 2: Perform the actual pruning on the saved checkpoints (checkpoints containing EB tickets are named in the EB-{pruning ratio}-{drawing epoch}.pth.tar format).
bash ./scripts/standard-train/prune.sh
  • Optional: Pairwise mask distance matrix visualization.
bash ./scripts/standard-train/mask_distance.sh

After the mask distance matrix has been calculated (it is saved automatically as overlap-0.5.npy), you can call plot_overlap.py to draw the figures.
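When many checkpoints accumulate in the save directory, it can help to recover the pruning ratio and drawing epoch from the EB-{pruning ratio}-{drawing epoch}.pth.tar names mentioned in Step 2. The sketch below is a hypothetical helper, not part of the repository. It assumes the ratio is written either as an integer or as a decimal, which may not match the exact formatting the scripts use.

```python
import re

# Matches names such as EB-30-35.pth.tar or EB-0.5-12.pth.tar (assumed formats).
EB_CHECKPOINT = re.compile(
    r"^EB-(?P<ratio>\d+(?:\.\d+)?)-(?P<epoch>\d+)\.pth\.tar$"
)


def parse_eb_checkpoint(filename):
    """Return (pruning_ratio, drawing_epoch), or None if the name does not match."""
    match = EB_CHECKPOINT.match(filename)
    if match is None:
        return None
    return float(match.group("ratio")), int(match.group("epoch"))


# Toy example: keep only EB checkpoints and sort them by drawing epoch.
names = ["EB-30-35.pth.tar", "model_best.pth.tar", "EB-50-21.pth.tar"]
parsed = [p for p in (parse_eb_checkpoint(n) for n in names) if p is not None]
tickets = sorted(parsed, key=lambda item: item[1])
```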

Retrain to Restore Accuracy

Example: Retrain drawn EB tickets (e.g., VGG16 for CIFAR-100) to restore accuracy

  • Fine-tune EB tickets from the emergence epoch. Note that we keep the sparsity regularization for the underlying iterative pruning.
bash ./scripts/standard-train/retrain_continue.sh
  • Retrain re-initialized EB tickets from scratch (refer to EB Train (re-init) in Sec. 4.3 of the paper).
bash ./scripts/standard-train/retrain_scratch.sh

Low Precision Search and Retrain

We apply the low-precision method SWALP to both the search and retrain stages (refer to EB Train LL in Sec. 4.3 of the paper). The steps below use VGG16 on CIFAR-10 as an example:

  • Step 1: Run standard training to find EB tickets at different pruning ratios.
bash ./scripts/low-precision/search.sh
  • Step 2: Perform the actual pruning on the saved checkpoints.
bash ./scripts/low-precision/prune.sh
  • Step 3: Fine-tune EB tickets from the emergence epoch.
bash ./scripts/low-precision/retrain_continue.sh
  • Comparison example: [Figure: eb-train]

ImageNet Experiments

All pretrained checkpoints for the different pruning ratios have been collected in Google Drive. To evaluate inference accuracy on the test set, we provide evaluation scripts (EVAL_ResNet18_ImageNet.py and EVAL_ResNet50_ImageNet.py) and the corresponding commands below for your convenience.

bash ./scripts/resnet18-imagenet/evaluation.sh
bash ./scripts/resnet50-imagenet/evaluation.sh

ResNet18 on ImageNet

  • Step 1: Run standard training to find EB tickets at different pruning ratios.
bash ./scripts/resnet18-imagenet/search.sh
  • Step 2: Perform the actual pruning on the saved checkpoints.
bash ./scripts/resnet18-imagenet/prune.sh
  • Step 3: Fine-tune EB tickets from the emergence epoch.
bash ./scripts/resnet18-imagenet/retrain_continue.sh
  • Comparison results: [Figure: eb-train]

ResNet50 on ImageNet

  • Step 1: Run standard training to find EB tickets at different pruning ratios.
bash ./scripts/resnet50-imagenet/search.sh
  • Step 2: Perform the actual pruning on the saved checkpoints.
bash ./scripts/resnet50-imagenet/prune.sh
  • Step 3: Fine-tune EB tickets from the emergence epoch.
bash ./scripts/resnet50-imagenet/retrain_continue.sh

Citation

If you find this code useful for your research, please cite:

@inproceedings{
you2020drawing,
title={Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks},
author={Haoran You and Chaojian Li and Pengfei Xu and Yonggan Fu and Yue Wang and Xiaohan Chen and Yingyan Lin and Zhangyang Wang and Richard G. Baraniuk},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=BJxsrgStvr}
}

Acknowledgement
