
CAT

CVPR | arXiv | website | Tutorial (Image-to-Image) (Our method can run on mobile devices!)

[Demo images: Input → Night / Style / Anime translations]

PyTorch implementation of our method for compressing image-to-image models.
Teachers Do More Than Teach: Compressing Image-to-Image Models
Qing Jin¹, Jian Ren², Oliver J. Woodford, Jiazhuo Wang², Geng Yuan¹, Yanzhi Wang¹, Sergey Tulyakov²
¹Northeastern University, ²Snap Inc.
In CVPR 2021.

Overview

Compression And Teaching (CAT) framework for compressing image-to-image models: ① Given a pre-trained teacher generator Gt, we determine the architecture of a compressed student generator Gs by eliminating the channels with the smallest batch-norm scaling-factor magnitudes. ② We then distill knowledge from the pre-trained teacher Gt to the student Gs via a novel distillation technique that maximizes the similarity between the two generators' features, defined in terms of kernel alignment (KA).
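
A minimal PyTorch sketch of both steps, assuming a plain linear-kernel alignment; the helper names below are illustrative, not the repo's actual API:

import torch

# Step 1 (sketch): rank the channels of a BatchNorm layer by the magnitude
# of their scaling factors (gamma) and keep the top-k.
def select_channels(bn: torch.nn.BatchNorm2d, keep: int) -> torch.Tensor:
    scores = bn.weight.detach().abs()        # |gamma| per channel
    return torch.topk(scores, keep).indices  # indices of channels to keep

# Step 2 (sketch): kernel alignment between teacher and student features.
# Features are flattened to (batch, -1); the Gram matrices are compared via
# a normalized Frobenius inner product, and the distillation loss is 1 - KA.
def kernel_alignment(ft: torch.Tensor, fs: torch.Tensor) -> torch.Tensor:
    ft, fs = ft.flatten(1), fs.flatten(1)
    kt, ks = ft @ ft.t(), fs @ fs.t()        # teacher/student Gram matrices
    return (kt * ks).sum() / (kt.norm() * ks.norm())

def distill_loss(ft, fs):
    return 1.0 - kernel_alignment(ft, fs)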

Prerequisites

  • Linux
  • Python 3
  • CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

  • Clone this repo:

    git clone git@github.com:snap-research/CAT.git
    cd CAT
  • Install PyTorch 1.7 and other dependencies (e.g., torchvision).

    • For pip users, please type the command pip install -r requirements.txt.
    • For Conda users, please create a new Conda environment using conda env create -f environment.yml.

Data Preparation

CycleGAN

Setup

  • Download the CycleGAN dataset (e.g., horse2zebra).

    bash datasets/download_cyclegan_dataset.sh horse2zebra
  • Get the statistical information of the ground-truth images for your dataset to compute FID. We provide pre-prepared real statistics for several datasets in a Google Drive folder; see the example command below.
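
    For example, assuming the helper script takes the dataset name and direction, as in the Cityscapes example later in this README:

    bash datasets/download_real_stat.sh horse2zebra A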

Pix2pix

Setup

  • Download the pix2pix dataset (e.g., cityscapes).

    bash datasets/download_pix2pix_dataset.sh cityscapes

Cityscapes Dataset

For the Cityscapes dataset, we cannot provide it due to license issues. Please download the dataset from https://cityscapes-dataset.com and use the script prepare_cityscapes_dataset.py to preprocess it. You need to download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip and unzip them in the same folder. For example, you may put gtFine and leftImg8bit in database/cityscapes-origin. Then prepare the dataset with the following commands:

python datasets/get_trainIds.py database/cityscapes-origin/gtFine/
python datasets/prepare_cityscapes_dataset.py \
--gtFine_dir database/cityscapes-origin/gtFine \
--leftImg8bit_dir database/cityscapes-origin/leftImg8bit \
--output_dir database/cityscapes \
--table_path datasets/table.txt

You will get a preprocessed dataset in database/cityscapes and a mapping table (used to compute mIoU) in datasets/table.txt.

  • Get the statistical information of the ground-truth images for your dataset to compute FID. We provide pre-prepared real statistics for several datasets. For example,

    bash datasets/download_real_stat.sh cityscapes A

Evaluation Preparation

mIoU Computation

To support mIoU computation, you need to download the pre-trained DRN model drn-d-105_ms_cityscapes.pth from http://go.yf.io/drn-cityscapes-models. By default, we put the DRN model in the root directory of the repo. You can then test our compressed models on Cityscapes after downloading them; see the example below.
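
For example, assuming the checkpoint is directly downloadable at that address (the exact URL layout is an assumption; otherwise fetch it from the page in a browser):

wget -O drn-d-105_ms_cityscapes.pth http://go.yf.io/drn-cityscapes-models/drn-d-105_ms_cityscapes.pth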

FID/KID Computation

To compute the FID/KID score, you need to get some statistical information from the ground-truth images of your dataset. We provide a script get_real_stat.py to extract this information. For example, for the map2arial dataset, you could run the following command:

python get_real_stat.py \
--dataroot database/map2arial \
--output_path real_stat/maps_B.npz \
--direction AtoB

For paired image-to-image translation (pix2pix and GauGAN), we calculate the FID between generated test images and real test images. For unpaired image-to-image translation (CycleGAN), we calculate the FID between generated test images and real training+test images. This allows us to use more images for a stable FID evaluation, as done in previous unconditional GAN research. The difference between the two protocols is small: the FID of our compressed CycleGAN model increases by 4 when using real test images instead of real training+test images.
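
Conceptually, FID compares Inception statistics (mean mu, covariance sigma) of the two image sets via the Fréchet distance. A minimal NumPy/SciPy sketch, assuming the .npz files store keys mu and sigma as in pytorch-fid (fake_stat.npz is a hypothetical file with generated-image statistics):

import numpy as np
from scipy import linalg

# Frechet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2):
# ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2))
def frechet_distance(mu1, sigma1, mu2, sigma2):
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

real = np.load("real_stat/maps_B.npz")
fake = np.load("fake_stat.npz")
fid = frechet_distance(real["mu"], real["sigma"], fake["mu"], fake["sigma"])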

KID is not supported for the cityscapes dataset.

Model Training

Teacher Training

The first step of our framework is to train a teacher model. For this purpose, please run the script train_inception_teacher.sh under the corresponding folder named after the dataset. For example, run

bash scripts/cycle_gan/horse2zebra/train_inception_teacher.sh

Student Training

With the pretrained teacher model, we can determine the architecture of the student model under a prescribed computational budget. For this purpose, please run the script train_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX stands for the computational budget (in terms of FLOPs in this case) and can differ across datasets and models. For example, for CycleGAN on the Horse2Zebra dataset, our computational budget is 2.6B FLOPs, so we run

bash scripts/cycle_gan/horse2zebra/train_inception_student_2p6B.sh

Pre-trained Models

For convenience, we also provide pretrained teacher and student models in a Google Drive folder.

Model Evaluation

With pretrained teacher and student models, we can evaluate them on the dataset. For this purpose, please run the script evaluate_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN on the Horse2Zebra dataset, where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/evaluate_inception_student_2p6B.sh

Model Export

The final step is to export the trained compressed model as an ONNX file so it can run on mobile devices. For this purpose, please run the script onnx_export_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN on the Horse2Zebra dataset, where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/onnx_export_inception_student_2p6B.sh

This will create a .onnx file in addition to the log files.
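
To sanity-check the exported file, you can run it with onnxruntime. A minimal sketch; the file name and the 1x3x256x256 input shape are assumptions, so check the export logs for the actual values:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("compressed_student.onnx")  # hypothetical file name
input_name = sess.get_inputs()[0].name
x = np.random.randn(1, 3, 256, 256).astype(np.float32)  # dummy input image
(y,) = sess.run(None, {input_name: x})                  # run the generator
print(y.shape)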

Citation

If you use this code for your research, please cite our paper:

@inproceedings{jin2021teachers,
  title={Teachers Do More Than Teach: Compressing Image-to-Image Models},
  author={Jin, Qing and Ren, Jian and Woodford, Oliver J and Wang, Jiazhuo and Yuan, Geng and Wang, Yanzhi and Tulyakov, Sergey},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13600--13611},
  year={2021}
}

Acknowledgements

Our code is developed based on AtomNAS and gan-compression.

We also thank pytorch-fid for FID computation and drn for mIoU computation.
