Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion
This repository is the official PyTorch implementation of Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion presented at CVPR 2020.
The code will help to invert images from models of torchvision (pretrained on ImageNet), and run the images over another model to check generalization. We plan to update repo with CIFAR10 examples and teacher-student training.
Useful links:
- Camera Ready PDF
- ArXiv Full
- Dataset - Synthesized ImageNet: from ResNet50v1.5, ~2GB, organized by classes, ~140k images. Were used in Section 4.4 (Data-free Knowledge Transfer), best viewed in gThumb.
License
Copyright (C) 2020 NVIDIA Corporation. All rights reserved.
This work is made available under the Nvidia Source Code License (1-Way Commercial). To view a copy of this license, visit https://github.com/NVlabs/DeepInversion/blob/master/LICENSE
Updates
- 2020 July 7. Added CIFAR10 inversion result for ResNet34 in the folder cifar10. Code on knowledge distillation will follow soon.
- 2020 June 16. Added a new scaling factor
first_bn_multiplier
for first BN layer. This improves fidelity.
Requirements
Code was tested in virtual environment with Python 3.6. Install requirements:
pip install torch==1.4.0
pip install torchvision==0.5.0
pip install numpy
pip install Pillow
Additionally install APEX library for FP16 support (2x less memory, 2x faster): Installing NVIDIA APEX
Provided code was originally designed to invert ResNet50v1.5 model trained for 90 epochs that achieves 77.26% top-1 on ImageNet. We are not able to share the model, but anyone can train it here: ResNet50v1.5. Code works well for the default ResNet50 from torchvision package.
Code was tested on NVIDIA V100 GPU and Titan X Pascal.
Running the code
This snippet will generate 84 images by inverting resnet50 model from torchvision package.
python imagenet_inversion.py --bs=84 --do_flip --exp_name="rn50_inversion" --r_feature=0.01 --arch_name="resnet50" --verifier --adi_scale=0.0 --setting_id=0 --lr 0.25
Arguments:
bs
- batch size, should be close to original batch size during training, but not necessary.lr
- learning rate for the optimizer of input tensor for model inversion.do_flip
- will do random flipping between iterationsexp_name
- name of the experiment, will create folder with this name in./generations/
where intermediate generations will be stored after 100 iterationsr_feature
- coefficient for feature distribution regularization, might need adjustment for other networksarch_name
- name of the network architecture, should be one of pretrained models from torch vision package:resnet50
,resnet18
,mobilenet_v2
etc.fp16
- enables FP16 training if needed, will use FP16 training via APEX AMP (O2 level)verifier
- enables checking accuracy of generated images with another network (defmobilenet_v2
) network after each 100 iterations. Useful to observe generalizability of generated images.setting_id
- settings for optimization: 0 - multi resolution scheme, 1 - 2k iterations full resolution, 2 - 20k iterations (the closes to ResNet50 experiments in the paper). Recommended to use setting_id={0, 1}adi_scale
- competition coefficient. With positive value will lead to images that are good for the original model, but bad for verifier. Value 0.2 was used in the paper.random_label
- randomly select classes for inversion. Without this argument the code will generate hand picked classes.
After 3k iterations (~6 mins on NVIDIA V100) generation is done: Verifier accuracy: 91.6...%
(experiment with >98% verifier accuracy can be found /example_logs
). We generated images by inverting vanilla ResNet50 (not trained for image generation) and classification accuracy by MobileNetv2 is >90%. A grid of images look like (from /final_images/
, reduced quality due to JPEG compression. )
Optimization is sensitive to hyper-parameters. Try local tunings for your setups/applications. Try to change the r_feature coefficient, l2 regularization, betas of Adam optimizer (beta=0 work well). Keep looking at loss_r_feature
as it indicates how close feature statistics are to the training distribution.
Reduce batch size if out of memory or without FP16 optimization. In the paper, we used batch size of 152, and larger batch size is preferred. This code will generate images from 41 hand picked classes. To randomize the target classes, simply use argument --random_label
.
Examples of running code with different arguments and resulting images can be found at /example_logs/
.
Check if you can invert other architectures, or even apply to other applications (keypoints, detection etc.). Method has a room for improvement: (a) improving the loss for feature regularization (we used MSE in paper but that may not be ideal for distribution matching), (b) making it even faster, (c) generating images for which multiple models are confident, (d) increasing diversity.
Share your most exciting images at Twitter with hashtag #Deepinversion and #DeepInvert.
Citation
@inproceedings{yin2020dreaming,
title = {Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion},
author = {Yin, Hongxu and Molchanov, Pavlo and Alvarez, Jose M. and Li, Zhizhong and Mallya, Arun and Hoiem, Derek and Jha, Niraj K and Kautz, Jan},
booktitle = {The IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR)},
month = June,
year = {2020}
}