Official PyTorch implementation of "Wavelet Diffusion Models are fast and scalable Image Generators" (CVPR'23)
VinAI Research
[Paper] • [Poster] • [Slides] • [Video]
WaveDiff is a novel wavelet-based diffusion scheme that employs low-and-high frequency components of wavelet subbands from both image and feature levels. These are adaptively implemented to accelerate the sampling process while maintaining good generation quality. Experimental results on CelebA-HQ, CIFAR-10, LSUN-Church, and STL-10 datasets show that WaveDiff provides state-of-the-art training and inference speed, which serves as a stepping-stone to offering real-time and high-fidelity diffusion models.
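The wavelet decomposition at the heart of this idea can be illustrated with a plain-NumPy sketch (the repository itself uses `pytorch_wavelets`, not this code): a one-level Haar transform splits an image into a half-resolution low-frequency subband (LL) plus three high-frequency subbands (LH, HL, HH), and the split is exactly invertible, so a diffusion model can operate on the smaller subbands without losing information.

```python
import numpy as np

def haar_dwt2(x):
    # One-level 2-D Haar transform of an even-sized image:
    # average/difference over row pairs, then over column pairs.
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages (low-pass)
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row differences (high-pass)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low: half-resolution image
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # Exact inverse: rebuild the column pairs, then the row pairs.
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x
```

Since `ll` has half the spatial resolution of the input, processing it is roughly 4x cheaper per step, which is where the sampling speed-up comes from.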
Details of the model architecture and experimental results can be found in our paper:
@InProceedings{phung2023wavediff,
author = {Phung, Hao and Dao, Quan and Tran, Anh},
title = {Wavelet Diffusion Models Are Fast and Scalable Image Generators},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {10199-10208}
}
Please CITE our paper whenever this repository is used to help produce published results or incorporated into other software.
Installation
Python `3.7.13` and PyTorch `1.10.0` are used in this implementation.
It is recommended to create a conda environment from our provided `environment.yml`:
conda env create -f environment.yml
conda activate wavediff
Or you can install the necessary libraries as follows:
pip install -r requirements.txt
For `pytorch_wavelets`, please follow the installation instructions here.
Dataset preparation
We trained on four datasets, including CIFAR10, STL10, LSUN Church Outdoor 256 and CelebA HQ (256 & 512).
CIFAR10 and STL10 will be downloaded automatically on first execution.
For CelebA HQ (256) and LSUN, please check out here for dataset preparation.
For CelebA HQ (512 & 1024), please download the two zip files data512x512.zip and data1024x1024.zip, then generate LMDB-format datasets with Torch Toolbox.
As those two high-resolution download links appear to be broken, we provide our processed LMDB files here.
Once a dataset is downloaded, please put it in the `data/` directory as follows:
data/
├── STL-10
├── celeba
├── celeba_512
├── celeba_1024
├── cifar-10
└── lsun
How to run
We provide a bash script for our experiments on different datasets. The syntax is as follows:
bash run.sh <DATASET> <MODE> <#GPUS>
where:
- `<DATASET>`: `cifar10`, `stl10`, `celeba_256`, `celeba_512`, `celeba_1024`, and `lsun`.
- `<MODE>`: `train` and `test`.
- `<#GPUS>`: the number of GPUs (e.g. 1, 2, 4, 8).
Note: please set the `--exp` argument correspondingly for both `train` and `test` modes. All detailed configurations are set in `run.sh`.
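As a rough sketch of the argument handling such a launcher performs (a hypothetical illustration, not the actual `run.sh`), the script checks the three positional arguments before dispatching:

```shell
#!/bin/bash
# Hypothetical sketch of run.sh-style argument validation -- not the actual script.
check_args() {
  local dataset=$1 mode=$2 gpus=$3
  case "$dataset" in
    cifar10|stl10|celeba_256|celeba_512|celeba_1024|lsun) ;;
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac
  case "$mode" in
    train|test) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "launching $mode on $dataset with $gpus GPU(s)"
}
```

For example, `check_args cifar10 train 1` would pass validation, while a typo in the dataset name fails fast with an error.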
GPU allocation: our experiments were run on 40GB NVIDIA A100 GPUs. For `train` mode, we use a single GPU for CIFAR10 and STL10, 2 GPUs for CelebA-HQ 256, 4 GPUs for LSUN, and 8 GPUs for CelebA-HQ 512 & 1024. For `test` mode, a single GPU suffices for all experiments.
Results
Model performance and pretrained checkpoints are provided as below:
| Model | FID | Recall | Time (s) | Checkpoints |
|---|---|---|---|---|
| CIFAR-10 | 4.01 | 0.55 | 0.08 | netG_1300.pth |
| STL-10 | 12.93 | 0.41 | 0.38 | netG_600.pth |
| CelebA-HQ (256 x 256) | 5.94 | 0.37 | 0.79 | netG_475.pth |
| CelebA-HQ (512 x 512) | 6.40 | 0.35 | 0.59 | netG_350.pth |
| LSUN Church | 5.06 | 0.40 | 1.54 | netG_400.pth |
| CelebA-HQ (1024 x 1024) | 5.98 | 0.39 | 0.59 | netG_350.pth |
Inference time is averaged over 300 trials on a single NVIDIA A100 GPU with a batch size of 100, except for high-resolution CelebA-HQ (512 & 1024), where a batch of 25 samples is used.
Downloaded pre-trained models should be placed in the `saved_info/wdd_gan/<DATASET>/<EXP>` directory, where `<DATASET>` is defined as in the How to run section and `<EXP>` corresponds to the folder name of the pre-trained checkpoint.
Evaluation
Inference
Samples can be generated by calling `run.sh` in `test` mode.
FID
To compute the FID of a pretrained model at a specific epoch, add the arguments `--compute_fid` and `--real_img_dir /path/to/real/images` to the corresponding experiment in `run.sh`.
Recall
We adopt the official PyTorch implementation of StyleGAN2-ADA to compute the Recall of generated samples.
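That code implements the improved precision/recall metric of Kynkäänniemi et al.: a manifold is estimated from k-NN hyperspheres around one feature set, and recall is the fraction of real samples falling inside the generated-sample manifold. A small NumPy sketch of the definition (not the official implementation, which works on Inception features of large sample sets):

```python
import numpy as np

def knn_radii(feats, k=3):
    # Distance from each point to its k-th nearest neighbour (excluding itself).
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.sort(d, axis=1)[:, k - 1]

def manifold_coverage(probe, ref, k=3):
    # Fraction of `probe` points inside any k-NN hypersphere of `ref`.
    radii = knn_radii(ref, k)
    d = np.linalg.norm(probe[:, None, :] - ref[None, :, :], axis=-1)
    return float((d <= radii[None, :]).any(axis=1).mean())

# recall    = manifold_coverage(real_features, fake_features)
# precision = manifold_coverage(fake_features, real_features)
```

Intuitively, recall near 1 means the generator covers (almost) all modes of the real data; recall near 0 means mode dropping.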
Acknowledgments
Thanks to Xiao et al. for releasing their official implementation of the DDGAN paper. For wavelet transformations, we utilize implementations from WaveCNet and pytorch_wavelets.
Contacts
If you have any problems, please open an issue in this repository or send an email to [email protected].