• Stars
    star
    372
  • Rank 114,858 (Top 3 %)
  • Language
    Python
  • License
    GNU Affero Genera...
  • Created almost 2 years ago
  • Updated 4 months ago

Repository Details

Official PyTorch implementation of the paper: Wavelet Diffusion Models Are Fast and Scalable Image Generators (CVPR'23)
Table of contents
  1. Installation
  2. Dataset preparation
  3. How to run
  4. Results
  5. Evaluation
  6. Acknowledgments
  7. Contacts

Hao Phung · Quan Dao · Anh Tran

VinAI Research

[Paper] [Poster] [Slides] [Video]

WaveDiff is a novel wavelet-based diffusion scheme that employs low- and high-frequency components of wavelet subbands at both the image and feature levels. These are adaptively implemented to accelerate the sampling process while maintaining good generation quality. Experimental results on the CelebA-HQ, CIFAR-10, LSUN-Church, and STL-10 datasets show that WaveDiff provides state-of-the-art training and inference speed, serving as a stepping stone toward real-time, high-fidelity diffusion models.

Details of the model architecture and experimental results can be found in our paper:

@InProceedings{phung2023wavediff,
    author    = {Phung, Hao and Dao, Quan and Tran, Anh},
    title     = {Wavelet Diffusion Models Are Fast and Scalable Image Generators},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {10199-10208}
}

Please CITE our paper whenever this repository is used to help produce published results or is incorporated into other software.

Installation

Python 3.7.13 and PyTorch 1.10.0 are used in this implementation.

We recommend creating a conda environment from the provided environment.yml:

conda env create -f environment.yml
conda activate wavediff

Alternatively, you can install the necessary libraries as follows:

pip install -r requirements.txt

For pytorch_wavelets, please follow here.
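After installation, a quick check can confirm that the environment roughly matches the versions stated above. The following is a minimal sketch, not part of the repository: the helper names `module_version` and `check_environment` are hypothetical, and it assumes the packages are importable as `torch` and `pytorch_wavelets`.

```python
import importlib
import importlib.util
import sys

# Versions stated in this README.
README_PYTHON = "3.7.13"
README_TORCH = "1.10.0"


def module_version(name):
    """Return a module's __version__ string, or None if it is not installed."""
    if importlib.util.find_spec(name) is None:
        return None
    module = importlib.import_module(name)
    return str(getattr(module, "__version__", "unknown"))


def check_environment():
    """Collect human-readable notes about differences from the README setup."""
    notes = []
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    if python_version != README_PYTHON:
        notes.append(f"Python {python_version} found, README used {README_PYTHON}")
    torch_version = module_version("torch")
    if torch_version is None:
        notes.append("torch is not installed")
    elif not torch_version.startswith(README_TORCH):
        notes.append(f"torch {torch_version} found, README used {README_TORCH}")
    if importlib.util.find_spec("pytorch_wavelets") is None:
        notes.append("pytorch_wavelets is not importable")
    return notes


if __name__ == "__main__":
    for note in check_environment():
        print("note:", note)
```

A version mismatch is only a hint, since other versions may also work; an empty list means the environment matches the versions listed here.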

Dataset preparation

We trained on four datasets: CIFAR10, STL10, LSUN Church Outdoor 256, and CelebA HQ (256 & 512).

CIFAR10 and STL10 are downloaded automatically on first execution.

For CelebA HQ (256) and LSUN, please check out here for dataset preparation.

For CelebA HQ (512 & 1024), please download two zip files, data512x512.zip and data1024x1024.zip, and then generate the LMDB-format dataset with Torch Toolbox.

The two links to the high-resolution data seem to be broken, so we provide our processed LMDB files here.

Once a dataset is downloaded, please put it in the data/ directory as follows:

data/
├── STL-10
├── celeba
├── celeba_512
├── celeba_1024
├── cifar-10
└── lsun
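The layout above can be verified before launching a run. This is a small sketch using only the standard library; the function name `missing_datasets` is hypothetical, and the folder names are copied from the tree above.

```python
from pathlib import Path

# Subdirectory names copied from the data/ layout above.
EXPECTED_SUBDIRS = ["STL-10", "celeba", "celeba_512", "celeba_1024", "cifar-10", "lsun"]


def missing_datasets(data_root="data"):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    for name in missing_datasets():
        print("missing:", name)
```

Only the datasets you plan to use need to be present, so a non-empty result is not necessarily an error.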

How to run

We provide a bash script for our experiments on different datasets. The syntax is as follows:

bash run.sh <DATASET> <MODE> <#GPUS>

where:

  • <DATASET>: cifar10, stl10, celeba_256, celeba_512, celeba_1024, or lsun.
  • <MODE>: train or test.
  • <#GPUS>: the number of GPUs (e.g. 1, 2, 4, 8).

Note: please set the --exp argument consistently for both train and test modes. All detailed configurations are already set in run.sh.

GPU allocation: Our experiments were run on NVIDIA A100 GPUs (40GB). For train mode, we use a single GPU for CIFAR10 and STL10, 2 GPUs for CelebA-HQ 256, 4 GPUs for LSUN, and 8 GPUs for CelebA-HQ 512 & 1024. For test mode, only a single GPU is required for all experiments.
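The script syntax and the GPU allocation described above can be combined into a small launcher helper. This is an illustrative sketch only: `build_command` is a hypothetical name, and the default GPU counts are simply the ones reported above, which you may need to adjust for your own hardware.

```python
DATASETS = ("cifar10", "stl10", "celeba_256", "celeba_512", "celeba_1024", "lsun")
MODES = ("train", "test")

# GPU counts reported above for train mode; test mode uses a single GPU.
TRAIN_GPUS = {
    "cifar10": 1, "stl10": 1, "celeba_256": 2,
    "lsun": 4, "celeba_512": 8, "celeba_1024": 8,
}


def build_command(dataset, mode, num_gpus=None):
    """Return the argument list for `bash run.sh <DATASET> <MODE> <#GPUS>`."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if num_gpus is None:
        # Fall back to the allocation reported in this README.
        num_gpus = TRAIN_GPUS[dataset] if mode == "train" else 1
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["bash", "run.sh", dataset, mode, str(num_gpus)]


if __name__ == "__main__":
    print(" ".join(build_command("celeba_256", "train")))
```

The returned list can be passed directly to `subprocess.run` from the repository root.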

Results

Model performance and pretrained checkpoints are provided below:

Model                   | FID   | Recall | Time (s) | Checkpoints
CIFAR-10                | 4.01  | 0.55   | 0.08     | netG_1300.pth
STL-10                  | 12.93 | 0.41   | 0.38     | netG_600.pth
CelebA-HQ (256 x 256)   | 5.94  | 0.37   | 0.79     | netG_475.pth
CelebA-HQ (512 x 512)   | 6.40  | 0.35   | 0.59     | netG_350.pth
LSUN Church             | 5.06  | 0.40   | 1.54     | netG_400.pth
CelebA-HQ (1024 x 1024) | 5.98  | 0.39   | 0.59     | netG_350.pth

Inference time is measured over 300 trials on a single NVIDIA A100 GPU with a batch size of 100, except for high-resolution CelebA-HQ (512 & 1024), where it is measured with a batch of 25 samples.
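A per-call timing over repeated trials can be sketched as below. This is not the timing code used for the table: it assumes the reported figure is a mean over trials, and the `warmup` parameter and the name `mean_runtime` are our own additions.

```python
import time


def mean_runtime(fn, trials=300, warmup=10):
    """Average wall-clock seconds per call of fn over a number of trials."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(trials):
        fn()
    # On a GPU, fn should itself wait for the device to finish (for example
    # by calling torch.cuda.synchronize()); otherwise mostly the kernel
    # launch time is measured rather than the actual computation.
    return (time.perf_counter() - start) / trials
```

Here `fn` would wrap one sampling call for a full batch, so the result is the time per batch rather than per image.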

Downloaded pre-trained models should be put in the saved_info/wdd_gan/<DATASET>/<EXP> directory, where <DATASET> is defined in the How to run section and <EXP> corresponds to the folder name of the pre-trained checkpoints.
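The expected checkpoint location can be assembled programmatically. In this sketch the helper name `checkpoint_path` and the experiment name `my_exp` are hypothetical; we also assume the checkpoint file sits directly inside the <EXP> folder and follows the `netG_<epoch>.pth` naming seen in the results table.

```python
from pathlib import Path


def checkpoint_path(dataset, exp, epoch, root="saved_info/wdd_gan"):
    """Build <root>/<DATASET>/<EXP>/netG_<epoch>.pth as a Path object."""
    return Path(root) / dataset / exp / f"netG_{epoch}.pth"


if __name__ == "__main__":
    # "my_exp" is a placeholder for the folder name of the downloaded checkpoint.
    path = checkpoint_path("cifar10", "my_exp", 1300)
    print(path, "exists" if path.is_file() else "not found")
```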

Evaluation

Inference

Samples can be generated by calling run.sh with test mode.

FID

To compute the FID of a pretrained model at a specific epoch, add the arguments --compute_fid and --real_img_dir /path/to/real/images to the corresponding experiment in run.sh.
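For reference, the standard definition of FID (general background, not specific to this repository) fits a Gaussian to the Inception features of the real images, with mean \mu_r and covariance \Sigma_r, and another to the generated images, with \mu_g and \Sigma_g, and then measures the distance between the two:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one.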

Recall

We adopt the official PyTorch implementation of StyleGAN2-ADA to compute the Recall of generated samples.
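To give an intuition for the metric, here is a plain-Python sketch of the k-nearest-neighbour recall definition commonly used for generative models, which we assume is what the adopted implementation computes. It is not the StyleGAN2-ADA code: the function names are hypothetical, `k=3` is an assumed default, and the inputs are taken to be feature vectors already extracted by a pretrained network.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_recall(real_feats, fake_feats, k=3):
    """Fraction of real features that fall inside the generated manifold.

    Each generated feature defines a ball whose radius is the distance to
    its k-th nearest generated neighbour; a real feature counts as covered
    if it lies inside at least one ball. Requires len(fake_feats) > k.
    """
    radii = []
    for i, g in enumerate(fake_feats):
        dists = sorted(euclidean(g, h) for j, h in enumerate(fake_feats) if j != i)
        radii.append(dists[k - 1])
    covered = sum(
        1 for r in real_feats
        if any(euclidean(r, g) <= rad for g, rad in zip(fake_feats, radii))
    )
    return covered / len(real_feats)
```

This brute-force version is quadratic in the number of samples and is meant only to illustrate the definition on small inputs.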

Acknowledgments

Thanks to Xiao et al. for releasing their official implementation of the DDGAN paper. For wavelet transformations, we use the implementations from WaveCNet and pytorch_wavelets.

Contacts

If you have any problems, please open an issue in this repository or send an email to [email protected].
