PVDM
Official PyTorch implementation of "Video Probabilistic Diffusion Models in Projected Latent Space" (CVPR 2023).
Sihyun Yu¹, Kihyuk Sohn², Subin Kim¹, Jinwoo Shin¹
¹KAIST, ²Google Research
paper | project page
1. Environment setup
conda create -n pvdm python=3.8 -y
conda activate pvdm
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install natsort tqdm gdown omegaconf einops lpips pyspng tensorboard imageio av moviepy
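You can quickly check that the CUDA-enabled PyTorch build was installed correctly (a minimal sanity check, not part of the original scripts):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"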
2. Dataset
Dataset download
Currently, we provide experiments for the following two datasets: UCF-101 and SkyTimelapse. Each dataset should be placed in /data with the structures shown below; you may change the data location directory by adjusting the variable data_location in tools/dataloader.py.
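For reference, data_location is expected to be a plain string path; a minimal sketch of the relevant line in tools/dataloader.py (the surrounding code may differ) is:
data_location = '/data'  # root directory containing UCF-101 and/or SkyTimelapse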
UCF-101
UCF-101
|-- class1
|-- video1.avi
|-- video2.avi
|-- ...
|-- class2
|-- video1.avi
|-- video2.avi
|-- ...
|-- ...
SkyTimelapse
SkyTimelapse
|-- train
|-- video1
|-- frame00000.png
|-- frame00001.png
|-- ...
|-- video2
|-- frame00000.png
|-- frame00001.png
|-- ...
|-- ...
|-- val
|-- video1
|-- frame00000.png
|-- frame00001.png
|-- ...
|-- ...
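If you want to sanity-check the dataset layout before training, a small hypothetical helper such as the following can be used; it relies only on the directory structures shown above (data_location is the same root as in tools/dataloader.py):
import os

data_location = '/data'  # change this if your datasets live elsewhere

# UCF-101: class folders containing .avi files
ucf_root = os.path.join(data_location, 'UCF-101')
if os.path.isdir(ucf_root):
    classes = [d for d in os.listdir(ucf_root) if os.path.isdir(os.path.join(ucf_root, d))]
    print('UCF-101: %d class folders found' % len(classes))

# SkyTimelapse: train/val splits containing per-video frame folders
sky_root = os.path.join(data_location, 'SkyTimelapse')
for split in ('train', 'val'):
    split_dir = os.path.join(sky_root, split)
    if os.path.isdir(split_dir):
        videos = [d for d in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, d))]
        print('SkyTimelapse/%s: %d video folders found' % (split, len(videos)))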
3. Training
Autoencoder
First, execute the following script:
python main.py \
--exp first_stage \
--id [EXP_NAME] \
--pretrain_config configs/autoencoder/base.yaml \
--data [DATASET_NAME] \
--batch_size [BATCH_SIZE]
The script will automatically create a folder under ./results to save logs and checkpoints.
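Since tensorboard is among the installed dependencies and logs are saved under ./results, training can presumably be monitored with:
tensorboard --logdir ./results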
Once the loss converges, execute the following script:
python main.py \
--exp first_stage \
--id [EXP_NAME]_gan \
--pretrain_config configs/autoencoder/base_gan.yaml \
--data [DATASET] \
--batch_size [BATCH_SIZE] \
--first_stage_folder [DIRECTORY OF PREVIOUS EXP]
Here, [EXP_NAME] is an experiment name you want to specify (string), [DATASET] is either UCF101 or SKY, and [DIRECTORY OF PREVIOUS EXP] is the results directory created by the previous script. For instance, the full sequence of commands for training the model on UCF-101 is:
python main.py \
--exp first_stage \
--id main \
--pretrain_config configs/autoencoder/base.yaml \
--data UCF101 \
--batch_size 8
python main.py \
--exp first_stage \
--id main_gan \
--pretrain_config configs/autoencoder/base_gan.yaml \
--data UCF101 \
--batch_size 8 \
--first_stage_folder 'results/first_stage_main_UCF101_42/'
You may change the model configs by modifying the files in configs/autoencoder. Moreover, early stopping is needed when further training the model with the GAN loss (typically 8k-14k iterations with a batch size of 8).
Diffusion model
python main.py \
--exp ddpm \
--id [EXP_NAME] \
--pretrain_config configs/autoencoder/base.yaml \
--data [DATASET] \
--first_model [AUTOENCODER DIRECTORY] \
--diffusion_config configs/latent-diffusion/base.yaml \
--batch_size [BATCH_SIZE]
Here, [EXP_NAME] is an experiment name you want to specify (string), [DATASET] is either UCF101 or SKY, and [AUTOENCODER DIRECTORY] is the path to the autoencoder checkpoint to be used. For instance, the full command for training the model on UCF-101 is:
python main.py \
--exp ddpm \
--id main \
--pretrain_config configs/autoencoder/base.yaml \
--data UCF101 \
--first_model 'results/first_stage_main_gan_UCF101_42/model_last.pth' \
--diffusion_config configs/latent-diffusion/base.yaml \
--batch_size 48
4. Evaluation
We will provide checkpoints with the evaluation scripts as soon as possible, once the refactoring is done.
Citation
@inproceedings{yu2023video,
title={Video Probabilistic Diffusion Models in Projected Latent Space},
author={Yu, Sihyun and Sohn, Kihyuk and Kim, Subin and Shin, Jinwoo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}
Note
This code may not exactly reproduce the results reported in the paper due to potential human error during the preparation and cleaning of the code for release. If you have trouble reproducing our results, please let us know. We also plan to run sanity-check experiments in the near future.
Reference
This code is mainly built upon the SiMT, latent-diffusion, and stylegan2-ada-pytorch repositories.
We also used code from the following repositories: StyleGAN-V, VideoGPT, and MDGAN.