MoCoGAN-HD
Project | OpenReview | arXiv | Talk | Slides
(AFHQ, VoxCeleb)
PyTorch implementation of our method for high-resolution (e.g., 1024x1024) and cross-domain video synthesis.
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian1, Jian Ren2, Menglei Chai2, Kyle Olszewski2, Xi Peng3, Dimitris N. Metaxas1, Sergey Tulyakov2
1Rutgers University, 2Snap Inc., 3University of Delaware
In ICLR 2021, Spotlight.
Pre-trained Image Generator & Video Datasets
In-domain Video Synthesis
UCF-101: image generator, video data, motion generator
FaceForensics: image generator, video data, motion generator
Sky-Timelapse: image generator, video data, motion generator
Cross-domain Video Synthesis
(FFHQ, VoxCeleb): FFHQ image generator, VoxCeleb, motion generator
(AFHQ, VoxCeleb): AFHQ image generator, VoxCeleb, motion generator
(Anime, VoxCeleb): Anime image generator, VoxCeleb, motion generator
(FFHQ-1024, VoxCeleb): FFHQ-1024 image generator, VoxCeleb, motion generator
(LSUN-Church, TLVDB): LSUN-Church image generator, TLVDB
The pre-computed PCA statistics are saved here.
Training
Organise the video dataset as follows:
Video dataset
|-- video1
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video2
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video3
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- ...
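Before training, it can help to confirm that a dataset folder follows this layout. The snippet below is a minimal sketch, not part of the repository: the function names are hypothetical, and it only assumes the `videoN/img_XXXX.png` naming shown above. It builds a throwaway dataset of empty placeholder files and indexes the frames of each video in order.

```python
import re
import tempfile
from pathlib import Path

# Frame files are assumed to be named img_<index>.png, as in the layout above.
FRAME_PATTERN = re.compile(r"img_(\d+)\.png$")


def index_video_dataset(root):
    """Map each video folder name to its frame paths, sorted by frame index."""
    index = {}
    for video_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        frames = []
        for path in video_dir.iterdir():
            match = FRAME_PATTERN.match(path.name)
            if match:
                frames.append((int(match.group(1)), path))
        if not frames:
            raise ValueError(f"no img_XXXX.png frames found in {video_dir}")
        frames.sort(key=lambda item: item[0])
        index[video_dir.name] = [path for _, path in frames]
    return index


def build_demo_dataset(root, n_videos=2, n_frames=3):
    """Create empty placeholder files that mimic the expected layout."""
    for v in range(1, n_videos + 1):
        video_dir = Path(root) / f"video{v}"
        video_dir.mkdir(parents=True)
        for f in range(n_frames):
            (video_dir / f"img_{f:04d}.png").touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_dataset(tmp)
        for name, frames in index_video_dataset(tmp).items():
            print(name, [p.name for p in frames])
```

Pointing `index_video_dataset` at a real dataset root raises an error for any video folder that contains no matching frames.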
In-domain Video Synthesis
UCF-101
Collect the PCA components from a pre-trained image generator.
python get_stats_pca.py --batchSize 4000 \
--save_pca_path pca_stats/ucf_101 \
--pca_iterations 250 \
--latent_dimension 512 \
--img_g_weights /path/to/ucf_101_image_generator \
--style_gan_size 256 \
--gpu 0
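For readers unfamiliar with this step, the idea of extracting PCA directions from sampled latent codes can be illustrated on a toy scale. The sketch below is not what get_stats_pca.py runs: `sample_latents` is a hypothetical stand-in for sampling codes from a generator, and the PCA is a plain power iteration with deflation in pure Python, which would be far too slow for real 512-dimensional latents.

```python
import math
import random


def sample_latents(n, scales, seed=0):
    """Hypothetical stand-in for sampling latent codes from an image generator."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, s) for s in scales] for _ in range(n)]


def top_principal_components(samples, k, iterations=100, seed=0):
    """Return (mean, components) for the k leading PCA directions."""
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    centered = [[s[j] - mean[j] for j in range(dim)] for s in samples]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]

    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iterations):
            # Power iteration step: multiply by the covariance matrix.
            v = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            # Deflation: remove the directions that were already found.
            for u in components:
                dot = sum(x * y for x, y in zip(v, u))
                v = [x - dot * y for x, y in zip(v, u)]
            norm = math.sqrt(sum(x * x for x in v))
            v = [x / norm for x in v]
        components.append(v)
    return mean, components


if __name__ == "__main__":
    # Toy 3-dimensional "latent space" with most variance along the first axis.
    latents = sample_latents(500, scales=[5.0, 1.0, 0.2], seed=1)
    mean, components = top_principal_components(latents, k=2)
    for direction in components:
        print([round(x, 3) for x in direction])
```

In practice a library routine (for example an SVD-based PCA) would replace the hand-written loop; the toy version is only meant to show what "collecting PCA components" produces: a mean vector and a set of orthonormal directions.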
Train the model
python -W ignore train.py --name ucf_101 \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--dataroot /path/to/ucf_101 \
--checkpoints_dir checkpoints/ucf_101 \
--img_g_weights /path/to/ucf_101_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 100
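When launching several runs, the same command can be assembled programmatically. The helper below is a minimal sketch and not part of the repository: `build_train_command` is a hypothetical name, and it only reuses the flags and default values shown in the UCF-101 training command above. It builds the argument list without executing it.

```python
import shlex


def build_train_command(name, dataroot, img_g_weights, pca_path, **overrides):
    """Assemble the train.py argument list from the flags shown above."""
    options = {
        "time_step": 2,
        "lr": 0.0001,
        "save_pca_path": pca_path,
        "latent_dimension": 512,
        "dataroot": dataroot,
        "checkpoints_dir": f"checkpoints/{name}",
        "img_g_weights": img_g_weights,
        "multiprocessing_distributed": True,  # boolean flag, takes no value
        "world_size": 1,
        "rank": 0,
        "batchSize": 16,
        "workers": 8,
        "style_gan_size": 256,
        "total_epoch": 100,
    }
    options.update(overrides)

    cmd = ["python", "-W", "ignore", "train.py", "--name", name]
    for flag, value in options.items():
        if value is True:
            cmd.append(f"--{flag}")
        elif value is False or value is None:
            continue  # omit disabled boolean flags
        else:
            cmd.extend([f"--{flag}", str(value)])
    return cmd


if __name__ == "__main__":
    cmd = build_train_command(
        "ucf_101",
        dataroot="/path/to/ucf_101",
        img_g_weights="/path/to/ucf_101_image_generator",
        pca_path="pca_stats/ucf_101",
    )
    # Print a shell-safe version; pass `cmd` to subprocess.run to launch it.
    print(shlex.join(cmd))
```

Keyword overrides such as `total_epoch=25` or `cross_domain=True` change or add flags without editing the defaults.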
Inference
python -W ignore evaluate.py \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ucf_101_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch the_epoch_for_testing \
--results results/ucf_101 \
--num_test_videos 10
Replace the_epoch_for_testing with the epoch of the checkpoint to evaluate (an integer >= 0).
FaceForensics
Collect the PCA components from a pre-trained image generator.
sh script/faceforensics/run_get_stats_pca.sh
Train the model
sh script/faceforensics/run_train.sh
Inference
sh script/faceforensics/run_evaluate.sh
Sky-Timelapse
Collect the PCA components from a pre-trained image generator.
sh script/sky_timelapse/run_get_stats_pca.sh
Train the model
sh script/sky_timelapse/run_train.sh
Inference
sh script/sky_timelapse/run_evaluate.sh
Cross-domain Video Synthesis
(FFHQ, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
python get_stats_pca.py --batchSize 4000 \
--save_pca_path pca_stats/ffhq_256 \
--pca_iterations 250 \
--latent_dimension 512 \
--img_g_weights /path/to/ffhq_image_generator \
--style_gan_size 256 \
--gpu 0
Train the model
python -W ignore train.py --name ffhq_256-voxel \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--dataroot /path/to/voxel_dataset \
--checkpoints_dir checkpoints \
--img_g_weights /path/to/ffhq_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 25 \
--cross_domain
Inference
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch the_epoch_for_testing \
--results results/ffhq_256 \
--num_test_videos 10
Replace the_epoch_for_testing with the epoch of the checkpoint to evaluate (an integer >= 0).
(FFHQ-1024, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
sh script/ffhq-vox/run_get_stats_pca_1024.sh
Train the model
sh script/ffhq-vox/run_train_1024.sh
Inference
sh script/ffhq-vox/run_evaluate_1024.sh
(AFHQ, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
sh script/afhq-vox/run_get_stats_pca.sh
Train the model
sh script/afhq-vox/run_train.sh
Inference
sh script/afhq-vox/run_evaluate.sh
(Anime, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
sh script/anime-vox/run_get_stats_pca.sh
Train the model
sh script/anime-vox/run_train.sh
Inference
sh script/anime-vox/run_evaluate.sh
(LSUN-Church, TLVDB)
Collect the PCA components from a pre-trained image generator.
sh script/lsun_church-tlvdb/run_get_stats_pca.sh
Train the model
sh script/lsun_church-tlvdb/run_train.sh
Inference
sh script/lsun_church-tlvdb/run_evaluate.sh
Fine-tuning
To resume interrupted training or fine-tune a pre-trained model, run the following (using UCF-101 as an example):
python -W ignore train.py --name ucf_101 \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--dataroot /path/to/ucf_101 \
--checkpoints_dir checkpoints \
--img_g_weights /path/to/ucf_101_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 100 \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0
Training Control With Options
--w_residual
Controls the step size of the motion residual. The default value is 0.2; we recommend <= 0.5.
--n_pca
Number of PCA basis vectors used in the motion residual calculation. The default value is 384 (out of the 512 dimensions of the StyleGAN2 w space); we recommend >= 256.
--q_len
Size of the queue that stores the logits used in the contrastive loss. The default value is 4,096.
--video_frame_size
Spatial size of the video frames used for training. All synthesized video clips are down-sampled to this size before being fed to the video discriminator. The default value is 128; a larger size may lead to better motion modeling.
--cross_domain
Activates cross-domain video synthesis. The default value is False.
--w_match
Weight for the feature matching loss. The default value is 1.0; a larger value improves content matching.
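How --w_residual and --n_pca might interact can be illustrated with a toy update. The sketch below assumes that the motion residual is a weighted sum of the first n_pca PCA directions, scaled by w_residual and added to the previous latent code. The function and variable names are hypothetical, and the repository's actual computation may differ.

```python
def apply_motion_residual(prev_code, coefficients, pca_basis,
                          w_residual=0.2, n_pca=384):
    """Move a latent code along the first n_pca PCA directions.

    prev_code:    latent code of the previous frame (list of floats)
    coefficients: one weight per PCA direction in use (assumed to come
                  from the motion generator)
    pca_basis:    list of PCA directions, each as long as prev_code
    """
    basis = pca_basis[:n_pca]
    if len(coefficients) != len(basis):
        raise ValueError("need exactly one coefficient per PCA direction in use")
    dim = len(prev_code)
    # Residual = weighted sum of the selected PCA directions.
    residual = [
        sum(c * direction[j] for c, direction in zip(coefficients, basis))
        for j in range(dim)
    ]
    # w_residual scales how far the code moves in a single step.
    return [p + w_residual * r for p, r in zip(prev_code, residual)]


if __name__ == "__main__":
    # Toy 3-dimensional latent space with the standard basis as "PCA" directions.
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    code = [0.0, 0.0, 0.0]
    new_code = apply_motion_residual(code, [1.0, 2.0], basis,
                                     w_residual=0.5, n_pca=2)
    print(new_code)
```

Under this assumption, a smaller w_residual keeps consecutive frames closer together, and a smaller n_pca restricts motion to fewer directions of the latent space.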
Long Sequence Generation
LSTM Unrolling
During inference, you can generate a long sequence by LSTM unrolling with --n_frames_G:
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0 \
--n_frames_G 32
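The idea behind unrolling is that a recurrent model carries its state forward, so it can be stepped for more frames at inference than a single training clip contains. The sketch below illustrates only that idea: `toy_recurrent_step` is a hypothetical stand-in, not an LSTM and not the repository's motion generator.

```python
import math
import random


def toy_recurrent_step(hidden, noise):
    """Hypothetical stand-in for one recurrent step: mix state with fresh noise."""
    return [math.tanh(0.9 * h + 0.1 * n) for h, n in zip(hidden, noise)]


def unroll(initial_hidden, n_frames, seed=0):
    """Step the recurrent cell n_frames times and collect one state per frame."""
    rng = random.Random(seed)
    hidden = list(initial_hidden)
    states = []
    for _ in range(n_frames):
        noise = [rng.gauss(0.0, 1.0) for _ in hidden]
        hidden = toy_recurrent_step(hidden, noise)
        states.append(hidden)
    return states


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0, 0.0]
    short = unroll(start, n_frames=16)
    long = unroll(start, n_frames=32)
    # The longer sequence simply continues from where the shorter one stops.
    print(len(short), len(long), long[:16] == short)
```

Nothing in the loop depends on the sequence length, which is why the number of generated frames can be raised at inference time.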
Interpolation
During inference, you can generate a long sequence by interpolation with --interpolation:
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0 \
--interpolation
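One common way to lengthen a sequence by interpolation is to insert intermediate latent codes between consecutive frames. The sketch below assumes plain linear interpolation between per-frame codes. What --interpolation does internally may differ, and `interpolate_codes` is a hypothetical helper.

```python
def interpolate_codes(codes, steps_between):
    """Insert `steps_between` linearly interpolated codes between each pair."""
    if not codes:
        raise ValueError("need at least one latent code")
    if steps_between < 0:
        raise ValueError("steps_between must be >= 0")
    out = []
    for start, end in zip(codes, codes[1:]):
        for s in range(steps_between + 1):
            t = s / (steps_between + 1)  # t = 0 reproduces the start code
            out.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    out.append(list(codes[-1]))  # keep the final code as the last frame
    return out


if __name__ == "__main__":
    key_codes = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
    dense = interpolate_codes(key_codes, steps_between=3)
    print(len(key_codes), "->", len(dense), "codes")
```

With k input codes and n inserted steps per pair, the output holds (k - 1) * (n + 1) + 1 codes, each of which would be decoded to one frame.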
Examples of Generated Videos
UCF-101
FaceForensics
Sky Timelapse
(FFHQ, VoxCeleb)
(FFHQ-1024, VoxCeleb)
(Anime, VoxCeleb)
(LSUN-Church, TLVDB)
Citation
If you use the code for your work, please cite our paper.
@inproceedings{
tian2021a,
title={A Good Image Generator Is What You Need for High-Resolution Video Synthesis},
author={Yu Tian and Jian Ren and Menglei Chai and Kyle Olszewski and Xi Peng and Dimitris N. Metaxas and Sergey Tulyakov},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=6puCSjH3hwA}
}
Acknowledgments
This code borrows from the StyleGAN2 image generator, the BigGAN discriminator, and the PatchGAN discriminator.