PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
Thomas Lucas*, Fabien Baradel*, Philippe Weinzaepfel, Grégory Rogez
European Conference on Computer Vision (ECCV), 2022
PyTorch training and evaluation code for PoseGPT on BABEL.
Our code runs with Python 3.7 and requires the following packages:
- pytorch-1.7.1+cu110
- pytorch3d-0.3.0
- torchvision
- opencv
- PIL
- numpy
- smplx
- einops
- roma
We do not provide support for installation.
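That said, a minimal (untested) setup could look as follows; the exact versions and package sources are assumptions based on the list above:

# Untested sketch; adjust the CUDA/Python versions to your system.
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.3.0"
pip install opencv-python Pillow numpy smplx einops roma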
You should have the AMASS files and the BABEL annotations in two separate directories, following this structure:
<babel_dir>
|--- train.json
|--- val.json
|--- test.json
<amass_dir>
|--- smplx
|    |--- ACCAD   # then follows the standard AMASS data structure
|    |--- ...
|    |--- SSM
|--- smplh
|    |--- ACCAD
|    |--- ...
|    |--- SSM
Then you can preprocess the data by running the following commands, which will create files under the root directory <mocap_dir>:
babel_dir='[link_to_babel_dir]'
amass_dir='[link_to_amass_dir]'
mocap_dir='[link_to_preprocessed_data_dir]'
list_split=( 'train' 'test' 'val' )
list_type=( 'smplx' 'smplh' )
for split in "${list_split[@]}"; do
    for type in "${list_type[@]}"; do
        echo ${type}
        echo ${split}
        python dataset/preprocessing/babel.py "prepare_annots_trimmed(type='${type}',split='${split}',mocap_dir='${mocap_dir}',babel_dir='${babel_dir}',amass_dir='${amass_dir}')"
    done
done
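To preprocess a single (type, split) combination, you can also call the script directly, e.g. for the smplh validation split:

python dataset/preprocessing/babel.py "prepare_annots_trimmed(type='smplh',split='val',mocap_dir='${mocap_dir}',babel_dir='${babel_dir}',amass_dir='${amass_dir}')"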
Once preprocessing is done, you should have the following data structure:
<mocap_dir>
|--- <type>   # smplh or smplx
|    |--- babel_trimmed
|    |    |--- <split>_60   # for train, val and test
|    |    |    |--- seqLen64_fps30_overlap0_minSeqLen16
|    |    |    |    |--- pose.pkl
|    |    |    |    |--- action.pt
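As a quick sanity check you can load the two files; this is a sketch only, since the exact contents of pose.pkl and action.pt are defined by babel.py:

import pickle
import torch

# Set this to <mocap_dir>/<type>/babel_trimmed/<split>_60/seqLen64_fps30_overlap0_minSeqLen16
root = 'preprocessed_data/smplh/babel_trimmed/train_60/seqLen64_fps30_overlap0_minSeqLen16'
with open(root + '/pose.pkl', 'rb') as f:
    poses = pickle.load(f)                 # pose sequences (format defined by babel.py)
actions = torch.load(root + '/action.pt')  # action annotations (format defined by babel.py)
print(type(poses), type(actions))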
Finally, create symlinks named './babel', './amass' and './preprocessed_data' at the root of the repository (alternatively, you can modify the default path arguments in babel.py).
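For example, reusing the shell variables from the preprocessing step:

ln -s "${babel_dir}" ./babel
ln -s "${amass_dir}" ./amass
ln -s "${mocap_dir}" ./preprocessed_data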
For computing the FID, we first need to train a classifier on BABEL:
python3 classify.py --name classifier_ -iter 1000 --classif_method TR -lr 4e-5 --use_bm 0
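For reference, the FID compares Gaussian statistics of classifier features extracted from real and generated motions. A minimal sketch of that computation, assuming features have already been extracted with the classifier trained above (not the repository code):

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    # Fit a Gaussian to each set of features...
    mu1, sigma1 = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu2, sigma2 = feats_gen.mean(axis=0), np.cov(feats_gen, rowvar=False)
    # ...and compute the Frechet distance between the two Gaussians.
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard spurious imaginary parts from numerics
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)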
Several variants of the auto-encoder exist; you can train the one you want with the following commands (the quantization step they all share is sketched after the list).
- Train the auto-encoder in a debug setting (e.g., to debug with a different VQ-VAE architecture):
python3 auto_encode.py --name auto_encoder_debug --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64 --debug 1 --dummy_data 1
- Train an offline (i.e., all timesteps generated simultaneously), transformer-based VQ-VAE:
python3 auto_encode.py --name auto_encoder --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64
- Train a transformer-based VQ-VAE, with causality in the encoder but not in the decoder (can condition on past observations):
python3 auto_encode.py --name auto_encoder --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64
- Train a transformer-based VQ-VAE, with causality in both the encoder and the decoder (can predict the future given the past on the fly):
python3 auto_encode.py --name auto_encoder --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64
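All variants share the same vector-quantization step: each encoder output is snapped to its nearest codebook entry, and a straight-through estimator lets gradients reach the encoder. A minimal PyTorch sketch (not the repository code; beta loosely plays the role of --alpha_codebook):

import torch
import torch.nn.functional as F

def quantize(z_e, codebook, beta=0.25):
    # z_e: (batch, time, e_dim) encoder outputs; codebook: (n_e, e_dim).
    dists = torch.cdist(z_e, codebook.unsqueeze(0).expand(z_e.size(0), -1, -1))
    idx = dists.argmin(dim=-1)                    # nearest codebook index per timestep
    z_q = codebook[idx]                           # quantized latents
    # Pull codebook entries towards encoder outputs, and commit the encoder to its codes.
    loss = F.mse_loss(z_q, z_e.detach()) + beta * F.mse_loss(z_e, z_q.detach())
    z_q = z_e + (z_q - z_e).detach()              # straight-through estimator
    return z_q, idx, loss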
Once the auto-encoder is trained, you can train the generator (an illustrative sampling loop is sketched after the commands below).
- Train a generator (using a previously trained auto-encoder):
python3 train_gpt.py --name generator --n_codebook 2 --n_e 512 --e_dim 256 --vq_model CausalVQVAE --hid_dim 384 --dropout 0 --vq_ckpt ./logs/auto_encoder/checkpoints/best_val.pt --model poseGPT --n_visu_to_save 2 --class_conditional 1 --gpt_blocksize 512 --gpt_nlayer 8 --gpt_nhead 4 --gpt_embd_pdrop 0.2 --gpt_resid_pdrop 0.2 --gpt_attn_pdrop 0.2 --seq_len 64 --gen_eos 0 --eval_fid 0 --eos_force 1 --seqlen_conditional 1 --embed_every_step 1 --concat_emb 1
- Train a generator in debug mode (using a previously trained auto-encoder):
python3 train_gpt.py --name generator_debug --n_codebook 2 --n_e 512 --e_dim 256 --vq_model CausalVQVAE --hid_dim 384 --dropout 0 --vq_ckpt ./logs/auto_encoder_debug/checkpoints/best_val.pt --model poseGPT --n_visu_to_save 2 --class_conditional 1 --gpt_blocksize 512 --gpt_nlayer 8 --gpt_nhead 4 --gpt_embd_pdrop 0.2 --gpt_resid_pdrop 0.2 --gpt_attn_pdrop 0.2 --seq_len 64 --gen_eos 0 --eval_fid 0 --eos_force 1 --seqlen_conditional 1 --embed_every_step 1 --concat_emb 1 --dummy_data 1 --debug 1
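At inference time, the generator samples codebook indices autoregressively; the trained VQ-VAE decoder then maps the sampled indices back to poses. A minimal sampling loop, assuming a gpt module that returns next-index logits of shape (batch, length, vocab); the function name and interface are illustrative, not the repository API:

import torch

@torch.no_grad()
def sample_indices(gpt, cond, n_steps, temperature=1.0):
    # cond: (batch, t0) conditioning tokens (e.g. action and sequence length).
    idx = cond
    for _ in range(n_steps):
        logits = gpt(idx)[:, -1, :] / temperature  # logits for the next index
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, nxt], dim=1)
    return idx[:, cond.size(1):]  # keep only the generated indices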
If you do not want to train the models yourself, you will soon be able to download our pretrained checkpoints here:
wget <todo>
If you find our work useful, please cite our paper:
@inproceedings{posegpt,
title={PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting},
author={Lucas*, Thomas and Baradel*, Fabien and Weinzaepfel, Philippe and Rogez, Gr\'egory},
booktitle={European Conference on Computer Vision ({ECCV})},
year={2022}
}
PoseGPT is distributed under the CC BY-NC-SA 4.0 License. See LICENSE for more information.