SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation
News
- [2023-10-23] Support visualization through SMPL-X mesh overlay and add an inference Docker image.
- [2023-10-02] arXiv preprint is online!
- [2023-09-28] Homepage and Video are online!
- [2023-07-19] Pretrained models are released.
- [2023-06-15] Training and testing code is released.
Install
conda create -n smplerx python=3.8 -y
conda activate smplerx
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/index.html
pip install -r requirements.txt
# install mmpose
cd main/transformer_utils
pip install -v -e .
cd ../..
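After installing, it can help to confirm that the pinned versions actually ended up in the environment. The sketch below is a hypothetical helper, not part of the repo. It assumes the packages expose standard distribution metadata under the names torch, torchvision, torchaudio and mmcv-full, and it ignores local build tags such as +cu113 when comparing.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {
    "torch": "1.12.0",
    "torchvision": "0.13.0",
    "torchaudio": "0.12.0",
    "mmcv-full": "1.7.1",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, get_version=installed_version):
    """Return {name: (expected, found)} for every package that does not match its pin."""
    mismatches = {}
    for name, expected in pinned.items():
        found = get_version(name)
        # Strip local build tags such as "+cu113" before comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    print("all pins satisfied" if not problems else f"{len(problems)} mismatch(es)")
```

The version lookup is injectable so the comparison logic can be checked without a full CUDA environment.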
Docker Support (Early Stage)
docker pull wcwcw/smplerx_inference:v0.2
docker run --gpus all -v <vid_input_folder>:/smplerx_inference/vid_input \
-v <vid_output_folder>:/smplerx_inference/vid_output \
wcwcw/smplerx_inference:v0.2 --vid <video_name>.mp4
# Currently, any customization needs to be applied to /smplerx_inference/smplerx/inference_docker.py
- We recently released a Docker image for inference on Docker Hub.
- This Docker image uses SMPLer-X-H32 as the inference baseline and was tested on an RTX 3090 under WSL2 (Ubuntu 20.04).
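If you prefer to launch the container from Python, the `docker run` command above can be assembled as an argument list. `docker_run_args` is a hypothetical helper; it assumes a single input video on the host, mounts that video's parent folder as vid_input, and converts both host paths to absolute form, which is the safe choice for `-v` bind mounts.

```python
import os
import shlex

# Image tag from the docker pull command above.
IMAGE = "wcwcw/smplerx_inference:v0.2"


def docker_run_args(video, output_dir, image=IMAGE):
    """Build the `docker run` argument list for a single video file.

    The folder containing `video` is mounted as vid_input and only the
    file name is passed to --vid, mirroring the command above.
    """
    input_dir, video_name = os.path.split(os.path.abspath(video))
    return [
        "docker", "run", "--gpus", "all",
        "-v", f"{input_dir}:/smplerx_inference/vid_input",
        "-v", f"{os.path.abspath(output_dir)}:/smplerx_inference/vid_output",
        image,
        "--vid", video_name,
    ]


if __name__ == "__main__":
    args = docker_run_args("videos/test_video.mp4", "results")
    print(shlex.join(args))  # inspect the command before running it
    # subprocess.run(args, check=True) would launch the container.
```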
Pretrained Models
Model | Backbone | #Datasets | #Instances | #Params | MPE | Download | FPS |
---|---|---|---|---|---|---|---|
SMPLer-X-S32 | ViT-S | 32 | 4.5M | 32M | 82.6 | model | 36.17 |
SMPLer-X-B32 | ViT-B | 32 | 4.5M | 103M | 74.3 | model | 33.09 |
SMPLer-X-L32 | ViT-L | 32 | 4.5M | 327M | 66.2 | model | 24.44 |
SMPLer-X-H32 | ViT-H | 32 | 4.5M | 662M | 63.0 | model | 17.47 |
- MPE (Mean Primary Error): the average of the primary errors on five benchmarks (AGORA, EgoBody, UBody, 3DPW, and EHF)
- FPS (Frames Per Second): the average inference speed on a single Tesla V100 GPU, batch size = 1
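The two column definitions above reduce to simple arithmetic. The sketch below encodes MPE as a plain mean over the five benchmarks and shows one way to time batch-size-1 inference. The per-benchmark errors are not listed in this README, so `mean_primary_error` only captures the definition, and `measure_fps` is an illustrative timing loop (a hypothetical helper), not the script used to produce the table.

```python
import time
from statistics import mean

# The five benchmarks averaged into MPE, per the definition above.
BENCHMARKS = ("AGORA", "EgoBody", "UBody", "3DPW", "EHF")

# FPS column of the table above (single Tesla V100, batch size 1).
TABLE_FPS = {
    "SMPLer-X-S32": 36.17,
    "SMPLer-X-B32": 33.09,
    "SMPLer-X-L32": 24.44,
    "SMPLer-X-H32": 17.47,
}


def mean_primary_error(primary_errors):
    """Average the primary errors of the five benchmarks (dict keyed by benchmark name)."""
    missing = [b for b in BENCHMARKS if b not in primary_errors]
    if missing:
        raise KeyError(f"missing benchmarks: {missing}")
    return mean(primary_errors[b] for b in BENCHMARKS)


def ms_per_frame(fps):
    """Convert throughput (frames per second) into latency (milliseconds per frame)."""
    return 1000.0 / fps


def measure_fps(infer, frames, warmup=5):
    """Run `infer` on one frame at a time (batch size 1) and return frames per second.

    The first `warmup` frames are excluded from timing. For a GPU model, `infer`
    should block until the result is ready, otherwise the timing is too optimistic.
    """
    if len(frames) <= warmup:
        raise ValueError("need more frames than warmup iterations")
    for frame in frames[:warmup]:
        infer(frame)
    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    return len(timed) / (time.perf_counter() - start)


if __name__ == "__main__":
    for model, fps in TABLE_FPS.items():
        print(f"{model}: {ms_per_frame(fps):.1f} ms per frame")
```

With the table's numbers, for instance, SMPLer-X-H32 at 17.47 FPS works out to about 57 ms per frame.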
Preparation
- Download all datasets.
- Process all datasets into HumanData format, except the following five:
  - AGORA, MSCOCO, MPII, Human3.6M, UBody.
  - Follow OSX in preparing these 5 datasets.
- Follow OSX in preparing the pretrained ViTPose models. Download the ViTPose pretrained weights for ViT-small and ViT-huge from here.
- Download the SMPL-X and SMPL body models.
- Download the mmdet pretrained model and config for inference.
The file structure should look like this:
SMPLer-X/
├── common/
│   └── utils/
│       └── human_model_files/  # body model
│           ├── smpl/
│           │   ├── SMPL_NEUTRAL.pkl
│           │   ├── SMPL_MALE.pkl
│           │   └── SMPL_FEMALE.pkl
│           └── smplx/
│               ├── MANO_SMPLX_vertex_ids.pkl
│               ├── SMPL-X__FLAME_vertex_ids.npy
│               ├── SMPLX_NEUTRAL.pkl
│               ├── SMPLX_to_J14.pkl
│               ├── SMPLX_NEUTRAL.npz
│               ├── SMPLX_MALE.npz
│               └── SMPLX_FEMALE.npz
├── data/
├── main/
├── demo/
│   ├── videos/
│   ├── images/
│   └── results/
├── pretrained_models/  # pretrained ViT-Pose, SMPLer_X and mmdet models
│   ├── mmdet/
│   │   ├── faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
│   │   └── mmdet_faster_rcnn_r50_fpn_coco.py
│   ├── smpler_x_s32.pth.tar
│   ├── smpler_x_b32.pth.tar
│   ├── smpler_x_l32.pth.tar
│   ├── smpler_x_h32.pth.tar
│   ├── vitpose_small.pth
│   ├── vitpose_base.pth
│   ├── vitpose_large.pth
│   └── vitpose_huge.pth
└── dataset/
    ├── AGORA/
    ├── ARCTIC/
    ├── BEDLAM/
    ├── Behave/
    ├── CHI3D/
    ├── CrowdPose/
    ├── EgoBody/
    ├── EHF/
    ├── FIT3D/
    ├── GTA_Human2/
    ├── Human36M/
    ├── HumanSC3D/
    ├── InstaVariety/
    ├── LSPET/
    ├── MPII/
    ├── MPI_INF_3DHP/
    ├── MSCOCO/
    ├── MTP/
    ├── MuCo/
    ├── OCHuman/
    ├── PoseTrack/
    ├── PROX/
    ├── PW3D/
    ├── RenBody/
    ├── RICH/
    ├── SPEC/
    ├── SSP3D/
    ├── SynBody/
    ├── Talkshow/
    ├── UBody/
    ├── UP3D/
    └── preprocessed_datasets/  # HumanData files
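Before running inference, a quick existence check against the tree above can catch misplaced files. The helper names are hypothetical, and the subset checked (all body-model files, the mmdet detector files, and one SMPLer-X checkpoint) is our assumption about what inference needs; extend the list for training (ViTPose weights, datasets).

```python
from pathlib import Path

# Paths copied from the directory tree above (relative to SMPLer-X/).
BODY_MODEL_DIR = "common/utils/human_model_files"
SMPL_FILES = ["SMPL_NEUTRAL.pkl", "SMPL_MALE.pkl", "SMPL_FEMALE.pkl"]
SMPLX_FILES = [
    "MANO_SMPLX_vertex_ids.pkl", "SMPL-X__FLAME_vertex_ids.npy",
    "SMPLX_NEUTRAL.pkl", "SMPLX_to_J14.pkl",
    "SMPLX_NEUTRAL.npz", "SMPLX_MALE.npz", "SMPLX_FEMALE.npz",
]
MMDET_FILES = [
    "faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth",
    "mmdet_faster_rcnn_r50_fpn_coco.py",
]


def expected_files(ckpt="smpler_x_h32"):
    """Relative paths of the body models, mmdet files and one SMPLer-X checkpoint."""
    paths = [f"{BODY_MODEL_DIR}/smpl/{f}" for f in SMPL_FILES]
    paths += [f"{BODY_MODEL_DIR}/smplx/{f}" for f in SMPLX_FILES]
    paths += [f"pretrained_models/mmdet/{f}" for f in MMDET_FILES]
    paths.append(f"pretrained_models/{ckpt}.pth.tar")
    return paths


def missing_files(root, paths):
    """Return the entries of `paths` that do not exist as files under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(".", expected_files())
    for p in missing:
        print("missing:", p)
    print("layout OK" if not missing else f"{len(missing)} file(s) missing")
```

Run it from the SMPLer-X/ root; pass, e.g., `expected_files("smpler_x_s32")` to check a different checkpoint.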
Inference
- Place the video for inference under SMPLer-X/demo/videos
- Prepare the pretrained models to be used for inference under SMPLer-X/pretrained_models
- Prepare the mmdet pretrained model and config under SMPLer-X/pretrained_models/mmdet
- Inference output will be saved in SMPLer-X/demo/results
cd main
sh slurm_inference.sh {VIDEO_FILE} {FORMAT} {FPS} {PRETRAINED_CKPT}
# For example, to run inference on test_video.mp4 (24 FPS) with smpler_x_h32:
sh slurm_inference.sh test_video mp4 24 smpler_x_h32
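The script takes the file name and extension as separate arguments, plus the frame rate. The sketch below derives them from a video path and can read the rate with ffprobe (assumed to be on PATH). `inference_args`, `probe_fps` and `parse_rate` are hypothetical helpers, and rounding the rate to a whole number (e.g. 29.97 to 30) is an assumption about what slurm_inference.sh accepts.

```python
import subprocess
from fractions import Fraction
from pathlib import Path


def parse_rate(rate):
    """Turn an ffprobe rate such as '24/1' or '30000/1001' into a whole-number FPS."""
    return round(Fraction(rate.strip()))


def probe_fps(video):
    """Ask ffprobe for the frame rate of the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", str(video)],
        capture_output=True, text=True, check=True,
    )
    return parse_rate(result.stdout)


def inference_args(video, fps, ckpt="smpler_x_h32"):
    """Positional arguments for slurm_inference.sh: name, extension, FPS, checkpoint."""
    path = Path(video)
    return ["sh", "slurm_inference.sh", path.stem, path.suffix.lstrip("."), str(fps), ckpt]


if __name__ == "__main__":
    # Fixed FPS here; probe_fps(path) reads it from the file instead.
    print(" ".join(inference_args("demo/videos/test_video.mp4", 24)))
```

From main/, `subprocess.run(inference_args(v, probe_fps(v)), check=True)`, with `v` pointing at a file in ../demo/videos, would issue the same command as the example above.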
2D SMPL-X Overlay
We provide a lightweight visualization script for mesh overlay based on pyrender.
- Use ffmpeg to split the video into frames.
- The visualization script takes the inference results (see above) as input.
ffmpeg -i {VIDEO_FILE} -f image2 -vf fps=30 \
{SMPLERX INFERENCE DIR}/{VIDEO NAME (no extension)}/orig_img/%06d.jpg \
-hide_banner -loglevel error
cd main && python render.py \
--data_path {SMPLERX INFERENCE DIR} --seq {VIDEO NAME} \
--image_path {SMPLERX INFERENCE DIR}/{VIDEO NAME} \
--render_biggest_person False
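The two overlay steps can be scripted together. This is a sketch under stated assumptions: both {VIDEO NAME} placeholders are taken to mean the file name without its extension, the inference directory is made absolute because render.py runs from main/, and the frame directory is created up front because ffmpeg's image2 output does not create missing directories. `overlay_commands` and `run_overlay` are hypothetical helpers.

```python
import os
import subprocess
from pathlib import Path


def overlay_commands(video, inference_dir, fps=30):
    """Return (frame_dir, ffmpeg_args, render_args) for one video, mirroring the commands above."""
    seq = Path(video).stem  # assumed meaning of {VIDEO NAME}
    seq_dir = Path(os.path.abspath(inference_dir)) / seq
    frame_dir = seq_dir / "orig_img"
    ffmpeg_args = [
        "ffmpeg", "-i", str(video), "-f", "image2", "-vf", f"fps={fps}",
        str(frame_dir / "%06d.jpg"), "-hide_banner", "-loglevel", "error",
    ]
    render_args = [
        "python", "render.py",
        "--data_path", str(seq_dir.parent), "--seq", seq,
        "--image_path", str(seq_dir),
        "--render_biggest_person", "False",
    ]
    return frame_dir, ffmpeg_args, render_args


def run_overlay(video, inference_dir, main_dir="main", fps=30):
    """Extract frames with ffmpeg, then render the overlay from main/."""
    frame_dir, ffmpeg_args, render_args = overlay_commands(video, inference_dir, fps)
    # ffmpeg will not create the output folder itself, so create it first.
    frame_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_args, check=True)
    subprocess.run(render_args, cwd=main_dir, check=True)
```

Keeping command construction separate from execution makes the arguments easy to inspect before anything runs.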
Training
cd main
sh slurm_train.sh {JOB_NAME} {NUM_GPU} {CONFIG_FILE}
# For example, to train SMPLer-X-H32 with 16 GPUs:
sh slurm_train.sh smpler_x_h32 16 config_smpler_x_h32.py
- CONFIG_FILE is the file name under SMPLer-X/main/config
- Logs and checkpoints will be saved to SMPLer-X/output/train_{JOB_NAME}_{DATE_TIME}
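Because each run's output directory ends with a timestamp, finding the latest run for a job is easiest by modification time. `latest_train_dir` is a hypothetical helper that does not parse the {DATE_TIME} format; it picks the directory whose top level was modified most recently, which normally means the one created last. Note that a job name that is a prefix of another (e.g. smpler_x_h32 and smpler_x_h32_ft) matches both.

```python
from pathlib import Path


def latest_train_dir(output_root, job_name):
    """Return the most recently modified train_<job_name>_* directory, or None."""
    candidates = [
        p for p in Path(output_root).glob(f"train_{job_name}_*") if p.is_dir()
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

From inside main/, `latest_train_dir("../output", "smpler_x_h32").name` gives the directory name used as TRAIN_OUTPUT_DIR in the testing command.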
Testing
# To evaluate the model ../output/{TRAIN_OUTPUT_DIR}/model_dump/snapshot_{CKPT_ID}.pth.tar
# with config ../output/{TRAIN_OUTPUT_DIR}/code/config_base.py
cd main
sh slurm_test.sh {JOB_NAME} {NUM_GPU} {TRAIN_OUTPUT_DIR} {CKPT_ID}
- NUM_GPU = 1 is recommended for testing.
- Logs and results will be saved to SMPLer-X/output/test_{JOB_NAME}_ep{CKPT_ID}_{TEST_DATASET}
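To choose a CKPT_ID, the snapshot files under model_dump/ can be listed. The sketch relies only on the snapshot_{CKPT_ID}.pth.tar pattern from the testing comment, sorts IDs numerically (a plain string sort would treat snapshot_5 as newer than snapshot_10), and by default picks the newest snapshot with NUM_GPU = 1 as recommended. `checkpoint_ids` and `slurm_test_args` are hypothetical helpers.

```python
import re
from pathlib import Path

# Snapshot file names as in the testing comment, e.g. snapshot_5.pth.tar.
SNAPSHOT_RE = re.compile(r"snapshot_(\d+)\.pth\.tar")


def checkpoint_ids(train_output_dir):
    """Return the CKPT_IDs under <train_output_dir>/model_dump, sorted numerically."""
    dump = Path(train_output_dir) / "model_dump"
    ids = []
    for path in dump.glob("snapshot_*.pth.tar"):
        match = SNAPSHOT_RE.fullmatch(path.name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def slurm_test_args(job_name, train_output_dir, ckpt_id=None, num_gpu=1):
    """Arguments for slurm_test.sh; defaults to the newest snapshot and a single GPU."""
    if ckpt_id is None:
        ids = checkpoint_ids(train_output_dir)
        if not ids:
            raise FileNotFoundError(f"no snapshots under {train_output_dir}/model_dump")
        ckpt_id = ids[-1]
    # The script takes the directory name under output/, not a full path.
    return ["sh", "slurm_test.sh", job_name, str(num_gpu),
            Path(train_output_dir).name, str(ckpt_id)]
```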
FAQ
- RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.
  Follow this post and modify torchgeometry.
- KeyError: 'SinePositionalEncoding is already registered in position encoding', or any other similar KeyError caused by duplicate module registration.
  Manually add force=True to the respective module registration under main/transformer_utils/mmpose/models/utils, e.g. @POSITIONAL_ENCODING.register_module(force=True) in this file.
- How do I animate my virtual characters with SMPLer-X output (like that in the demo video)?
  - We are working on it; please stay tuned! Currently, this repo supports SMPL-X estimation and a simple visualization (an overlay of SMPL-X vertices).
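The KeyError in the FAQ comes from a registry-style decorator refusing to register the same name twice. The toy class below is a simplified stand-in written for illustration, not mmcv's actual Registry, but its error message follows the same format as the one quoted above, and it shows what force=True changes: the later registration silently replaces the earlier one. One likely cause of the duplicate is the same module being imported twice, for example from two copies of mmpose, though that depends on your environment.

```python
class Registry:
    """Toy name-to-class registry (illustration only, not mmcv's implementation)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, force=False):
        """Class decorator; refuses duplicate names unless force=True."""
        def decorator(cls):
            key = cls.__name__
            if key in self._modules and not force:
                raise KeyError(f"{key} is already registered in {self.name}")
            self._modules[key] = cls  # with force=True, the newer class wins
            return cls
        return decorator

    def get(self, key):
        return self._modules[key]


if __name__ == "__main__":
    POSITIONAL_ENCODING = Registry("position encoding")

    @POSITIONAL_ENCODING.register_module()
    class SinePositionalEncoding:
        pass

    try:
        # A second definition under the same name, as if the module were imported twice.
        @POSITIONAL_ENCODING.register_module()
        class SinePositionalEncoding:  # noqa: F811
            pass
    except KeyError as err:
        print("KeyError:", err)

    @POSITIONAL_ENCODING.register_module(force=True)
    class SinePositionalEncoding:  # noqa: F811
        pass

    print(POSITIONAL_ENCODING.get("SinePositionalEncoding") is SinePositionalEncoding)
```

Because force=True hides the collision rather than removing it, whichever copy registers last is the one that gets used.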