BodyNet: Volumetric Inference of 3D Human Body Shapes
GΓΌl Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev and Cordelia Schmid, BodyNet: Volumetric Inference of 3D Human Body Shapes, ECCV 2018.
Contents
1. Preparation
1.1. Requirements
- Datasets
- Download SURREAL and/or Unite the People (UP) dataset(s)
- Training
- Install Torch with cuDNN support.
- Install matio by
luarocks install matio
- Install OpenCV-Torch by
luarocks install cv
- Tested on Linux with cuda v8 and cudNN v5.1.
- Pre-processing and fitting python scripts
- Python 2 environment with the following installed:
- SMPL related
- Download SMPL for python and set
SMPL_PATH
- Fix the naming:
mv basicmodel_m_lbs_10_207_0_v1.0.0 basicModel_m_lbs_10_207_0_v1.0.0
- Do the following changes in the code
smpl_webuser/verts.py
:
- v_template, J, weights, kintree_table, bs_style, f, + v_template, J_regressor, weights, kintree_table, bs_style, f, - if sp.issparse(J): - regressor = J - J_tmpx = MatVecMult(regressor, v_shaped[:,0]) - J_tmpy = MatVecMult(regressor, v_shaped[:,1]) - J_tmpz = MatVecMult(regressor, v_shaped[:,2]) + if sp.issparse(J_regressor): + J_tmpx = MatVecMult(J_regressor, v_shaped[:,0]) + J_tmpy = MatVecMult(J_regressor, v_shaped[:,1]) + J_tmpz = MatVecMult(J_regressor, v_shaped[:,2]) + assert(ischumpy(J_regressor)) - assert(ischumpy(J)) + result.J_regressor = J_regressor
- Fix the naming:
- Download neutral SMPL model and place under
models
folder of SMPL - Download SMPLify and set
SMPLIFY_PATH
- Download SMPL for python and set
- Voxelization related
- Download binvox executable and set
BINVOX_PATH
- Download binvox python package and set
BINVOX_PYTHON_PATH
- Download binvox executable and set
1.2. Pre-processing for training
SURREAL voxelization
Loop over the dataset and run preprocess_surreal_voxelize.py
for each _info.mat
file by setting it with the --input
option (for foreground and/or part voxels with the --parts
option). The surface voxels are filled with imfill
with the preprocess_surreal_fillvoxels.m
script, but you could do it in python (e.g. ndimage.binary_fill_holes(binvoxModel.data)
). Sample preprocessed data is included in preprocessing/sample_data/surreal
.
Preparing UP data
Loop over the dataset by running preprocess_up_voxelize.py
to voxelize and to re-organize the dataset. Fill the voxels with preprocess_up_fillvoxels.m
. Preprocess the segmentation maps with preprocess_up_segm.m
. Sample preprocessed data is included in preprocessing/sample_data/up
.
1.3. Setup paths for training
Place the data under ~/datasets/SURREAL
and ~/datasets/UP
or change the opt.dataRoot
in opts.lua. The outputs will be written to ~/cnn_saves/<datasetname>/<experiment>
, you can change the opt.logRoot
to change the cnn_saves
location.
1.4. Download pre-trained models
We provide several pre-trained models used in the paper bodynet.tar.gz (980MB). The content is explained in the training section. Extract the .t7
files and place them under models/t7
directory.
# Trained on SURREAL
model_segm_cmu.t7
model_joints3D_cmu.t7
model_voxels_cmu.t7
model_voxels_FVSV_cmu.t7
model_partvoxels_FVSV_cmu.t7
model_bodynet_cmu.t7
# Trained on UP
model_segm_UP.t7
model_joints3D_UP.t7
model_voxels_FVSV_UP.t7
model_voxels_FVSV_UP_manualsegm.t7
model_bodynet_UP.t7
# Trained on MPII
model_joints2D.t7
2. Training
There are sample scripts under training/exp/backup
directory. These were created automatically using the training/exp/run.sh
script. For example the following run.sh
script:
source create_exp.sh -h
input="rgb"
supervision="segm15joints2Djoints3Dvoxels"
inputtype="gt"
extra_args="_FVSV"
running_mode="train"
#modelno=1
dataset="cmu"
create_cmd
cmd="${return_str} \\
-batchSize 4 \\
-modelVoxels models/t7/model_voxels_FVSV_cmu.t7 \\
-proj silhFVSV \\
"
run_cmd
generates and runs the following script:
cd ..
qlua main.lua \
-dirName segm15joints2Djoints3Dvoxels/rgb/gt_FVSV \
-input rgb \
-supervision segm15joints2Djoints3Dvoxels \
-datasetname cmu \
-batchSize 4 \
-modelVoxels models/t7/model_voxels_FVSV_cmu.t7 \
-proj silhFVSV \
This trains the final version of the model described in the paper, i.e., training end-to-end network with pre-trained subnetworks with multi-task losses and multi-view re-projection losses. If you manage to run this on the SURREAL dataset, the standard output should resemble the following:
Epoch: [1][1/2000] Time: 66.197, Err: 0.170 PCK: 87.50, PixelAcc: 68.36, IOU: 55.03, RMSE: 0.00, PE3Dvol: 33.39, IOUvox: 66.56, IOUprojFV: 92.89, IOUprojSV: 75.56, IOUp
artvox: 0.00, LR: 1e-03, DataLoadingTime 192.286
Epoch: [1][2/2000] Time: 1.240, Err: 0.472 PCK: 87.50, PixelAcc: 21.38, IOU: 18.79, RMSE: 0.00, PE3Dvol: 44.63, IOUvox: 44.89, IOUprojFV: 73.05, IOUprojSV: 65.19, IOUp
artvox: 0.00, LR: 1e-03, DataLoadingTime 0.237
Epoch: [1][3/2000] Time: 1.040, Err: 0.318 PCK: 65.00, PixelAcc: 49.58, IOU: 35.99, RMSE: 0.00, PE3Dvol: 52.92, IOUvox: 57.04, IOUprojFV: 86.97, IOUprojSV: 66.29, IOUp
artvox: 0.00, LR: 1e-03, DataLoadingTime 0.570
Epoch: [1][4/2000] Time: 1.678, Err: 0.771 PCK: 50.00, PixelAcc: 42.95, IOU: 36.04, RMSE: 0.00, PE3Dvol: 99.04, IOUvox: 52.74, IOUprojFV: 83.87, IOUprojSV: 64.07, IOUp
artvox: 0.00, LR: 1e-03, DataLoadingTime 0.101
2D pose (PCK), 2D body part segmentation (PixelAcc
, IOU
), depth (RMSE
), 3D pose (PE3Dvol
), voxel prediction (IOUvox
), side-view and front-view re-projection (IOUprojFV
, IOUprojSV
) performances are reported at each iteration.
The final network is a result of a multi-stage training.
- SubNet1 -
model_segm_cmu.t7
. RGB -> Segm- obtained from here and the first two stacks are extracted
- SubNet2 -
model_joints2D.t7
. RGB -> Joints2D- trained on MPII with 8 stacks, and the first two stacks are extracted
- SubNet3 -
model_joints3D_cmu.t7
. RGB + Segm + Joints2D -> Joints3D- trained from scratch with 2 stacks using predicted segmentation (SubNet1) and 2D pose (SubNet2)
- SubNet4 -
model_voxels_cmu.t7
. RGB + Segm + Joints2D + Joints3D -> Voxels- trained from scratch with 2 stacks using predicted segmentation (SubNet1), 2D pose (SubNet2), and 3D pose (SubNet3)
- SubNet5 -
model_voxels_FVSV_cmu.t7
. RGB + Segm + Joints2D + Joints3D -> Voxels + FV + SV- pre-trained from SubNet4 with the additional losses on re-projection
- BodyNet -
model_bodynet_cmu.t7
. RGB -> Segm + Joints2D + Joints3D + Voxels + FV + SV- a combination of SubNet1, SubNet2, SubNet3, SubNet4, and SubNet5
- fine-tuned end-to-end with multi-task losses
Note that the performance with 8 stacks is generally better, but we preferred to reduce the complexity with the cost of a little performance.
Above recipe is used for the SURREAL dataset. For the UP dataset, we first fine-tuned the SubNet1 model_segm_UP.t7
(SubNet1_UP). Then, we fine-tuned SubNet3 model_joints3D_UP.t7
(SubNet3_UP) using SubNet1_UP and SubNet2. Finally, we fine-tuned SubNet5 model_voxels_FVSV_UP.t7
(SubNet5_UP) using SubNet1_UP, SubNet2, and SubNet3_UP. All these are fine-tuned end-to-end to obtain model_bodynet_UP.t7
. The model used in the paper for experimenting with the manual segmentations is also provided model_voxels_FVSV_UP_manualsegm.t7
.
Part Voxels
We use the script models/init_partvoxels.lua
to copy the last layer weights 7 times (6 body parts + 1 background) to initialize the part voxels model (models/t7/init_partvoxels.t7
). After training this model without re-projection losses, we fine-tune it with re-projection loss. model_partvoxels_cmu.t7
is the best model obtained. With end-to-end fine-tuning, we had divergence problems and did not put too much effort to make it work. Note that this model is preliminary and needs improvement.
Misc
A few functionalities of the code are not used in the paper; however, still provided. These include training 3D pose and voxels networks using ground truth (GT) segmentation/2D pose/3D pose inputs, as well as mixing the predicted and GT inputs at each batch. This is achieved by setting the mix
option to true. The results of only using predicted inputs are often comparable to using a mix, therefore we always used only predictions. Predictions are passed as input using the applyHG
option, which is not very efficient.
3. Testing
Use the demo script to apply the provided models on sample images.
qlua demo/demo.lua
You can also use demo/demo.m
Matlab script to produce visualizations.
4. Fitting SMPL model
Fitting scripts for SURREAL (fitting/fit_surreal.py
) and UP (fitting/fit_up.py
) datasets are provided with sample experiment outputs. The scripts use the optimization functions from tools/smpl_utils.py
.
Citation
If you use this code, please cite the following:
@INPROCEEDINGS{varol18_bodynet,
title = {{BodyNet}: Volumetric Inference of {3D} Human Body Shapes},
author = {Varol, G{\"u}l and Ceylan, Duygu and Russell, Bryan and Yang, Jimei and Yumer, Ersin and Laptev, Ivan and Schmid, Cordelia},
booktitle = {ECCV},
year = {2018}
}
Acknowledgements
The training code is an extension of the SURREAL training code which is largely built on the ImageNet training example https://github.com/soumith/imagenet-multiGPU.torch by Soumith Chintala, and Stacked Hourglass Networks by Alejandro Newell.
The fitting code is an extension of the SMPLify code.