# Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation (Hand4Whole codes)
High-resolution video link: here
## Introduction
This repo is the official PyTorch implementation of Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation (CVPRW 2022 Oral). This repo contains the whole-body codes. For the body-only, hand-only, and face-only codes, visit here.
## Quick demo
- Slightly change the `torchgeometry` kernel code following here.
- Download the pre-trained Hand4Whole from here.
- Prepare `input.png` and the pre-trained snapshot in the `demo` folder.
- Prepare the `human_model_files` folder following the `Directory` part below and place it at `common/utils/human_model_files`.
- Go to any of the `demo` folders and edit `bbox`.
- Run `python demo.py --gpu 0`.
- If you run this code in an ssh environment without a display device, do the following (a minimal sketch is given after this list):
  1. Install osmesa following https://pyrender.readthedocs.io/en/latest/install/
  2. Reinstall the specific pyopengl fork: https://github.com/mmatl/pyopengl
  3. Set opengl's backend to egl or osmesa via `os.environ["PYOPENGL_PLATFORM"] = "egl"`
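For the ssh case, a minimal sketch of the headless setup is below. It assumes the demo's visualization goes through pyrender; the key point is that the environment variable must be set before pyrender (or anything that imports OpenGL) is imported.

```python
import os

# Select a headless OpenGL backend before pyrender/OpenGL are imported
# (assumption: this goes at the very top of demo.py).
os.environ["PYOPENGL_PLATFORM"] = "egl"  # or "osmesa"

import pyrender  # imported only after the backend is chosen
```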
## Directory
### Root
The `${ROOT}` is described as below.
```
${ROOT}
|-- data
|-- demo
|-- main
|-- tool
|-- output
|-- common
|   |-- utils
|   |   |-- human_model_files
|   |   |   |-- smpl
|   |   |   |   |-- SMPL_NEUTRAL.pkl
|   |   |   |-- smplx
|   |   |   |   |-- MANO_SMPLX_vertex_ids.pkl
|   |   |   |   |-- SMPL-X__FLAME_vertex_ids.npy
|   |   |   |   |-- SMPLX_NEUTRAL.pkl
|   |   |   |   |-- SMPLX_to_J14.pkl
|   |   |   |-- mano
|   |   |   |   |-- MANO_LEFT.pkl
|   |   |   |   |-- MANO_RIGHT.pkl
|   |   |   |-- flame
|   |   |   |   |-- flame_dynamic_embedding.npy
|   |   |   |   |-- flame_static_embedding.pkl
|   |   |   |   |-- FLAME_NEUTRAL.pkl
```
- `data` contains data loading codes and soft links to images and annotations directories.
- `demo` contains demo codes.
- `main` contains high-level codes for training or testing the network.
- `tool` contains pre-processing codes of AGORA and PyTorch model editing codes.
- `output` contains logs, trained models, visualized outputs, and test results.
- `common` contains kernel codes for Hand4Whole.
- `human_model_files` contains the `smpl`, `smplx`, `mano`, and `flame` 3D model files. Download the files from [smpl] [smplx] [SMPLX_to_J14.pkl] [mano] [flame].
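As a quick sanity check of the layout above, the files can be loaded with the `smplx` Python package. This is only a minimal sketch, not the repo's own loading code; the keyword arguments (`ext='pkl'`, `use_pca=False`) are assumptions chosen to match the `.pkl` files listed above.

```python
import smplx

# Assumption: run from ${ROOT}, with the layout above in place.
model_path = 'common/utils/human_model_files'

smplx_layer = smplx.create(model_path, model_type='smplx', gender='neutral', ext='pkl', use_pca=False)
mano_layer = smplx.create(model_path, model_type='mano', is_rhand=True, ext='pkl', use_pca=False)

print(smplx_layer)  # prints the model summary if SMPLX_NEUTRAL.pkl was found
```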
## Data
You need to follow the directory structure of the `data` folder as below.
```
${ROOT}
|-- data
|   |-- AGORA
|   |   |-- data
|   |   |   |-- AGORA_train.json
|   |   |   |-- AGORA_validation.json
|   |   |   |-- AGORA_test_bbox.json
|   |   |   |-- 1280x720
|   |   |   |-- 3840x2160
|   |-- EHF
|   |   |-- data
|   |   |   |-- EHF.json
|   |-- Human36M
|   |   |-- images
|   |   |-- annotations
|   |-- MPII
|   |   |-- data
|   |   |   |-- images
|   |   |   |-- annotations
|   |-- MPI_INF_3DHP
|   |   |-- data
|   |   |   |-- images_1k
|   |   |   |-- MPI-INF-3DHP_1k.json
|   |   |   |-- MPI-INF-3DHP_camera_1k.json
|   |   |   |-- MPI-INF-3DHP_joint_3d.json
|   |   |   |-- MPI-INF-3DHP_SMPL_NeuralAnnot.json
|   |-- MSCOCO
|   |   |-- images
|   |   |   |-- train2017
|   |   |   |-- val2017
|   |   |-- annotations
|   |-- PW3D
|   |   |-- data
|   |   |   |-- 3DPW_train.json
|   |   |   |-- 3DPW_validation.json
|   |   |   |-- 3DPW_test.json
|   |   |-- imageFiles
```
- Download AGORA parsed data [data][parsing codes]
- Download EHF parsed data [data]
- Download Human3.6M parsed data and SMPL-X parameters [data][SMPL-X parameters from NeuralAnnot]
- Download MPII parsed data and SMPL-X parameters [data][SMPL-X parameters from NeuralAnnot]
- Download MPI-INF-3DHP parsed data and SMPL-X parameters [data][SMPL-X parameters from NeuralAnnot]
- Download MSCOCO data and SMPL-X parameters [data][SMPL-X parameters from NeuralAnnot]
- Download 3DPW parsed data [data]
- All annotation files follow the MSCOCO format. If you want to add your own dataset, you have to convert it to the MSCOCO format (a minimal sketch of that layout follows below).
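A minimal sketch of that MSCOCO-style layout is below; the exact per-annotation fields (joints, SMPL-X parameters, etc.) depend on the dataset and are only illustrative here.

```python
import json

# MSCOCO-style skeleton: an 'images' list keyed by 'id', and an
# 'annotations' list that points back to it via 'image_id'.
annotations = {
    "images": [
        {"id": 0, "file_name": "000001.jpg", "width": 1920, "height": 1080},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "bbox": [100.0, 150.0, 200.0, 400.0],  # x, y, width, height
            # dataset-specific fields (2D/3D joints, SMPL-X parameters, ...) go here
        },
    ],
    "categories": [{"id": 1, "name": "person"}],
}

with open("MyDataset_train.json", "w") as f:
    json.dump(annotations, f)
```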
## Output
You need to follow the directory structure of the `output` folder as below.
```
${ROOT}
|-- output
|   |-- log
|   |-- model_dump
|   |-- result
|   |-- vis
```
- Creating the `output` folder as a soft link instead of a regular folder is recommended, because it can take up a large amount of storage.
- The `log` folder contains training log files.
- The `model_dump` folder contains saved checkpoints for each epoch.
- The `result` folder contains final estimation files generated in the testing stage.
- The `vis` folder contains visualized results.
## Running Hand4Whole
- In `main/config.py`, you can change the datasets to use.
### Train
The training consists of three stages.
#### 1st: pre-train Hand4Whole
In the `main` folder, run `python train.py --gpu 0-3 --lr 1e-4 --continue` to train Hand4Whole on GPUs 0,1,2,3. `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`. To train Hand4Whole from the pre-trained 2D human pose estimation network, download this and place it at `tool`. Then, run `python convert_simple_to_pose2pose.py`, which produces `snapshot_0.pth.tar`. Finally, place `snapshot_0.pth.tar` in `output/model_dump`.
#### 2nd: pre-train hand-only Pose2Pose
Download the pre-trained hand-only Pose2Pose from here.
Place the hand-only Pose2Pose at `tool/snapshot_12_hand.pth.tar`.
Also, place the pre-trained Hand4Whole of the first stage at `tool/snapshot_6_all.pth.tar`.
Then, go to the `tool` folder and run `python merge_hand_to_all.py`.
Place the generated `snapshot_0.pth.tar` in `output/model_dump`.
Or, you can pre-train hand-only Pose2Pose by yourself: switch to the Pose2Pose branch and train hand-only Pose2Pose on MSCOCO, FreiHAND, and InterHand2.6M.
#### 3rd: combine pre-trained Hand4Whole and hand-only Pose2Pose and fine-tune it
Move `snapshot_6.pth.tar` of the 1st stage to `tool/snapshot_6_all.pth.tar`.
Then, move `snapshot_12.pth.tar` of the 2nd stage to `tool/snapshot_12_hand.pth.tar`.
Run `python merge_hand_to_all.py` in the `tool` folder (a rough sketch of what this merge does is given below).
Move the generated `snapshot_0.pth.tar` to `output/model_dump`.
In the `main` folder, run `python train.py --gpu 0-3 --lr 1e-5 --continue` to train Hand4Whole on GPUs 0,1,2,3. `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`.
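For intuition, `merge_hand_to_all.py` fuses the two snapshots into the single checkpoint that fine-tuning starts from. The sketch below is only illustrative of that kind of merge: the `'network'`/`'epoch'` checkpoint keys and the `hand_` parameter prefix are assumptions, not the script's actual code.

```python
import torch

# Load the whole-body and hand-only checkpoints (paths as placed in the tool folder).
all_ckpt = torch.load('snapshot_6_all.pth.tar', map_location='cpu')
hand_ckpt = torch.load('snapshot_12_hand.pth.tar', map_location='cpu')

# Copy hand-branch weights over the corresponding whole-body weights
# (assumption: hand-branch parameter names contain a 'hand_' prefix).
merged = dict(all_ckpt['network'])
for name, weight in hand_ckpt['network'].items():
    if 'hand_' in name:
        merged[name] = weight

# Save as epoch 0 so the fine-tuning schedule restarts from the merged weights.
torch.save({'epoch': 0, 'network': merged}, 'snapshot_0.pth.tar')
```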
### Test
Place the trained model at `output/model_dump/`.
In the `main` folder, run `python test.py --gpu 0-3 --test_epoch 6` to test Hand4Whole on GPUs 0,1,2,3 with the 6th-epoch trained model. `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`.
## Models
- Download Hand4Whole trained on H36M+MPII+MSCOCO from here.
- Download Hand4Whole fine-tuned on AGORA (without gender classification) from here.
- To fine-tune Hand4Whole on AGORA, move `snapshot_6.pth.tar`, generated after the 3rd training stage, to `tool` and run `python reset_epoch.py`. Then, move the generated `snapshot_0.pth.tar` to `output/model_dump` and run `python train.py --gpu 0-3 --lr 1e-4` after changing `trainset_3d=['AGORA']`, `trainset_2d=[]`, `testset='AGORA'`, `lr_dec_epoch=[40,60]`, and `end_epoch = 70` in `config.py` (see the sketch after this list).
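The sketch referenced above: the `config.py` edits for AGORA fine-tuning, written out as they would appear in `main/config.py` (only these lines change; everything else stays as is).

```python
# main/config.py: settings for AGORA fine-tuning
trainset_3d = ['AGORA']
trainset_2d = []
testset = 'AGORA'
lr_dec_epoch = [40, 60]
end_epoch = 70
```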
## Results
### 3D whole-body results
### 3D body-only and hand-only results
For the 3D body-only and hand-only codes, visit here.
## Troubleshoots
- `RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.`: Go to here. An illustrative sketch of the usual fix follows below.
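The error is raised because torchgeometry inverts boolean masks with `1 - mask`, which newer PyTorch versions reject for bool tensors. The usual fix (the same kernel edit mentioned in the Quick demo) is to replace those subtractions with `~` inside the installed `torchgeometry/core/conversions.py`; the runnable sketch below only demonstrates the failing and fixed patterns, and the variable names in the comment are from stock torchgeometry 0.1.2 as an assumption.

```python
import torch

# What torchgeometry does internally: invert a boolean mask.
mask = torch.tensor([True, False, True])

# Old torchgeometry pattern -> raises the RuntimeError above on recent PyTorch:
# inverted = 1 - mask

# Fixed pattern, as applied inside rotation_matrix_to_quaternion(), e.g.
# mask_c1 = mask_d2 * ~mask_d0_d1   instead of   mask_d2 * (1 - mask_d0_d1)
inverted = ~mask
print(inverted)  # tensor([False,  True, False])
```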
## Reference
```bibtex
@InProceedings{Moon_2022_CVPRW_Hand4Whole,
  author    = {Moon, Gyeongsik and Choi, Hongsuk and Lee, Kyoung Mu},
  title     = {Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation},
  booktitle = {Computer Vision and Pattern Recognition Workshop (CVPRW)},
  year      = {2022}
}
```