MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation [CVPR 2022]
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation,
Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool,
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Updates
- 06/27/2022: Pretrained models for 27, 81, 243, and 351 frames are all released!
- 03/24/2022: Demo and in-the-wild inference code are released!
Installation
- Create a conda environment:
conda create -n mhformer python=3.9
conda activate mhformer
- Install PyTorch 1.7.1 and Torchvision 0.8.2 following the official instructions.
- Install the remaining dependencies:
pip3 install -r requirements.txt
Dataset setup
Please download the dataset from the Human3.6M website and refer to VideoPose3D to set up the Human3.6M dataset (in the './dataset' directory). Alternatively, you can download the processed data from here.
${POSE_ROOT}/
|-- dataset
| |-- data_3d_h36m.npz
| |-- data_2d_h36m_gt.npz
| |-- data_2d_h36m_cpn_ft_h36m_dbb.npz
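To catch path mistakes before launching main.py, the layout above can be checked with a small script. This is a minimal sketch, not part of the repository: the helper name missing_dataset_files is hypothetical, and the argument is assumed to be your ${POSE_ROOT}.

```python
from pathlib import Path

# File names taken from the directory layout above.
REQUIRED_FILES = [
    "data_3d_h36m.npz",
    "data_2d_h36m_gt.npz",
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",
]


def missing_dataset_files(pose_root):
    """Hypothetical helper: list required files absent from <pose_root>/dataset."""
    dataset_dir = Path(pose_root) / "dataset"
    return [name for name in REQUIRED_FILES if not (dataset_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files(".")  # assumes you run it from ${POSE_ROOT}
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Running it from ${POSE_ROOT} prints any file that still needs to be downloaded or generated.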
Download pretrained models
The pretrained models can be found here. Please download them and put them in the './checkpoint/pretrained' directory.
Test the model
To test the 351-frame pretrained model on Human3.6M:
python main.py --test --previous_dir 'checkpoint/pretrained/351' --frames 351
Here, we compare MHFormer with recent state-of-the-art methods on the Human3.6M dataset. The evaluation metric is Mean Per Joint Position Error (MPJPE) in mm (lower is better).
| Model | MPJPE (mm) |
|---|---|
| VideoPose3D | 46.8 |
| PoseFormer | 44.3 |
| MHFormer | 43.0 |
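MPJPE is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and frames: MPJPE = (1/N) Σ_i ‖p̂_i − p_i‖₂. The NumPy sketch below illustrates that definition. It is not the repository's evaluation code and assumes both poses are already root-relative and expressed in millimetres (the usual Human3.6M setup).

```python
import numpy as np


def mpjpe(predicted, target):
    """Mean Per Joint Position Error.

    Both arrays have shape (..., num_joints, 3) and share the same unit
    (assumed millimetres, root-relative). Returns the Euclidean distance
    between predicted and ground-truth joints, averaged over all joints/frames.
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    assert predicted.shape == target.shape, "shapes must match"
    return float(np.mean(np.linalg.norm(predicted - target, axis=-1)))


# Two frames, two joints: every joint is offset by (3, 4, 0), i.e. 5 mm.
gt = np.zeros((2, 2, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, gt))  # 5.0
```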
Train the model
To train a 351-frame model on Human3.6M:
python main.py --frames 351 --batch_size 128
To train an 81-frame model on Human3.6M:
python main.py --frames 81 --batch_size 256
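The --frames flag sets how many consecutive 2D poses form one input sequence (27, 81, 243, or 351 for the released models). The sketch below shows one common way to cut a clip into such sequences: one window centred on each frame, with the clip boundaries padded by repeating the first and last pose. It is an illustration under stated assumptions, not the repository's data loader; the function name temporal_windows and the edge-replication padding are our own choices, and we assume the model estimates the 3D pose of each window's centre frame.

```python
import numpy as np


def temporal_windows(keypoints_2d, frames):
    """Hypothetical helper: one centred window of `frames` 2D poses per frame.

    keypoints_2d: array of shape (T, J, 2). `frames` must be odd so every
    window has a well-defined centre frame. Clip ends are padded by repeating
    the first/last pose (an assumption, not necessarily what main.py does).
    Returns an array of shape (T, frames, J, 2).
    """
    assert frames % 2 == 1, "frames must be odd"
    pad = frames // 2
    padded = np.pad(keypoints_2d, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    # Window t covers padded[t : t + frames]; its centre is the original frame t.
    return np.stack([padded[t:t + frames] for t in range(len(keypoints_2d))])


clip = np.random.rand(10, 17, 2)  # 10 frames, 17 joints
windows = temporal_windows(clip, frames=27)
print(windows.shape)  # (10, 27, 17, 2)
```

Memory per sample grows linearly with the window length, which is consistent with the 351-frame command using a smaller batch size than the 81-frame one.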
Demo
First, download the YOLOv3 and HRNet pretrained models here and put them in the './demo/lib/checkpoint' directory. Then, put your in-the-wild videos in the './demo/video' directory.
Run the command below:
python demo/vis.py --video sample_video.mp4
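The demo chains a person detector (YOLOv3), a 2D keypoint estimator (HRNet), and MHFormer, which lifts the 2D keypoints to 3D. Pixel keypoints are typically normalized before lifting; the sketch below reproduces the normalize_screen_coordinates convention from VideoPose3D, which maps x from [0, w] to [-1, 1] while preserving the aspect ratio. We assume, without having verified demo/vis.py, that the demo follows the same convention.

```python
import numpy as np


def normalize_screen_coordinates(keypoints, w, h):
    """Map pixel coordinates so that x spans [-1, 1], keeping the aspect ratio.

    keypoints: array whose last dimension holds (x, y) in pixels.
    """
    keypoints = np.asarray(keypoints, dtype=np.float64)
    assert keypoints.shape[-1] == 2
    return keypoints / w * 2 - np.array([1.0, h / w])


# A 1920x1080 frame: the image centre maps to (0, 0),
# the top-left corner to (-1, -0.5625).
pts = np.array([[960.0, 540.0], [0.0, 0.0]])
print(normalize_screen_coordinates(pts, w=1920, h=1080))
```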
Sample demo output:
Citation
If you find our work useful in your research, please consider citing:
@inproceedings{li2022mhformer,
title={MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation},
author={Li, Wenhao and Liu, Hong and Tang, Hao and Wang, Pichao and Van Gool, Luc},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={13147-13156},
year={2022}
}
@article{li2023multi,
title={Multi-Hypothesis Representation Learning for Transformer-Based 3D Human Pose Estimation},
author={Li, Wenhao and Liu, Hong and Tang, Hao and Wang, Pichao},
journal={Pattern Recognition},
pages={109631},
year={2023},
}
Acknowledgement
Our code is extended from the following repositories. We thank the authors for releasing their code.
Licence
This project is licensed under the terms of the MIT license.