Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation
This is the official implementation of the approach described in the paper:
Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, and Wenming Yang. Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation. IEEE Transactions on Multimedia, 2023.
News
- 03/24/2022: Demo and in-the-wild inference code are released!
- 03/15/2022: Our method has been validated as a backbone network for self-supervised pre-training!
Installation
Our code has been tested on Ubuntu 18 with PyTorch 1.7.1 and Python 3.9.
- Install PyTorch 1.7.1 and Torchvision 0.8.2 following the official instructions.
- Install the remaining dependencies:
pip3 install -r requirements.txt
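To confirm that your environment matches the tested setup, a small check like the one below may help. This is a minimal sketch, not part of the repository: the helper `check_versions` is hypothetical, and it only reads installed package metadata through the standard-library `importlib.metadata` module (Python 3.8+), so it works even when PyTorch is missing.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions stated in this README as tested; other versions may also work.
TESTED_VERSIONS = {"torch": "1.7.1", "torchvision": "0.8.2"}


def check_versions(expected: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Hypothetical helper: map each package to (installed version or None, matches?)."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed in this environment
        # Compare only the part before '+', so CUDA builds like '1.7.1+cu110' still match.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (have, matches)
    return report


if __name__ == "__main__":
    for pkg, (have, ok) in check_versions(TESTED_VERSIONS).items():
        status = "OK" if ok else "differs from the tested version"
        print(f"{pkg}: installed={have} -> {status}")
```

A version mismatch is only a warning sign; it does not necessarily mean the code will fail.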
Dataset setup
Please download the dataset from the Human3.6M website and refer to VideoPose3D to set it up in the './dataset' directory. Alternatively, you can download the processed data from here.
${POSE_ROOT}/
|-- dataset
| |-- data_3d_h36m.npz
| |-- data_2d_h36m_gt.npz
| |-- data_2d_h36m_cpn_ft_h36m_dbb.npz
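Before training or testing, it can be useful to verify that the three files in the layout above are in place. The sketch below is an illustrative assumption, not a script shipped with the repository: `missing_dataset_files` is a hypothetical helper that takes `${POSE_ROOT}` as its argument and only checks for file presence, not file contents.

```python
from pathlib import Path
from typing import List

# File names taken from the directory layout above.
REQUIRED_FILES = [
    "data_3d_h36m.npz",
    "data_2d_h36m_gt.npz",
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",
]


def missing_dataset_files(pose_root: str) -> List[str]:
    """Hypothetical helper: return expected files absent from <pose_root>/dataset."""
    dataset_dir = Path(pose_root) / "dataset"
    return [name for name in REQUIRED_FILES if not (dataset_dir / name).is_file()]


if __name__ == "__main__":
    # Run from ${POSE_ROOT}, i.e. the repository root.
    missing = missing_dataset_files(".")
    print("Dataset ready." if not missing else f"Missing files: {missing}")
```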
Download pretrained model
The pretrained model can be found here; please download it and put it in the './checkpoint/pretrained' directory.
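If you prefer to script this step, the following is a minimal sketch, assuming the checkpoint has already been downloaded to a local path. The helper `install_checkpoint` is hypothetical; the checkpoint's file name is not given in this README, so the sketch keeps whatever name the downloaded file has.

```python
import shutil
from pathlib import Path


def install_checkpoint(downloaded_file: str, repo_root: str = ".") -> Path:
    """Hypothetical helper: move a downloaded checkpoint into checkpoint/pretrained."""
    target_dir = Path(repo_root) / "checkpoint" / "pretrained"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    destination = target_dir / Path(downloaded_file).name  # keep the original file name
    shutil.move(str(downloaded_file), str(destination))
    return destination
```

Moving (rather than copying) avoids keeping two copies of a large weight file; swap in `shutil.copy2` if you want to keep the original.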
Test the model
To test the pretrained model on Human3.6M:
python main.py --test --refine --reload --refine_reload --previous_dir 'checkpoint/pretrained'
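The same test command can be launched from Python, for example inside a larger evaluation script. This is a sketch under the assumption that it runs from the repository root, where `main.py` lives; `build_test_command` and `run_test` are hypothetical helpers that use only the flags shown above.

```python
import subprocess
import sys
from typing import List


def build_test_command(previous_dir: str = "checkpoint/pretrained") -> List[str]:
    """Hypothetical helper: argument list equivalent to the test command above."""
    return [
        sys.executable, "main.py",
        "--test", "--refine", "--reload", "--refine_reload",
        "--previous_dir", previous_dir,
    ]


def run_test(previous_dir: str = "checkpoint/pretrained") -> int:
    """Run the evaluation and return the process exit code."""
    return subprocess.run(build_test_command(previous_dir), check=False).returncode
```

Passing an argument list instead of a shell string means the quotes around 'checkpoint/pretrained' in the shell command are not needed, and `sys.executable` ensures the current Python interpreter is used.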
Train the model
To train on Human3.6M:
python main.py
After training for several epochs, add the refine module:
python main.py --refine --lr 1e-5 --reload --previous_dir [your model saved path]
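The two training stages above can be chained in one script. This is a minimal sketch, not the authors' pipeline: the helpers are hypothetical, and `saved_model_dir` must be the path where stage 1 saved your model, which you supply yourself because this README does not fix it.

```python
import subprocess
import sys
from typing import List


def stage1_command() -> List[str]:
    """Stage 1: train the base model with default settings."""
    return [sys.executable, "main.py"]


def stage2_command(saved_model_dir: str, lr: str = "1e-5") -> List[str]:
    """Stage 2: reload the stage-1 model and train with the refine module."""
    return [
        sys.executable, "main.py",
        "--refine", "--lr", lr,
        "--reload", "--previous_dir", saved_model_dir,
    ]


def run_two_stages(saved_model_dir: str) -> None:
    # check=True stops the pipeline if stage 1 fails, so stage 2 never
    # tries to reload a model that was not saved.
    subprocess.run(stage1_command(), check=True)
    subprocess.run(stage2_command(saved_model_dir), check=True)
```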
Demo
First, download the YOLOv3 and HRNet pretrained models here and put them in the './demo/lib/checkpoint' directory. Then, put your in-the-wild videos in the './demo/video/' directory.
Run the command below:
python demo/vis.py --video sample_video.mp4
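To run the demo on a video stored elsewhere, a helper like the one below could copy it into './demo/video/' and build the matching command. This is a sketch under an assumption inferred from the example command: that `demo/vis.py` expects only the file name and looks it up in './demo/video/'. `prepare_demo` is a hypothetical helper, and it does not check the YOLOv3 or HRNet checkpoints.

```python
import shutil
import sys
from pathlib import Path
from typing import List


def prepare_demo(video_path: str, repo_root: str = ".") -> List[str]:
    """Hypothetical helper: copy a video into demo/video/ and return the vis.py command."""
    video_dir = Path(repo_root) / "demo" / "video"
    video_dir.mkdir(parents=True, exist_ok=True)
    name = Path(video_path).name
    target = video_dir / name
    if Path(video_path).resolve() != target.resolve():
        shutil.copy2(video_path, target)  # copy, leaving the original file in place
    # Assumption: vis.py receives only the file name, as in the example command above.
    return [sys.executable, "demo/vis.py", "--video", name]
```

The returned list can be passed to `subprocess.run` from the repository root.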
Sample demo output:
Citation
If you find our work useful in your research, please consider citing:
@article{li2023exploiting,
title={Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation},
author={Li, Wenhao and Liu, Hong and Ding, Runwei and Liu, Mengyuan and Wang, Pichao and Yang, Wenming},
journal={IEEE Transactions on Multimedia},
year={2023},
volume={25},
pages={1282-1293},
}
Acknowledgement
Our code is built on top of ST-GCN and extends the following repositories. We thank the authors for releasing their code.
Licence
This project is licensed under the terms of the MIT license.