
[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Versatile Multi-Modal Pre-Training for
Human-Centric Perception

Fangzhou Hong1  Liang Pan1  Zhongang Cai1,2,3  Ziwei Liu1*
1S-Lab, Nanyang Technological University  2SenseTime Research  3Shanghai AI Laboratory

Accepted to CVPR 2022 (Oral)

This repository contains the official implementation of Versatile Multi-Modal Pre-Training for Human-Centric Perception. For brevity, we name our method HCMoCo.


arXiv • Project Page • Dataset

Citation

If you find our work useful for your research, please consider citing the paper:

@article{hong2022hcmoco,
  title={Versatile Multi-Modal Pre-Training for Human-Centric Perception},
  author={Hong, Fangzhou and Pan, Liang and Cai, Zhongang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2203.13815},
  year={2022}
}

Updates

[03/2022] Code release!

[03/2022] HCMoCo is accepted to CVPR 2022 for Oral presentation 🥳!

Installation

We recommend using conda to manage the Python environment. The commands below are provided for reference.

git clone git@github.com:hongfz16/HCMoCo.git
cd HCMoCo
conda create -n HCMoCo python=3.6
conda activate HCMoCo
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1
pip install -r requirements.txt

In addition to the steps above, if you want to run the PointNet++ experiments, remember to compile the PointNet operators.

cd pycontrast/networks/pointnet2
python setup.py install
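
Once the environment is set up, it can help to confirm that the pinned versions were actually installed. The snippet below is a minimal sketch and not part of the repository: the helper names `base_version` and `version_mismatches` are made up for illustration, and the expected versions are simply copied from the conda command above. It avoids syntax newer than Python 3.6 so that it runs inside the HCMoCo environment.

```python
from typing import Dict, List

# Versions pinned by the conda install command above.
EXPECTED = {"torch": "1.6.0", "torchvision": "0.7.0"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu101' from a version string."""
    return version.split("+", 1)[0]


def version_mismatches(installed: Dict[str, str],
                       expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Return one message per package whose version differs from the pin."""
    problems = []
    for name, wanted in sorted(expected.items()):
        found = installed.get(name)
        if found is None:
            problems.append("{}: not installed (expected {})".format(name, wanted))
        elif base_version(found) != wanted:
            problems.append("{}: found {} (expected {})".format(name, found, wanted))
    return problems


if __name__ == "__main__":
    # Collect the versions of whichever pinned packages can be imported.
    installed = {}
    for name in EXPECTED:
        try:
            installed[name] = __import__(name).__version__
        except ImportError:
            pass
    for line in version_mismatches(installed):
        print(line)
```

No output means both packages match the pinned versions.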

Dataset Preparation

1. NTU RGB-D Dataset

This dataset is used for pre-training. Download the 'NTU RGB+D 60' dataset here. Extract the data to pycontrast/data/NTURGBD/NTURGBD. The folder structure should look like:

./
├── ...
└── pycontrast/data/NTURGBD/
    ├── NTURGBD/
        ├── nturgb+d_rgb/
        ├── nturgb+d_depth_masked/
        ├── nturgb+d_skeletons/
        └── ...

Preprocess the raw data with the following two Python scripts, which produce calibrated RGB frames in nturgb+d_rgb_warped_correction and extracted skeleton information in nturgb+d_parsed_skeleton.

cd pycontrast/data/NTURGBD
python generate_skeleton_data.py
python preprocess_nturgbd.py

2. NTURGBD-Parsing-4K Dataset

This dataset is used for both pre-training and the depth human parsing task. Follow the instructions here to prepare the NTURGBD-Parsing-4K dataset.

3. MPII Human Pose Dataset

This dataset is used for pre-training. Download the 'MPII Human Pose Dataset' here. Extract it to pycontrast/data/mpii. The folder structure should look like:

./
├── ...
└── pycontrast/data/mpii
    ├── annot/
    └── images/

4. COCO Keypoint Detection Dataset

This dataset is used for both pre-training and DensePose estimation. Download the COCO 2014 train/val images and annotations here. Extract them to pycontrast/data/coco. The folder structure should look like:

./
├── ...
└── pycontrast/data/coco
    ├── annotations/
        └── *.json
    └── images/
        ├── train2014/
            └── *.jpg
        └── val2014/
            └── *.jpg

5. Human3.6M Dataset

This dataset is used for the RGB human parsing task. Download the Human3.6M dataset here and extract it under HRNet-Semantic-Segmentation/data/human3.6m. Use the provided script mp_parsedata.py to pre-process the raw data. The folder structure should look like:

./
├── ...
└── HRNet-Semantic-Segmentation/data/human3.6m
    ├── protocol_1/
        ├── rgb
        └── seg
    ├── flist_2hz_train.txt
    ├── flist_2hz_eval.txt
    └── ...

6. ITOP Dataset

This dataset is used for depth 3D pose estimation. Download the ITOP dataset here and extract it under A2J/data. Use the provided script data_preprocess.py to pre-process the raw data. The folder structure should look like:

./
├── ...
└── A2J/data
    ├── side_train/
    ├── side_test/
    ├── itop_size_mean.npy
    ├── itop_size_std.npy
    ├── bounding_box_depth_train.pkl
    ├── itop_side_bndbox_test.mat
    └── ...
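
Before launching any training, the layouts listed in this section can be checked in one pass. The following is a rough sketch rather than a script shipped with the repository: the function name `missing_paths` is hypothetical, and the path list is transcribed from the folder trees above. NTURGBD-Parsing-4K is left out because its target location is described in its own instructions, not here.

```python
import os
from typing import Iterable, List

# Paths relative to the repository root, copied from the folder trees above.
EXPECTED_PATHS = [
    "pycontrast/data/NTURGBD/NTURGBD/nturgb+d_rgb",
    "pycontrast/data/NTURGBD/NTURGBD/nturgb+d_depth_masked",
    "pycontrast/data/NTURGBD/NTURGBD/nturgb+d_skeletons",
    "pycontrast/data/mpii/annot",
    "pycontrast/data/mpii/images",
    "pycontrast/data/coco/annotations",
    "pycontrast/data/coco/images/train2014",
    "pycontrast/data/coco/images/val2014",
    "HRNet-Semantic-Segmentation/data/human3.6m/protocol_1/rgb",
    "HRNet-Semantic-Segmentation/data/human3.6m/protocol_1/seg",
    "HRNet-Semantic-Segmentation/data/human3.6m/flist_2hz_train.txt",
    "HRNet-Semantic-Segmentation/data/human3.6m/flist_2hz_eval.txt",
    "A2J/data/side_train",
    "A2J/data/side_test",
    "A2J/data/itop_size_mean.npy",
    "A2J/data/itop_size_std.npy",
    "A2J/data/bounding_box_depth_train.pkl",
    "A2J/data/itop_side_bndbox_test.mat",
]


def missing_paths(repo_root: str,
                  expected: Iterable[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected files or folders that do not exist under repo_root."""
    return [p for p in expected
            if not os.path.exists(os.path.join(repo_root, p))]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for path in missing_paths("."):
        print("missing:", path)
```

Only the datasets needed for the experiments you plan to run have to be present, so a shorter list can be passed as the second argument.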

Model Zoo

TBA

HCMoCo Pre-train

Finally, let's start the pre-training process. We use slurm to manage distributed training, so you might need to modify the scripts mentioned below to suit your own distributed training setup. We develop HCMoCo based on the CMC repository. The code for this part is provided under pycontrast.

1. First Stage

For the first stage, we only perform 'Sample-level modality-invariant representation learning' for 100 epochs. Training scripts for this stage are provided under pycontrast/scripts/FirstStage: train_ntumpiirgbd2s_hrnet_w18.sh for training with 'NTURGBD+MPII' and train_ntucocorgbd2s_hrnet_w18.sh for training with 'NTURGBD+COCO'.

cd pycontrast
sh scripts/FirstStage/train_ntumpiirgbd2s_hrnet_w18.sh

2. Second Stage

For the second stage, all three proposed learning targets in HCMoCo are used to continue training for another 100 epochs. Training scripts for this stage are provided under pycontrast/scripts/SecondStage. The script names correspond to those of the first stage.

3. Extract pre-trained weights

After the two-stage pre-training, we need to extract the pre-trained weights of the RGB and depth encoders for transferring to downstream tasks. Please refer to pycontrast/transfer_ckpt.py for extracting the pre-trained weights of the RGB encoder and pycontrast/transfer_ckpt_depth.py for those of the depth encoder.
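
To give a feel for what this extraction step usually involves, here is a generic sketch of pulling one encoder out of a joint checkpoint by key prefix. It does not reproduce transfer_ckpt.py: the function `extract_encoder_state` and the prefixes `encoder_rgb.` and `encoder_depth.` are hypothetical, and the actual key names in an HCMoCo checkpoint should be taken from the two scripts named above.

```python
from collections import OrderedDict
from typing import Mapping


def extract_encoder_state(state_dict: Mapping, prefix: str) -> OrderedDict:
    """Keep entries whose key starts with `prefix` and strip that prefix."""
    extracted = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            extracted[key[len(prefix):]] = value
    if not extracted:
        raise KeyError("no parameters found under prefix {!r}".format(prefix))
    return extracted


# Toy stand-in for a checkpoint; in practice the values would be tensors.
# The key names below are invented for illustration only.
toy_checkpoint = OrderedDict([
    ("encoder_rgb.conv1.weight", 1),
    ("encoder_rgb.bn1.bias", 2),
    ("encoder_depth.conv1.weight", 3),
    ("head.fc.weight", 4),
])

rgb_state = extract_encoder_state(toy_checkpoint, "encoder_rgb.")
print(list(rgb_state))  # ['conv1.weight', 'bn1.bias']
```

With PyTorch, the input mapping would typically come from `torch.load(path, map_location="cpu")`, and the extracted dictionary would be written out with `torch.save` so that a downstream config can point to it.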

Evaluation on Downstream Tasks

1. DensePose Estimation

The DensePose estimation is performed on COCO dataset. Please refer to detectron2 for the training and evaluation of DensePose estimation. We provide our config files under DensePose-Config for your reference. Fill the config option MODEL.WEIGHTS with the path to the pre-trained weights.

2. RGB Human Parsing

The RGB human parsing is performed on the Human3.6M dataset. We develop the RGB human parsing task based on the HRNet-Semantic-Segmentation repository and include our version in this repository. We provide a config template HRNet-Semantic-Segmentation/experiments/human36m/config-template.yaml. Remember to fill the config option MODEL.PRETRAINED with the path to the pre-trained weights. The training and evaluation commands are provided below.

cd HRNet-Semantic-Segmentation
# Training
python -m torch.distributed.launch \
  --nproc_per_node=2 \
  --master_port=${port} \
  tools/train.py \
      --cfg ${config_file}
# Evaluation
python tools/test.py \
    --cfg ${config_file} \
    TEST.MODEL_FILE ${path_to_trained_model}/best.pth \
    TEST.FLIP_TEST True \
    TEST.NUM_SAMPLES 0
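
The `${port}`, `${config_file}` and `${path_to_trained_model}` placeholders have to be filled in before the commands run. As one possible way to do that from Python, the sketch below assembles the same two commands as argument lists. It uses only the flags shown above; the function names, the port 29500 and the example paths are assumptions made for illustration.

```python
import shlex
from typing import List


def build_train_command(config_file: str, port: int,
                        nproc_per_node: int = 2) -> List[str]:
    """Assemble the distributed training command shown above."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(nproc_per_node),
        "--master_port={}".format(port),
        "tools/train.py",
        "--cfg", config_file,
    ]


def build_eval_command(config_file: str, trained_model_dir: str) -> List[str]:
    """Assemble the evaluation command shown above."""
    return [
        "python", "tools/test.py",
        "--cfg", config_file,
        "TEST.MODEL_FILE", "{}/best.pth".format(trained_model_dir.rstrip("/")),
        "TEST.FLIP_TEST", "True",
        "TEST.NUM_SAMPLES", "0",
    ]


# Hypothetical config and output paths, relative to HRNet-Semantic-Segmentation.
train_cmd = build_train_command("experiments/human36m/my-config.yaml", port=29500)
eval_cmd = build_eval_command("experiments/human36m/my-config.yaml",
                              "output/human36m/my-config")

# Print the commands in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in train_cmd))
print(" ".join(shlex.quote(part) for part in eval_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True, cwd="HRNet-Semantic-Segmentation")` to launch the job from the repository root.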

3. Depth Human Parsing

The depth human parsing is performed on our proposed NTURGBD-Parsing-4K dataset. As with RGB human parsing, the code is developed based on the HRNet-Semantic-Segmentation repository. We provide a config template HRNet-Semantic-Segmentation/experiments/nturgbd_d/config-template.yaml. Please refer to the 'RGB Human Parsing' section above for detailed usage.

4. Depth 3D Pose Estimation

The depth 3D pose estimation is evaluated on the ITOP dataset. We develop the code based on the A2J repository. Since the original repository does not provide training code, we implemented it ourselves. The training and evaluation commands are provided below.

cd A2J
python main.py \
    --pretrained_pth ${path_to_pretrained_weights} \
    --output ${path_to_the_output_folder}

Experiments on the Versatility of HCMoCo

1. Cross-Modality Supervision

The experiments on the versatility of HCMoCo are evaluated on the NTURGBD-Parsing-4K dataset. For the 'RGB->Depth' cross-modality supervision, please refer to pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgb_cmc1_other1.sh. For the 'Depth->RGB' cross-modality supervision, please refer to pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_d_cmc1_other1.sh.

cd pycontrast
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgb_cmc1_other1.sh
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_d_cmc1_other1.sh

2. Missing-Modality Inference

Please refer to the provided script pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgbd_cmc1_other1.sh.

cd pycontrast
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgbd_cmc1_other1.sh

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements

This work is supported by NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

We thank the following repositories for their contributions to our implementation: CMC, HRNet-Semantic-Segmentation, SemGCN, PointNet2.PyTorch, and A2J.
