  • Stars: 301
  • Rank: 137,546 (Top 3%)
  • Language: Python
  • License: MIT License
  • Created: over 2 years ago
  • Updated: 3 months ago


Repository Details

[CVPR'22] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

Lite Pose

slides | paper | video

demo

Abstract

Pose estimation plays a critical role in human-centered vision applications. However, it is difficult to deploy state-of-the-art HRNet-based pose estimation models on resource-constrained edge devices due to their high computational cost (more than 150 GMACs per frame). In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. Through gradual shrinking experiments, we reveal that HRNet's high-resolution branches are redundant for models in the low-computation region; removing them improves both efficiency and performance. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance its capacity: Fusion Deconv Head and Large Kernel Convs. Fusion Deconv Head removes the redundancy in high-resolution branches, allowing scale-aware feature fusion with low overhead. Large Kernel Convs significantly improve the model's capacity and receptive field while maintaining a low computational cost. With only a 25% computation increment, $7\times 7$ kernels achieve a $+14.0$ mAP gain over $3\times 3$ kernels on the CrowdPose dataset. On mobile platforms, LitePose reduces latency by up to $5.0\times$ without sacrificing performance compared with prior state-of-the-art efficient pose estimation models, pushing the frontier of real-time multi-person pose estimation on edge.
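To see why the overhead of large kernels is modest at the model level, consider the back-of-envelope sketch below (layer shapes are illustrative, not LitePose's actual ones): in MobileNetV2-style inverted-residual blocks, the 1x1 pointwise convolutions dominate the MACs, so enlarging the depthwise kernel from 3x3 to 7x7 barely moves the block total.

```python
# Back-of-envelope MACs for one inverted-residual block (MobileNetV2-style).
# Illustrative shapes only -- not LitePose's actual layer dimensions.
def block_macs(h, w, c, expand=6, k=3):
    ce = c * expand
    pw1 = h * w * c * ce       # 1x1 pointwise expansion
    dw = h * w * ce * k * k    # kxk depthwise convolution
    pw2 = h * w * ce * c       # 1x1 pointwise projection
    return pw1 + dw + pw2

m3 = block_macs(32, 32, 128, k=3)
m7 = block_macs(32, 32, 128, k=7)
print(f"7x7 vs 3x3 block MACs: {m7 / m3:.2f}x")  # ~1.15x for this shape
```

The depthwise term grows with $k^2$, but it is a small fraction of the block, which is consistent with the modest model-level increment quoted above.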

Results

CrowdPose Test


| Model | mAP | #MACs | Latency on Nano (ms) | Latency on Mobile (ms) | Latency on Pi (ms) |
|---|---|---|---|---|---|
| HigherHRNet-W24 | 57.4 | 25.3G | 330 | 289 | 1414 |
| EfficientHRNet-H-1 | 56.3 | 14.2G | 283 | 267 | 1229 |
| LitePose-Auto-S (Ours) | 58.3 | 5.0G | 97 | 76 | 420 |
| LitePose-Auto-XS (Ours) | 49.4 | 1.2G | 22 | 27 | 109 |

COCO Val/Test 2017

| Model | mAP (val) | mAP (test-dev) | #MACs | Latency on Nano (ms) | Latency on Mobile (ms) | Latency on Pi (ms) |
|---|---|---|---|---|---|---|
| EfficientHRNet-H-1 | 59.2 | 59.1 | 14.4G | 283 | 267 | 1229 |
| Lightweight OpenPose | 42.8 | - | 9.0G | - | 97 | - |
| LitePose-Auto-M (Ours) | 59.8 | 59.7 | 7.8G | 144 | 97 | 588 |

Note: For more details, please refer to our paper.

Usage

Prerequisites

  1. Install PyTorch and other dependencies:

pip install -r requirements.txt

  2. Install COCOAPI and CrowdPoseAPI following the Official HigherHRNet Repository.

Data Preparation

  1. Please download from COCO download. 2017 Train/Val is needed for training and evaluation.
  2. Please download from CrowdPose download. Train/Val is needed for training and evaluation.
  3. Refer to Official HigherHRNet Repository for more details about the data arrangement.
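For reference, the expected directory layout typically follows the HigherHRNet convention sketched below; please verify the exact file names against that repository.

```
data/
├── coco/
│   ├── annotations/
│   │   ├── person_keypoints_train2017.json
│   │   └── person_keypoints_val2017.json
│   └── images/
│       ├── train2017/
│       └── val2017/
└── crowdpose/
    ├── json/
    │   ├── crowdpose_train.json
    │   ├── crowdpose_val.json
    │   └── crowdpose_test.json
    └── images/
```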

Training Process Overview

Super-net Training

To train a super-net from scratch with the search space specified by arch_manager.py, use:

python dist_train.py --cfg experiments/crowd_pose/mobilenet/supermobile.yaml

Weight Transfer

After training the super-net, you may want to extract a specific sub-network (e.g. search-XS) from the super-net. The following script will be useful:

python weight_transfer.py --cfg experiments/crowd_pose/mobilenet/supermobile.yaml --superconfig mobile_configs/search-XS.json TEST.MODEL_FILE your_supernet_checkpoint_path
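Conceptually, weight transfer copies the shared super-net kernels into the smaller sub-network. The sketch below is a minimal illustration of that idea, not the repo's weight_transfer.py; it assumes the common weight-sharing scheme where a sub-net takes the leading channels and the center crop of each shared kernel.

```python
import torch
import torch.nn as nn

# Toy layers standing in for one super-net conv and its chosen sub-net conv.
super_conv = nn.Conv2d(32, 64, kernel_size=7, padding=3)
sub_conv = nn.Conv2d(16, 24, kernel_size=3, padding=1)

with torch.no_grad():
    w = super_conv.weight  # shape: (64, 32, 7, 7)
    # Leading output/input channels; center 3x3 crop of the 7x7 kernel.
    sub_conv.weight.copy_(w[:24, :16, 2:5, 2:5])
    sub_conv.bias.copy_(super_conv.bias[:24])
```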

Normal Training

To train a normal network with a specific architecture (e.g. search-XS), please use the following script:

Note: Before training, please change the resolution in the configuration (e.g. experiments/crowd_pose/mobilenet/mobile.yaml) to match the architecture configuration (e.g. search-XS.json); a sanity-check sketch follows the command below.

python dist_train.py --cfg experiments/crowd_pose/mobilenet/mobile.yaml --superconfig mobile_configs/search-XS.json
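Before launching, a quick consistency check along these lines can catch a resolution mismatch; note that the key names below (img_size, DATASET.INPUT_SIZE) are hypothetical placeholders and must be adapted to the actual schema of both files.

```python
# Hypothetical sanity check -- key names are illustrative placeholders,
# not the actual schema of mobile.yaml / search-XS.json.
import json
import yaml

with open("mobile_configs/search-XS.json") as f:
    arch = json.load(f)
with open("experiments/crowd_pose/mobilenet/mobile.yaml") as f:
    cfg = yaml.safe_load(f)

arch_res = arch.get("img_size")                     # hypothetical key
cfg_res = cfg.get("DATASET", {}).get("INPUT_SIZE")  # hypothetical key
assert arch_res == cfg_res, f"resolution mismatch: {arch_res} vs {cfg_res}"
```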

Evaluation

To evaluate the model with a specific architecture (e.g. search-XS), please use the following script:

python valid.py --cfg experiments/crowd_pose/mobilenet/mobile.yaml --superconfig mobile_configs/search-XS.json TEST.MODEL_FILE your_checkpoint_path

Models

Pre-trained Models

To reproduce the results in the paper, the super-nets must be initialized from pre-trained checkpoints before training. These checkpoints are provided in COCO-Pretrain and CrowdPose-Pretrain.

APK for Android Phones

Google Drive link

Result Models

We provide the checkpoints corresponding to the results in our paper.

| Dataset | Model | #MACs (G) | mAP |
|---|---|---|---|
| CrowdPose | LitePose-Auto-L | 13.7 | 61.9 |
| CrowdPose | LitePose-Auto-M | 7.8 | 59.9 |
| CrowdPose | LitePose-Auto-S | 5.0 | 58.3 |
| CrowdPose | LitePose-Auto-XS | 1.2 | 49.5 |
| COCO | LitePose-Auto-L | 13.8 | 62.5 |
| COCO | LitePose-Auto-M | 7.8 | 59.8 |
| COCO | LitePose-Auto-S | 5.0 | 56.8 |
| COCO | LitePose-Auto-XS | 1.2 | 40.6 |

Acknowledgements

Lite Pose builds on the HRNet family, mainly HigherHRNet. Thanks for their well-organized code!

Regarding Large Kernel Convs, several recent papers have reached similar conclusions: ConvNeXt, RepLKNet. We look forward to more applications of large kernels in different tasks!

Citation

If Lite Pose is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

@article{wang2022lite,
  title={Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation},
  author={Wang, Yihan and Li, Muyang and Cai, Han and Chen, Wei-Ming and Han, Song},
  journal={arXiv preprint arXiv:2205.01271},
  year={2022}
}

More Repositories

| # | Repository | Description | Language | Stars |
|---|---|---|---|---|
| 1 | streaming-llm | [ICLR 2024] Efficient Streaming Language Models with Attention Sinks | Python | 6,323 |
| 2 | bevfusion | [ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation | Python | 2,153 |
| 3 | temporal-shift-module | [ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding | Python | 2,040 |
| 4 | once-for-all | [ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment | Python | 1,860 |
| 5 | llm-awq | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Python | 1,687 |
| 6 | proxylessnas | [ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | C++ | 1,415 |
| 7 | data-efficient-gans | [NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training | Python | 1,272 |
| 8 | torchquantum | A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers. | Jupyter Notebook | 1,270 |
| 9 | efficientvit | EfficientViT is a new family of vision models for efficient high-resolution vision. | Python | 1,218 |
| 10 | torchsparse | [MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs. | Cuda | 1,181 |
| 11 | smoothquant | [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | Python | 1,175 |
| 12 | gan-compression | [CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs | Python | 1,102 |
| 13 | anycost-gan | [CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing | Python | 778 |
| 14 | tinyml | - | Python | 732 |
| 15 | tinyengine | [NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory | C | 717 |
| 16 | TinyChatEngine | TinyChatEngine: On-Device LLM Inference Library | C++ | 695 |
| 17 | fastcomposer | [IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention | Python | 644 |
| 18 | pvcnn | [NeurIPS 2019, Spotlight] Point-Voxel CNN for Efficient 3D Deep Learning | Python | 636 |
| 19 | lite-transformer | [ICLR 2020] Lite Transformer with Long-Short Range Attention | Python | 589 |
| 20 | spvnas | [ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution | Python | 577 |
| 21 | distrifuser | [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models | Python | 538 |
| 22 | mcunet | [NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning | Python | 423 |
| 23 | amc | [ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices | Python | 422 |
| 24 | tiny-training | On-Device Training Under 256KB Memory [NeurIPS'22] | Python | 414 |
| 25 | dlg | [NeurIPS 2019] Deep Leakage From Gradients | Python | 375 |
| 26 | offsite-tuning | Offsite-Tuning: Transfer Learning without Full Model | Python | 365 |
| 27 | haq | [CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision | Python | 362 |
| 28 | hardware-aware-transformers | [ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing | Python | 321 |
| 29 | inter-operator-scheduler | [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration | C++ | 189 |
| 30 | amc-models | [ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices | Python | 165 |
| 31 | apq | [CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy | Python | 156 |
| 32 | parallel-computing-tutorial | - | C++ | 123 |
| 33 | flatformer | [CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer | Python | 119 |
| 34 | patch_conv | Patch convolution to avoid large GPU memory usage of Conv2D | Python | 72 |
| 35 | 6s965-fall2022 | - | Jupyter Notebook | 64 |
| 36 | sparsevit | [CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer | Python | 48 |
| 37 | bnn-icestick | Binary Neural Network on IceStick FPGA. | Jupyter Notebook | 47 |
| 38 | e3d | Efficient 3D Deep Learning | - | 46 |
| 39 | neurips-micronet | [JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion | Jupyter Notebook | 40 |
| 40 | spatten-llm | [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning | Scala | 32 |
| 41 | tinychat-tutorial | - | C++ | 28 |
| 42 | pruning-sparsity-publications | - | - | 14 |
| 43 | iccad-tinyml-open | [ICCAD'22 TinyML Contest] Efficient Heart Stroke Detection on Low-cost Microcontrollers | C | 14 |
| 44 | calo-cluster | - | Jupyter Notebook | 5 |
| 45 | ml-blood-pressure | - | Python | 5 |
| 46 | gan-compression-dynamic | - | Python | 3 |
| 47 | data-efficient-gans-dynamic | - | Python | 3 |