
HaGRID - HAnd Gesture Recognition Image Dataset

hagrid

We introduce HaGRID (HAnd Gesture Recognition Image Dataset), a large image dataset for hand gesture recognition (HGR) systems. You can use it for image classification or image detection tasks. The proposed dataset allows building HGR systems that can be used in video conferencing services (Zoom, Skype, Discord, Jazz, etc.), home automation systems, the automotive sector, and more.

HaGRID is 716 GB in size and contains 552,992 FullHD (1920 × 1080) RGB images divided into 18 classes of gestures. In addition, some images have the no_gesture class when there is a second, free hand in the frame; this extra class contains 123,589 samples. The data were split by subject user_id into a training set (92%, 509,323 images) and a test set (8%, 43,669 images).

gestures

The dataset contains 34,730 unique persons and at least as many unique scenes. The subjects are people from 18 to 65 years old. The dataset was collected mainly indoors with considerable variation in lighting, including artificial and natural light, and it includes images taken in extreme conditions such as facing toward or away from a window. The subjects showed gestures at a distance of 0.5 to 4 meters from the camera.

Example of sample and its annotation:

example

For more information, see our arXiv paper HaGRID - HAnd Gesture Recognition Image Dataset.

Installation

Clone the repository and install the required Python packages:

git clone https://github.com/hukenovs/hagrid.git
# or mirror link:
cd hagrid
# Create virtual env by conda or venv
conda create -n gestures python=3.9 -y
conda activate gestures
# Install requirements
pip install -r requirements.txt

Docker Installation

docker build -t gestures .
docker run -it -d -v $PWD:/gesture-classifier gestures

Downloads

Because of the large data size, we split the train dataset into 18 archives, one per gesture. Download and unzip them from the following links:

Trainval

Gesture Size Gesture Size
call 39.1 GB peace 38.6 GB
dislike 38.7 GB peace_inverted 38.6 GB
fist 38.0 GB rock 38.9 GB
four 40.5 GB stop 38.3 GB
like 38.3 GB stop_inverted 40.2 GB
mute 39.5 GB three 39.4 GB
ok 39.0 GB three2 38.5 GB
one 39.9 GB two_up 41.2 GB
palm 39.3 GB two_up_inverted 39.2 GB

train_val annotations: ann_train_val

Test

Test Archives Size
images test 60.4 GB
annotations ann_test 27.3 MB

Subsample

Subsample has 100 items per gesture.

Subsample Archives Size
images subsample 2.5 GB
annotations ann_subsample 1.2 MB

HaGRID 512px - a lightweight version of the full dataset with max_side = 512px

Alternatively, download the data using the Python script:

python download.py --save_path <PATH_TO_SAVE> \
                   --train \
                   --test \
                   --subset \
                   --annotations \
                   --dataset

Run the command with the --subset key to download the small subset (100 images per class). You can download the train subset with --train or the test subset with --test. Download annotations for the selected stage with the --annotations key, and download the image data with --dataset.

usage: download.py [-h] [--train] [--test] [--subset] [-a] [-d] [-t TARGETS [TARGETS ...]] [-p SAVE_PATH]

Download dataset...

optional arguments:
  -h, --help            show this help message and exit
  --train               Download trainval set
  --test                Download test set
  --subset              Download subset with 100 items of each gesture
  -a, --annotations     Download annotations
  -d, --dataset         Download dataset
  -t TARGETS [TARGETS ...], --targets TARGETS [TARGETS ...]
                        Target(s) for downloading train set
  -p SAVE_PATH, --save_path SAVE_PATH
                        Save path
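
For example, to download the lightweight subsample together with its annotations (the save path below is only an illustration):

python download.py --subset --dataset --annotations --save_path ./hagrid_subsample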

Models

We provide some pre-trained models as baselines, built on classic backbone architectures with two output heads: one for gesture classification and one for leading-hand classification.

Classifiers F1 Gestures F1 Leading hand
ResNet18 98.80 98.80
ResNet152 99.04 98.92
ResNeXt50 98.95 98.87
ResNeXt101 99.16 98.71
MobileNetV3_small 96.50 97.31
MobileNetV3_large 98.03 97.99
Vitb32 98.35 98.63
Lenet 84.58 91.16

We also provide some models for the hand detection problem.

Detector mAP
SSDLiteMobileNetV3Large 71.49
SSDLiteMobileNetV3Small 53.38
FRCNNMobilenetV3LargeFPN 78.05
YoloV7Tiny 81.1

However, if you only need a single gesture per frame, you can use pre-trained full-frame classifiers instead of detectors. To use the full-frame models, set the configuration parameter full_frame: True and remove the no_gesture class.

Full Frame Classifiers F1 Gestures
ResNet18 93.51
ResNet152 94.49
ResNeXt50 95.20
ResNeXt101 95.67
MobileNetV3_small 87.09
MobileNetV3_large 90.96

Train

You can use the downloaded pre-trained models; otherwise, select a classifier and training parameters in default.yaml. To train a model, execute the following command:

python -m classifier.run --command 'train' --path_to_config <PATH>
python -m detector.run --command 'train' --path_to_config <PATH>

At every step, the current loss, learning rate and other values are logged to TensorBoard. To see all saved metrics and parameters, run the following command (this will serve a web page at localhost:6006):

tensorboard --logdir=experiments

Test

Test your model by running the following command:

python -m classifier.run --command 'test' --path_to_config <PATH>
python -m detector.run --command 'test' --path_to_config <PATH>

Demo

python demo.py -p <PATH_TO_CONFIG> --landmarks

demo

Demo Full Frame Classifiers

python demo_ff.py -p <PATH_TO_CONFIG> --landmarks

Annotations

The annotations consist of bounding boxes of hands in COCO format [top-left X position, top-left Y position, width, height] with gesture labels. The annotations also contain 21 landmarks per hand in [x, y] format, given in image-relative coordinates, markups of the leading hand (left or right for the gesture hand), and leading_conf as the confidence of the leading_hand annotation. We also provide a user_id field that allows you to split the train / val dataset yourself.

"0534147c-4548-4ab4-9a8c-f297b43e8ffb": {
  "bboxes": [
    [0.38038597, 0.74085361, 0.08349486, 0.09142549],
    [0.67322755, 0.37933984, 0.06350809, 0.09187757]
  ],
  "landmarks"[
    [
      [
        [0.39917091, 0.74502739],
        [0.42500172, 0.74984396],
        ...
      ],
        [0.70590734, 0.46012364],
        [0.69208878, 0.45407018],
        ...
    ],
  ],
  "labels": [
    "no_gesture",
    "one"
  ],
  "leading_hand": "left",
  "leading_conf": 1.0,
  "user_id": "bb138d5db200f29385f..."
}
  • Key - image name without extension
  • Bboxes - list of normalized bboxes [top left X pos, top left Y pos, width, height]
  • Labels - list of class labels e.g. like, stop, no_gesture
  • Landmarks - list of normalized hand landmarks [x, y]
  • Leading hand - right or left for the hand that shows the gesture
  • Leading conf - confidence score for the leading_hand annotation
  • User ID - subject id (useful for splitting the data into train / val subsets).
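
A minimal sketch of reading these annotations in Python is shown below; the annotation file name and the assumption that the archive unpacks into one JSON file per gesture are for illustration only, so adjust the path and image size to your local copy:

import json

ANN_FILE = "ann_train_val/one.json"  # assumed location of one annotation file
IMG_W, IMG_H = 1920, 1080            # FullHD originals; use the real image size

with open(ANN_FILE) as f:
    annotations = json.load(f)

for image_name, ann in annotations.items():
    for (x, y, w, h), label in zip(ann["bboxes"], ann["labels"]):
        # normalized [top-left x, top-left y, width, height] -> absolute COCO pixels
        print(image_name, label, [x * IMG_W, y * IMG_H, w * IMG_W, h * IMG_H])
    break  # inspect only the first image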

Bounding boxes

Object Train + Val Test Total
gesture (per class) ~28,300 ~2,400 30,629
no gesture 112,740 10,849 123,589
total boxes 622,063 54,518 676,581

Landmarks

We annotate 21 hand keypoints using the open-source MediaPipe framework. Because the markup is automatic, some landmark entries may be empty lists.

Object Train + Val Test Total
leading hand 503,872 43,167 547,039
not leading hand 98,766 9,243 108,009
total landmarks 602,638 52,410 655,048
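
Because the automatic markup can leave empty landmark lists, you may want to keep only samples where every hand has a full set of 21 points. A small sketch, reusing the annotations dict from the loading example above:

# annotations: dict loaded from an annotation JSON file, as in the sketch above
with_landmarks = {
    name: ann
    for name, ann in annotations.items()
    if ann["landmarks"] and all(len(hand) == 21 for hand in ann["landmarks"])
}
print(f"{len(with_landmarks)} of {len(annotations)} images have 21 landmarks for every hand")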

Converters

Yolo

We provide a script to convert annotations to YOLO format. To convert annotations, run the following command:

python -m converters.hagrid_to_yolo --path_to_config <PATH>
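
As a rough illustration of the bbox conversion involved (this is not the converter's actual code, and the helper name is made up), HaGRID's normalized top-left boxes map onto YOLO's normalized center format as follows:

def hagrid_bbox_to_yolo(bbox):
    """[top-left x, top-left y, w, h] (normalized) -> YOLO [x_center, y_center, w, h] (normalized)."""
    x, y, w, h = bbox
    return [x + w / 2, y + h / 2, w, h]

# e.g. hagrid_bbox_to_yolo([0.38038597, 0.74085361, 0.08349486, 0.09142549])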

After conversion, you also need to change the original img2label_paths definition in the YOLO dataset code to:

def img2label_paths(img_paths):
    """Map HaGRID image paths to the corresponding YOLO label paths."""
    img_paths = list(img_paths)
    # Label files live in a sibling *_labels directory with a .txt extension
    if "subsample" in img_paths[0]:
        return [x.replace("subsample", "subsample_labels").replace(".jpg", ".txt") for x in img_paths]
    elif "train_val" in img_paths[0]:
        return [x.replace("train_val", "train_val_labels").replace(".jpg", ".txt") for x in img_paths]
    elif "test" in img_paths[0]:
        return [x.replace("test", "test_labels").replace(".jpg", ".txt") for x in img_paths]
    # Fail loudly instead of silently returning None for an unexpected path
    raise ValueError(f"Unexpected image path: {img_paths[0]}")

Coco

We also provide a script to convert annotations to COCO format. To convert annotations, run the following command:

python -m converters.hagrid_to_coco --path_to_config <PATH>

License

This work is licensed under a variant of the Creative Commons Attribution-ShareAlike 4.0 International License.

Please see the specific license.

Authors and Credits

Citation

You can cite the paper using the following BibTeX entry:

@article{hagrid,
    title={HaGRID - HAnd Gesture Recognition Image Dataset},
    author={Kapitanov, Alexander and Makhlyarchuk, Andrey and Kvanchiani, Karina},
    journal={arXiv preprint arXiv:2206.08219},
    year={2022}
}
