  • Stars: 861
  • Rank: 52,968 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created: over 5 years ago
  • Updated: over 3 years ago

Repository Details

Apply ML to the skeletons from OpenPose; 9 actions; multiple people. (WARNING: I'm sorry that this is only good for a course demo, not for real-world applications !!! Those are very difficult !!!)

Multi-person Real-time Action Recognition Based on Human Skeleton

Highlights: 9 actions; multiple people (<= 5); real-time, multi-frame-based recognition algorithm.

Updates: On 2019-10-26, I refactored the code, added more comments, and put all settings into the config/config.yaml file, including the classes of actions, the input and output of each file, OpenPose settings, etc.

Project: This is my final project for EECS-433 Pattern Recognition at Northwestern University in March 2019. A simpler version that two teammates and I worked on is here.

Warning: Since I used 10 fps video and a 0.5 s window for training, you must also limit your video to about 10 fps (7~12 fps) if you want to test my pretrained model on your own video or web camera.
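
If your source video has a higher frame rate, one possible way to bring it down to about 10 fps is to skip frames with OpenCV. The snippet below is only a sketch under that assumption (the file names are placeholders), not a script from this repository.

# A minimal sketch (not part of this repo): downsample a video to ~10 fps
# by skipping frames with OpenCV. Input/output paths are placeholders.
import cv2

def downsample_video(src_path, dst_path, target_fps=10.0):
    cap = cv2.VideoCapture(src_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(1, round(src_fps / target_fps))  # keep every `step`-th frame
    writer = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            if writer is None:
                h, w = frame.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*"XVID")
                writer = cv2.VideoWriter(dst_path, fourcc, target_fps, (w, h))
            writer.write(frame)
        idx += 1
    cap.release()
    if writer is not None:
        writer.release()

downsample_video("my_video.avi", "my_video_10fps.avi")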

Contents:

  • 1. Algorithm
  • 2. Install Dependencies (OpenPose)
  • 3. Program structure
  • 4. How to run: Inference
  • 5. Training data
  • 6. How to run: Training
  • 7. Result and Performance

1. Algorithm

I collected videos of 9 types of actions: ['stand', 'walk', 'run', 'jump', 'sit', 'squat', 'kick', 'punch', 'wave']. The total video length is about 20 minutes, containing about 10,000 video frames recorded at 10 frames per second.

The workflow of the algorithm is:

  • Get the joints' positions with OpenPose.
  • Track each person. The Euclidean distance between the joints of two skeletons is used to match them (a minimal matching sketch is given after this list). See class Tracker in lib_tracker.py
  • Fill in a person's missing joints using those joints' relative positions in the previous frame. See class FeatureGenerator in lib_feature_proc.py. The same applies to the following steps.
  • Add noise to the (x, y) joint positions to augment the data.
  • Use a window size of 0.5 s (5 frames) to extract features.
  • Extract features of (1) body velocity, (2) normalized joint positions, and (3) joint velocities.
  • Apply PCA to reduce the feature dimension to 80. Classify with a 3-layer DNN of size 50x50x50 (or switch to another classifier in one line). See class ClassifierOfflineTrain in lib_classifier.py
  • Apply mean filtering to the prediction scores over 2 frames, and add a label above the person if the score is larger than 0.8. See class ClassifierOnlineTest in lib_classifier.py (a classifier sketch is given at the end of this section).
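
As a rough illustration of the tracking step, here is a minimal, hypothetical sketch of matching skeletons across frames by Euclidean distance. It is not the actual class Tracker in lib_tracker.py (which also assigns person IDs and handles unmatched skeletons); treat it only as a sketch of the idea.

import numpy as np

def match_skeletons(prev_skels, cur_skels, max_dist=0.3):
    """Greedily match current skeletons to previous ones by the mean
    Euclidean distance over joints. Each skeleton is an (N_joints, 2) array."""
    matches, used_prev = {}, set()          # maps cur index -> prev index
    for i, cur in enumerate(cur_skels):
        best_j, best_d = None, max_dist
        for j, prev in enumerate(prev_skels):
            if j in used_prev:
                continue
            d = np.mean(np.linalg.norm(cur - prev, axis=1))  # mean joint distance
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used_prev.add(best_j)
    return matches

# Example with one 18-joint skeleton that moved slightly between frames:
prev = [np.random.rand(18, 2)]
cur = [prev[0] + 0.01]
print(match_skeletons(prev, cur))           # -> {0: 0}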

For more details about how the features are extracted, please see my report.
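
To make the last two steps concrete, here is a hedged sketch of a PCA-to-80-dimensions + 50x50x50 MLP classifier with 2-frame score smoothing, written with scikit-learn on random placeholder data. It only mirrors the idea; the real code is in class ClassifierOfflineTrain and class ClassifierOnlineTest in lib_classifier.py.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Placeholder training data: 200 window features of length 314, 9 classes.
X = np.random.rand(200, 314)
Y = np.random.randint(0, 9, size=200)

pca = PCA(n_components=80)                            # reduce feature dimension to 80
clf = MLPClassifier(hidden_layer_sizes=(50, 50, 50))  # 3-layer DNN
clf.fit(pca.fit_transform(X), Y)

prev_scores = None

def classify_frame(feature_vec):
    """Classify one window feature; mean-filter the scores over 2 frames."""
    global prev_scores
    raw = clf.predict_proba(pca.transform(feature_vec.reshape(1, -1)))[0]
    scores = raw if prev_scores is None else (raw + prev_scores) / 2.0
    prev_scores = raw
    best = int(np.argmax(scores))
    return best if scores[best] > 0.8 else None       # None = not confident enough

print(classify_frame(np.random.rand(314)))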

2. Install Dependencies (OpenPose)

We need Python >= 3.6.

2.1. Download tf-pose-estimation

This project uses an OpenPose program developed by ildoonet. The source project has been deleted, so I forked it to here: tf-pose-estimation.

Please download it:

export MyRoot=$PWD
cd src/githubs  
git clone https://github.com/felixchenfy/ildoonet-tf-pose-estimation
mv ildoonet-tf-pose-estimation tf-pose-estimation

2.2. Download pretrained models

The mobilenet_thin models are already included in the project, so there is no need to download them. See this folder:

src/githubs/tf-pose-estimation/models/graph$ ls
cmu  mobilenet_thin  mobilenet_v2_large  mobilenet_v2_small

If you want to use the original OpenPose model, which is named "cmu" here, you need to download it:

cd $MyRoot/src/githubs/tf-pose-estimation/models/graph/cmu  
bash download.sh  

2.3. Install libraries

Basically, you have to follow the tutorial of the tf-pose-estimation project. If you have already set up the environment for that project, it is almost the same environment needed to run my project.

Please follow its tutorial here. I've copied the commands I ran below:

conda create -n tf tensorflow-gpu
conda activate tf

cd $MyRoot/src/githubs/tf-pose-estimation
pip3 install -r requirements.txt
pip3 install jupyter tqdm

# Install tensorflow.
# You may need a few tries to select a version that is compatible with your cuDNN. If the versions mismatch, you might get this error: "Error : Failed to get convolution algorithm."
pip3 install tensorflow-gpu==1.13.1

# Compile c++ library as described [here](https://github.com/felixchenfy/ildoonet-tf-pose-estimation#install-1):
sudo apt install swig
pip3 install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
cd $MyRoot/src/githubs/tf-pose-estimation/tf_pose/pafprocess
swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace

Then install some small libraries that this project uses:

cd $MyRoot
pip3 install -r requirements.txt

2.4. Verify installation

Make sure you can successfully run its demo examples:

cd $MyRoot/src/githubs/tf-pose-estimation
python run.py --model=mobilenet_thin --resize=432x368 --image=./images/p1.jpg

If you encounter an error, try searching Google or the tf-pose-estimation issues page. The problem is most likely caused by that project's dependencies.

3. Program structure

Diagram

Troubleshooting:

  • How to change features?

    In utils/lib_feature_proc.py, in the class FeatureGenerator, change the function def add_cur_skeleton.

    The function reads in a raw skeleton and outputs the feature generated from this raw skeleton as well as the previous skeletons (a simplified sketch is shown below). The feature will then be saved to features_X.csv by the script s3_preprocess_features.py for the next training step.
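
    For intuition only, here is a very simplified, hypothetical stand-in for such a feature function. It is not the real FeatureGenerator (which also normalizes the skeleton and fills in missing joints); it just shows the window-plus-velocity idea.

    from collections import deque
    import numpy as np

    class SimpleFeatureGenerator:
        """Toy stand-in (not the real FeatureGenerator): concatenates joint
        positions and joint velocities over a sliding window of frames."""

        def __init__(self, window_size=5):
            self._window = deque(maxlen=window_size)

        def add_cur_skeleton(self, skeleton_xy):
            """skeleton_xy: (N_joints, 2) array. Returns a feature vector,
            or None until the window is full."""
            self._window.append(np.asarray(skeleton_xy, dtype=float))
            if len(self._window) < self._window.maxlen:
                return None
            frames = list(self._window)
            positions = np.concatenate([f.flatten() for f in frames])
            velocities = np.concatenate(
                [(b - a).flatten() for a, b in zip(frames, frames[1:])])
            return np.concatenate([positions, velocities])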

  • How to include joints of the head?

    You need to change the aforementioned add_cur_skeleton function.

    I suggest writing a new function to extract the head features, and then appending them to the returned variable (feature) of add_cur_skeleton, as in the sketch below.

    Please read def retrain_only_body_joints in utils/lib_feature_proc.py if you want to add the head joints.
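
    As a hedged sketch of that idea (the helper name extract_head_features and the joint indices below are made up for illustration):

    import numpy as np

    # Hypothetical indices of the head joints (nose, eyes, ears) in the
    # OpenPose skeleton; check the actual joint order in your setup.
    HEAD_JOINT_INDICES = [0, 14, 15, 16, 17]

    def extract_head_features(skeleton_xy):
        """Return the (x, y) positions of the head joints as a flat vector."""
        return np.asarray(skeleton_xy)[HEAD_JOINT_INDICES].flatten()

    # Inside add_cur_skeleton, after the body feature is computed:
    #   feature = np.concatenate([feature, extract_head_features(skeleton_xy)])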

  • How to change the classifier to RNN?

    There are two major changes to do:

    First, change the aforementioned add_cur_skeleton. Instead of manually extracting time-series features as the current script does, you may simply stack the input skeleton with the previous skeletons and then output it.

    Second, change the def __init__ and def predict functions of class ClassifierOfflineTrain in utils/lib_classifier.py to add an RNN model, as sketched below.
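
    Below is a minimal sketch of what such an RNN model might look like. Using PyTorch is my assumption here (the repo's classifiers are scikit-learn based), so the training and prediction code would also need to be adapted.

    import torch
    import torch.nn as nn

    class SkeletonLSTM(nn.Module):
        """A toy RNN classifier: a stack of skeletons in, action scores out."""
        def __init__(self, n_joints=18, n_classes=9, hidden_size=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=n_joints * 2, hidden_size=hidden_size,
                                num_layers=1, batch_first=True)
            self.fc = nn.Linear(hidden_size, n_classes)

        def forward(self, x):          # x: (batch, window_size, n_joints * 2)
            _, (h_n, _) = self.lstm(x)
            return self.fc(h_n[-1])    # logits for each action class

    # Example: a batch of 4 windows of 5 frames, 18 joints each.
    model = SkeletonLSTM()
    logits = model(torch.randn(4, 5, 18 * 2))
    probs = torch.softmax(logits, dim=1)   # rough analogue of predict_proba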

Main scripts

The 5 main scripts are under src/. They are named in the order of execution:

src/s1_get_skeletons_from_training_imgs.py    
src/s2_put_skeleton_txts_to_a_single_txt.py
src/s3_preprocess_features.py
src/s4_train.py 
src/s5_test.py

The input and output of these files, as well as some parameters, are defined in the configuration file config/config.yaml. Part of it is pasted below just to give an intuition:

classes: ['stand', 'walk', 'run', 'jump', 'sit', 'squat', 'kick', 'punch', 'wave']

image_filename_format: "{:05d}.jpg"
skeleton_filename_format: "{:05d}.txt"

features:
  window_size: 5 # Number of adjacent frames for extracting features. 

s1_get_skeletons_from_training_imgs.py:
  openpose:
    model: cmu # cmu or mobilenet_thin. "cmu" is more accurate but slower.
    img_size: 656x368 #  656x368, or 432x368, 336x288. Bigger is more accurate.
  input:
    images_description_txt: data/source_images3/valid_images.txt
    images_folder: data/source_images3/
  output:
    images_info_txt: data_proc/raw_skeletons/images_info.txt # This file is not used.
    detected_skeletons_folder: &skels_folder data_proc/raw_skeletons/skeleton_res/
    viz_imgs_folders: data_proc/raw_skeletons/image_viz/

s2_put_skeleton_txts_to_a_single_txt.py:
  input:
    # A folder of skeleton txts. Each txt corresponds to one image.
    detected_skeletons_folder: *skels_folder
  output:
    # One txt containing all valid skeletons.
    all_skeletons_txt: &skels_txt data_proc/raw_skeletons/skeletons_info.txt

s3_preprocess_features.py:
  input: 
    all_skeletons_txt: *skels_txt
  output:
    processed_features: &features_x data_proc/features_X.csv
    processed_features_labels: &features_y data_proc/features_Y.csv

s4_train.py:
  input:
    processed_features: *features_x
    processed_features_labels: *features_y
  output:
    model_path: model/trained_classifier.pickle
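
The YAML anchors and aliases (&skels_folder, *skels_folder, etc.) are resolved automatically when the file is loaded, so each script sees plain strings. Below is a hedged sketch of how a script could read its settings with PyYAML; the actual scripts use their own helper functions.

import yaml  # PyYAML

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)   # anchors/aliases are resolved here

classes = cfg["classes"]
window_size = cfg["features"]["window_size"]
s1_cfg = cfg["s1_get_skeletons_from_training_imgs.py"]
print(classes, window_size, s1_cfg["output"]["detected_skeletons_folder"])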

For how to run the main scripts, please see Section 4 (How to run: Inference) and Section 6 (How to run: Training).

4. How to run: Inference

Introduction

The script src/s5_test.py is for doing real-time action recognition.

The classes are set in config/config.yaml by the key classes.

The supported inputs are a video file, a folder of images, and a web camera, selected by the command-line arguments --data_type and --data_path.

The trained model is set by --model_path, e.g., model/trained_classifier.pickle.

The output is set by --output_folder, e.g., output/.

The test data (a video and a folder of images) are already included under the data_test/ folder.

An example result of the input video "exercise.avi" is:

output/exercise/
β”œβ”€β”€ skeletons
β”‚   β”œβ”€β”€ 00000.txt
β”‚   β”œβ”€β”€ 00001.txt
β”‚   └── ...
└── video.avi

Also, the result will be displayed by cv2.imshow().

Example commands are given below:

Test on video file

python src/s5_test.py \
    --model_path model/trained_classifier.pickle \
    --data_type video \
    --data_path data_test/exercise.avi \
    --output_folder output

Test on a folder of images

python src/s5_test.py \
    --model_path model/trained_classifier.pickle \
    --data_type folder \
    --data_path data_test/apple/ \
    --output_folder output

Test on web camera

python src/s5_test.py \
    --model_path model/trained_classifier.pickle \
    --data_type webcam \
    --data_path 0 \
    --output_folder output

5. Training data

Download my data

Follow the instructions in data/download_link.md to download the data, or create your own. The data and labeling format are described below.

Data format

Each data subfolder (e.g., data/source_images3/jump_03-02-12-34-01-795/) contains images named 00001.jpg, 00002.jpg, etc.
The naming format of each image is defined in config/config.yaml by the key image_filename_format: "{:05d}.jpg".

The images to be used as training data and their labels are configured by this txt file: data/source_images3/valid_images.txt.
A snapshot of this txt file is shown below:

jump_03-02-12-34-01-795
52 59
72 79

kick_03-02-12-36-05-185
54 62

In each paragraph, the 1st line is the data folder name, which should start with "${class_name}_". The 2nd and following lines specify the starting and ending frame indices of the video segments that correspond to that class.

Let's take the 1st paragraph of the above snapshot as an example: jump is the class, and frames 52~59 and 72~79 of the video are used for training.
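
For intuition, here is a hedged sketch of how such a file could be parsed (it is not the project's actual parser); it also shows how a frame index maps to an image file name via image_filename_format.

def parse_valid_images_txt(path):
    """Toy parser (not the project's actual code): returns a list of
    (class_name, folder_name, [(start_idx, end_idx), ...]) tuples."""
    samples, folder, ranges = [], None, []
    with open(path) as f:
        lines = [ln.strip() for ln in f] + [""]   # sentinel blank line at the end
    for ln in lines:
        if not ln:                                # blank line ends a paragraph
            if folder is not None and ranges:
                samples.append((folder.split("_")[0], folder, ranges))
            folder, ranges = None, []
        elif folder is None:
            folder = ln                           # e.g. "jump_03-02-12-34-01-795"
        else:
            start, end = map(int, ln.split())
            ranges.append((start, end))
    return samples

# Example: frame 52 of the "jump" clip corresponds to the image file
# "{:05d}.jpg".format(52) == "00052.jpg" inside that folder.
print(parse_valid_images_txt("data/source_images3/valid_images.txt"))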

Classes

The classes are set in config/config.yaml under the key classes. No matter how many classes appear in the training data (determined by the folder names), only the ones that match the classes in config.yaml are used for training and inference.

6. How to run: Training

First, you may read Section 5 (Training data) and Section 3 (Program structure) to understand the training data format and the input and output of each script.

Then, follow these steps to do the training:

  • Collect your own data and label it, or use my data. Here is a tool to record images from a web camera.
  • If you are using your own data, change the values of classes, images_description_txt, and images_folder inside config/config.yaml.
  • Depending on your needs, you may change parameters in config/config.yaml.
  • Finally, run the following scripts one by one:
    python src/s1_get_skeletons_from_training_imgs.py
    python src/s2_put_skeleton_txts_to_a_single_txt.py 
    python src/s3_preprocess_features.py
    python src/s4_train.py 

By default, the intermediate data are saved to data_proc/, and the model is saved to model/trained_classifier.pickle.
After training is done, you can run the inference script src/s5_test.py as described in Section 4 (How to run: Inference).

7. Result and Performance

Unfortunately, this project only works well on me, because the training videos are all of myself.

The performance is poor for (1) people with a different body shape and (2) people far from the camera. How to improve? I guess the first step is to collect a larger training set from different people, and then improve the data augmentation and feature selection.

Besides, my simple tracking algorithm only works for a small number of people (maybe 5).

Due to the not-so-good performance of the action recognition, I guess you can only use this project for a course demo, not for any commercial applications ... T.T

More Repositories

1. Monocular-Visual-Odometry (C++, 381 stars): A simple monocular visual odometry (part of vSLAM) by ORB keypoints with initialization, tracking, local map and bundle adjustment. (WARNING: Hi, I'm sorry that this project is tuned for course demo, not for real-world applications !!!)
2. open3d_ros_pointcloud_conversion (CMake, 94 stars): 2 Python API functions for point cloud conversion between Open3D and ROS. Compatible with XYZ and XYZRGB point types.
3. ros_yolo_as_template_matching (Python, 57 stars): Run 3 scripts to (1) synthesize images (by putting a few template images onto backgrounds), (2) train YOLOv3, and (3) detect objects from: one image, images, video, webcam, or ROS topic.
4. 3D-Scanner-by-Baxter (Python, 56 stars): Use a robot arm (Baxter) mounted with a depth camera to scan an object's 3D model.
5. practice_motion_planning (Python, 48 stars): Coding: (1) path planning: RRT*, A*; (2) tracking: Optimization, PurePursuit, FollowLine; (3) planning and control on a mobile manipulator.
6. ros_openpose_rgbd (Python, 39 stars): Visualize 3D human skeletons (body + hands) in ROS rviz. The 2D joints are detected by OpenPose; the depth is from the depth image.
7. ros_3d_pointing_detection (Python, 37 stars): Which object is a person pointing at? Detect it by using YOLO, OpenPose and a depth image (under a customized scene).
8. Speech-Commands-Classification-by-LSTM-PyTorch (Jupyter Notebook, 36 stars): Classification of 11 types of audio clips using MFCC features and LSTM. Pretrained on the Speech Commands Dataset with intensive data augmentation.
9. ros_detect_planes_from_depth_img (Python, 34 stars): A Python node to detect planes from a depth image by using the RANSAC algorithm. Input/output from/to ROS topics.
10. Detect-Object-and-6D-Pose (26 stars): (1) 3D scan objects by Baxter. (2) Label objects automatically by depth camera and (3) train Yolo. (4) [TODO; NOT DONE YET!!!] Finally, detect the object and fit the 3D model to get the 6D pose.
11. Mask-Objects-from-RGBD (Python, 17 stars): Put objects on a plane. Use a depth camera to find them and add labels (for training Yolo).
12. Data-Augment-and-Train-Yolo (Jupyter Notebook, 16 stars): Put masked objects onto background images randomly to generate images. Train Yolo3.
13. API_for_Simulating_Multi-Link_System (Mathematica, 14 stars): Mathematica API for simulating the dynamics and collision of planar multi-link objects (by the Euler-Lagrange equation).
14. Detect-Hand-Grasping-Object (Jupyter Notebook, 13 stars): A toy project: detect my hand grasping an object in the video. Backbone algorithms: SiamMask, Mask_RCNN, OpenPose.
15. Data-Storage (13 stars): Store some images, gifs, etc.
16. ros_pub_and_sub_rgbd_and_cloud (Python, 13 stars): Python nodes to publish/subscribe RGB-D images and their point clouds (or any of them) to/from ROS topics.
17. record_images_from_usbcam (Python, 9 stars): Run one script and press 's'/'d' to save your laptop's camera images to disk. Two versions: (1) Python, and (2) ROS node.
18. cpp_practice_image_processing (C++, 8 stars): Implement: Sobel; Canny; Harris; Hough line; Fit line; RANSAC.
19. Command_Robot_to_Move (Jupyter Notebook, 6 stars): Use voice to tell the robot the target, then the robot detects it and moves there. (LSTM, YOLO, plane detection, motion planning, ...)
20. ros_turtlebot_control (Python, 5 stars): ROS services for controlling Turtlebot3 to a target pose by the `Move to Pose` algorithm.
21. ros_speech_commands_classification (Python, 5 stars): (1) Press a key to record audio; (2) speak a word into the microphone; (3) finally, see the classification result on the GUI and a ROS topic.
22. Voice_Control_Turtlebot--Masters_Final (5 stars): A toy project of using voice to tell a Turtlebot robot to detect and move to a target, achieved by 4 components: (1) speech classification, (2) object detection, (3) plane detection, and (4) control of wheel motion.
23. DQN_SwingUpPendulum (Python, 3 stars): Using a Deep Q-network to train an AI to play the swing-up pendulum game.
24. ros_record_rgbd_images (Python, 3 stars): Press a key to record color/depth images from ROS topics or Realsense. Key `a` saves a single image; `s` starts continuous recording; `d` stops recording; `q` quits.
25. ros_images_publisher (Python, 2 stars): A Python script to publish color or depth images from a folder to a ROS topic.
26. Monocular-Visual-Odometry-Data (2 stars): Only my VO project's test data and results.
27. Baxter_Picks_Up_Dices (1 star): A readme for the CV part of ME495's final project "Baxter Robot picking up dices". In short, (1) detecting dices using a graph cut algorithm, and (2) locating their positions by geometry.
28. keyboard_input (Python, 1 star): 4 functions for reading keyboard input: read char or string; with or without timeout.