Note: This repository was archived on 21 March 2024.

Enhance your application with the ability to see and interact with humans using any RGB camera.

State-of-the-art Real-time Action Recognition


Website • Blogpost • Getting Started • Build Your Own Classifier • iOS Deployment • Gallery • Datasets • SDK License


sense is an inference engine to serve powerful neural networks for action recognition, with a low computational footprint. In this repository, we provide:

  • Two models out-of-the-box pre-trained on millions of videos of humans performing actions in front of, and interacting with, a camera. Both neural networks are small, efficient, and run smoothly in real time on a CPU.
  • Demo applications showcasing the potential of our models: action recognition, gesture control, fitness activity tracking, live calorie estimation.
  • A pipeline to record and annotate your own video dataset and train a custom classifier on top of our models, with an easy-to-use script for fine-tuning our weights.
Demo previews: Action Recognition • Fitness Activity Tracker and Calorie Estimation • Gesture Control

Requirements and Installation

The following steps are confirmed to work on Linux (Ubuntu 18.04 LTS and 20.04 LTS) and macOS (Catalina 10.15.7).

Step 1: Clone the repository

To begin, clone this repository to a local directory of your choice:

git clone https://github.com/TwentyBN/sense.git
cd sense

Step 2: Install Dependencies

We recommend installing the dependencies in a new virtual environment created with conda or virtualenv. The following instructions show how to set up a conda environment.

conda create -y -n sense python=3.6
conda activate sense

Install Python dependencies:

pip install -r requirements.txt

Note: pip install -r requirements.txt installs the CPU-only version of PyTorch. To run inference on your GPU, install a CUDA-enabled PyTorch build instead (e.g. conda install pytorch torchvision cudatoolkit=10.2 -c pytorch). See all available install commands here.
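To confirm which device your installed PyTorch build will actually use, a quick check (independent of this repository) is:

import torch

# Prints "cuda" if a CUDA-enabled build and a visible GPU are both available.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"PyTorch will run on: {device}")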

Step 3: Download the SenseKit Weights

Pre-trained weights can be downloaded from here, subject to separate license terms. Follow the instructions to create an account, agree to the evaluation license, and download the weights. Once downloaded, unzip the archive and move its contents into sense/resources. Your resources folder structure should then look like this:

resources
├── backbone
│   ├── strided_inflated_efficientnet.ckpt
│   └── strided_inflated_mobilenet.ckpt
├── fitness_activity_recognition
│   └── ...
├── action_recognition
│   └── ...
└── ...

Note: The remaining folders in resources/ already contain the necessary files; only some additional, larger folders need to be downloaded separately.
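As a quick sanity check, here is a minimal sketch (assuming the layout above and the repository root as working directory) to confirm the backbone weights landed in the right place:

from pathlib import Path

# Checkpoint paths taken from the folder structure shown above.
expected = [
    "resources/backbone/strided_inflated_efficientnet.ckpt",
    "resources/backbone/strided_inflated_mobilenet.ckpt",
]
missing = [p for p in expected if not Path(p).is_file()]
print("Backbone weights found." if not missing else f"Missing files: {missing}")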


Getting Started

To get started, try out the demos we've provided. Inside the sense/examples directory, you will find multiple Python scripts that each apply our pre-trained models to a specific use case. Launching each demo is as simple as running its script in a terminal, as described below.

The examples will display information on the achieved frame rate in the lower left corner, so you can verify that your installation is running well.

  • Camera FPS is the rate at which frames are read from the webcam or from the provided file. By default, this is capped at 16 fps, a trade-off between a high input frame rate and the model's low computational footprint. The input video stream is up- or down-sampled accordingly, so that all processing happens in real time.
  • Model FPS is the rate at which the model produces predictions. To keep computations low, the model collects four frames before each pass through the network, so the expected output rate is 4 fps. Thanks to temporal convolutions with striding, the model still maintains a larger receptive field. (See the sketch below.)
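The relationship between the two rates can be sketched as follows; this is a hypothetical illustration of the bookkeeping, not the repository's actual pipeline code:

CAMERA_FPS = 16       # input frames per second after resampling
FRAMES_PER_STEP = 4   # frames collected before each forward pass

frame_buffer = []

def on_frame(frame, model):
    """Collect frames; run the model once every FRAMES_PER_STEP frames."""
    frame_buffer.append(frame)
    if len(frame_buffer) == FRAMES_PER_STEP:
        prediction = model(frame_buffer)  # 16 fps in -> 4 predictions/s out
        frame_buffer.clear()
        return prediction
    return None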

Demo 1: Action Recognition

examples/run_action_recognition.py applies our pre-trained models to action recognition. 30 actions are supported (see full list here).

Usage:

PYTHONPATH=./ python examples/run_action_recognition.py

Demo 2: Fitness Activity Tracking

examples/run_fitness_tracker.py applies our pre-trained models to real-time fitness activity recognition and calorie estimation. In total, 80 different fitness exercises are recognized (see full list here).

Usage:

PYTHONPATH=./ python examples/run_fitness_tracker.py --weight=65 --age=30 --height=170 --gender=female

Weight, age, and height should be given in kilograms, years, and centimeters, respectively. If not provided, default values will be used.

Some additional arguments can be used to change the streaming source:

  --camera_id=CAMERA_ID           ID of the camera to stream from
  --path_in=FILENAME              Video file to stream from. This assumes that the video was encoded at 16 fps.

It is also possible to save the display window to a video file using:

  --path_out=FILENAME             Video file to stream to
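
For example, to run the tracker on a pre-recorded clip and save the annotated output (file names are hypothetical):

PYTHONPATH=./ python examples/run_fitness_tracker.py --path_in=workout.mp4 --path_out=workout_annotated.mp4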

For the best performance, the following is recommended:

  • Place your camera on the floor, angled upwards with a small portion of the floor visible
  • Ensure your body is fully visible (head-to-toe)
  • Try to be in a simple environment (with a clean background)

Demo 3: Gesture Control

examples/run_gesture_control.py applies our pre-trained models to the detection of 8 hand gesture events (6 swiping gestures + thumbs up + thumbs down). Compared to Demo 1, the model used in this case was trained to trigger the correct class for a short period of time right after the hand gesture occurred. This behavior policy makes it easier to quickly trigger multiple hand gestures in a row.

Usage:

PYTHONPATH=./ python examples/run_gesture_control.py

Demo 4: Calorie Estimation

In order to estimate burned calories, we trained a neural net to convert activity features to the corresponding MET value. We then post-process these MET values (see the correction and aggregation steps performed here) and convert them to calories using the user's weight.
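For intuition, the standard MET-to-energy conversion looks like the sketch below; the repository's exact correction and aggregation steps are in the code linked above.

def calories_per_minute(met, weight_kg):
    # Standard formula: kcal/min = MET * 3.5 * body mass (kg) / 200
    return met * 3.5 * weight_kg / 200.0

# e.g. a 65 kg user exercising at 8 METs for 30 minutes:
print(30 * calories_per_minute(met=8, weight_kg=65))  # ~273 kcal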

If you're only interested in the calorie estimation part, you might want to use examples/run_calorie_estimation.py which has a slightly more detailed display (see video here which compares two videos produced by that script).

Usage:

PYTHONPATH=./ python examples/run_calorie_estimation.py --weight=65 --age=30 --height=170 --gender=female

The calorie estimates are roughly in the range produced by wearable devices, though their absolute accuracy has not been verified. From our experiments, the estimates correlate well with workout intensity (intense workouts burn more calories), so regardless of absolute accuracy, it is fair to use this metric to compare one workout to another.

Demo 5: Repetition Counting

This demo turns our models into a repetition counter for 2 fitness exercises: jumping jacks and squats.

Usage:

PYTHONPATH=./ python examples/run_fitness_rep_counter.py

Build Your Own Classifier with SenseStudio

This section describes how you can use our SenseStudio tool to build a custom classifier on top of our models. Our models serve as a powerful feature extractor, which reduces the amount of data you need for your own project.

Step 1: Project Setup

First, run the tools/sense_studio/sense_studio.py script and open http://127.0.0.1:5000/ in your browser. There you can set up a new project in a location of your choice and specify the classes that you want to collect.
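Assuming the same PYTHONPATH convention as the demo scripts, SenseStudio can be launched with:

PYTHONPATH=./ python tools/sense_studio/sense_studio.py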

The tool will prepare the following file structure for your project:

/path/to/your/dataset/
├── videos_train
│   ├── class1
│   ├── class2
│   └── ...
├── videos_valid
│   ├── class1
│   ├── class2
│   └── ...
└── project_config.json

  • Two top-level folders: one for the training data, one for the validation data.
  • One sub-folder for each class that you specify.

Step 2: Data Collection

You can record videos for each class right in your browser by pressing the "Record" button. Make sure that you have ffmpeg installed for that.

Alternatively, you can move existing videos into the corresponding project folders. These should have a frame rate of 16 fps or higher.
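If an existing clip was recorded at a lower frame rate, one way to re-encode it at 16 fps is with ffmpeg (file names are hypothetical):

ffmpeg -i original_clip.mp4 -r 16 videos_train/class1/clip.mp4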

In the end you should have at least one video per class and train/valid split, but preferably more. In some cases, as few as 2-5 videos per class have been enough to achieve excellent performance with our models!

Step 3: Training

Once your data is prepared, go to the training page in SenseStudio to train a custom classifier. You can specify which of our pretrained feature extractors to use and how many of its layers to fine-tune. Setting this parameter to 0 means that only your new classification head will be trained.

Step 4: Running your model

The training script will produce a checkpoint file called best_classifier.checkpoint in the checkpoints/<your-output-folder-name>/ directory of your project. You can now run it live using the following script:

PYTHONPATH=./ python tools/run_custom_classifier.py --custom_classifier=/path/to/your/checkpoint/ [--use_gpu]
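If you want to inspect the produced checkpoint before deploying it, here is a minimal sketch, assuming it is a standard PyTorch checkpoint (the exact contents may differ):

import torch

# Hypothetical path; substitute your own output folder name.
ckpt = torch.load("checkpoints/my-run/best_classifier.checkpoint", map_location="cpu")
print(type(ckpt))  # e.g. a state dict or wrapper object, depending on the training script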

Advanced Options

You can further improve your model's performance by training on temporally annotated data: frames are tagged individually so that the event is localized within the video, rather than every frame carrying the same whole-video label. For instructions on how to prepare your data with temporal annotations, refer to this page.

After preparing the temporal annotations for your dataset in SenseStudio, you can run the training with the Temporal Annotations flag enabled to train on those frame-wise tags instead of the whole-video classes.


iOS Deployment

If you're interested in mobile app development and want to run our models on iOS devices, please check out sense-iOS for step-by-step instructions on how to get our gesture demo running on an iOS device. One of the steps involves converting our PyTorch models to the TensorFlow Lite format.

Conversion to TensorFlow Lite

Our models can be converted to TensorFlow Lite using the following script:

python tools/conversion/convert_to_tflite.py --backbone_name=StridedInflatedEfficientNet --backbone_version=pro --classifier=gesture_recognition --output_name=model

If you want to convert a custom classifier, set the classifier name to custom_classifier and provide the path to the dataset directory used to train the classifier via the --path_in argument.

python tools/conversion/convert_to_tflite.py --classifier=custom_classifier --path_in=/path/to/your/checkpoint/ --output_name=model
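To smoke-test a converted model, here is a sketch using the standard TensorFlow Lite interpreter; the file name follows from --output_name=model above, and the input shape depends on the chosen backbone:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Feed a dummy input of the expected shape and run one inference.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
print(interpreter.get_output_details()[0]["shape"])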

Gallery

Our gallery lists cool external projects that were built using Sense. Check it out!

Citation

We now have a blogpost you can cite:

@misc{sense2020blogpost,
    author = {Guillaume Berger and Antoine Mercier and Florian Letsch and Cornelius Boehm and 
              Sunny Panchal and Nahua Kang and Mark Todorovich and Ingo Bax and Roland Memisevic},
    title = {Towards situated visual AI via end-to-end learning on video clips},
    howpublished = {\url{https://medium.com/twentybn/towards-situated-visual-ai-via-end-to-end-learning-on-video-clips-2832bd9d519f}},
    note = {online; accessed 23 October 2020},
    year = {2020},
}

License

The code is copyright (c) 2020 Twenty Billion Neurons GmbH under an MIT License. See the file LICENSE for details. Note that this license only covers the source code of this repo. Pretrained weights come with a separate license available here.

The code also makes use of several sounds from freesound.
