Wave! by Neural Ninja

Wave! is a gesture recognition neural network that uses tf-pose-estimation, a tensorflow wrapper of a pose estimation neural network (OpenPose), to classify various hand gestures in images.

Wave! runs on top of OpenPose, which identifies various joints of humans in a given camera frame, such as their elbow or nose. In most cases, it returns the x and y positions of each bodypart in terms of percentage of the frame, a confidence score for each of these bodyparts, and a confidence score for each human it identifies. During Data Collection, bodypart movement of the human with the highest confidence score is tracked and summed over a series of frames. This data, along with the confidence scores for each bodypart, is passed on to Wave! for classification.
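As a rough sketch of what that bookkeeping looks like (the field names below are simplified stand-ins, not tf-pose-estimation's actual output objects):

import numpy as np

def accumulate_movement(frames):
    # `frames` is a list of per-frame outputs; each entry is a list of detected humans,
    # simplified here to dicts with an overall "score" and per-joint (x, y, score) "parts".
    movement, prev, best = None, None, None
    for humans in frames:
        best = max(humans, key=lambda h: h["score"])                  # most confident human
        joints = np.array([[p["x"], p["y"]] for p in best["parts"]])  # positions as % of frame
        if prev is not None:
            step = joints - prev
            movement = step if movement is None else movement + step  # sum movement over frames
        prev = joints
    scores = np.array([p["score"] for p in best["parts"]])            # per-joint confidences
    return movement, scores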

The most successful classification model at the moment is the Post OpenPose Neural Network (POPNN) in TensorFlow. This model is a fully connected feed forward binary classification model. It classifies images as either a "wave" or "no wave".

We also worked on PyTorch LSTM and POPNN models that are more modular than POPNN in TensorFlow and support multiclass classification. We are adding this capability to the TensorFlow models as well.

Application

This project was initially used with a humanoid robot to create an interactive user experience, although Wave! does not need to be paired with a robot. In this interactive display, the robot waved if a human in its field of view waved at it. Data from each network, Wave! and tf-pose-estimation, was sent using the Robot Operating System (ROS). Inference is done primarily on the NVIDIA Jetson TX2, but it can also be run on a PC if needed. Wave! can run on any robot that supports ROS.

Note: Wave! is only compatible with Python 2, due to ROS requirements and syntax.

Setup

Unless specified, perform the following setup commands on your Jetson TX2 and PC (used for training).

Follow the instructions here to set up Ubuntu and flash additional software on your TX2.

Clone the Wave! repository:

$ git clone --recursive https://github.com/NVIDIA-Jetson/Gesture-Recognition.git

Install program dependencies:

This installs argparse, dill, matplotlib, psutil, requests, scikit-image, scipy, slidingwindow, tqdm, setuptools, Cython, pyTorch, and sklearn.

$ sudo pip install -r requirements.txt

Install OpenCV:

$ sudo apt-get install python-opencv

Clone the tf-pose-estimation repository:

$ git clone --recursive https://github.com/ildoonet/tf-pose-estimation.git tf-openpose

Configure tf-pose-estimation for our purposes:

This replaces tf-openpose's estimator.py with a custom version for Wave!, and then installs tf-openpose.

$ cp -f wavenet/estimator.py tf-openpose/tf_pose/estimator.py
$ cd tf-openpose
$ python setup.py install

If you get an error along the lines of "can't find Arrow package" when setup.py installs tensorpack (particularly when installing tf-pose on a Jetson), our suggested fix is to remove the tensorpack install (take out any references to tensorpack, or its GitHub link, in setup.py) and then manually install TensorFlow with TensorRT support on your Jetson using this link. After downloading the pip wheel, install it as follows:

$ sudo pip install tensorflow-1.7.0-cp27-cp27mu-linux_aarch64.whl

Install rospy on the Jetson TX2:

Make sure you restart your terminal after setting up rospy (on both the Jetson TX2 and PC) before running anything with rospy to ensure everything runs smoothly.

$ git clone --recursive https://github.com/jetsonhacks/installROSTX2
$ cd installROSTX2
$ ./installROS.sh -p ros-kinetic-desktop-full
$ ./setupCatkinWorkspace.sh

Install rospy on your PC by following the instructions here.

Install PyTorch here, following the on-site instructions to get the right version for your version of CUDA.

Project Pipeline

Network Pipeline

Using generate_video.py, we generate videos of people performing the activities of the different data classes ("waving", "X-Posing", etc.). These videos are given to generate_data_from_video.py, which runs inference on tf-openpose and extracts data for the networks. You can train a network by running lstm.py, popnn_torch.py, or popnn4.py with their corresponding arguments. Finally, you can run inference on the trained model by running lstm_multi_inference.sh, popnn_torch_inference.sh, or popnn_inference.sh. These bash scripts run thread_inf.py, which creates multiple threads to collect data for inference and publishes the data, along with either lstm_inference.py, popnn_torch_inference.py, or popnn4inference.py, which run inference on the models.

Multi-Threaded Inference

As mentioned in Project Pipeline, thread_inf.py, the program that runs inference on tf-openpose to collect real-time data to feed to the network, is multi-threaded to increase the FPS of inference. The main thread starts 3 different threads which capture webcam frames, run pose estimation on these frames, and display the frame with pose estimation drawn on it. The main thread then publishes the pose estimation data on ROS for the main inference script. This functionality is also available on the TensorFlow Models.
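A minimal sketch of that producer/consumer structure (the function and queue names below are made up for illustration, not the ones in thread_inf.py):

import threading
import Queue   # the Python 2 name of the standard library queue module

def start_inference_threads(cap, estimator):
    # cap: a cv2.VideoCapture; estimator: any object with an inference(frame) method.
    frames = Queue.Queue(maxsize=1)      # most recent webcam frame
    display_q = Queue.Queue(maxsize=1)   # poses for the drawing thread
    publish_q = Queue.Queue(maxsize=1)   # poses for the main thread to publish on ROS

    def capture_loop():                  # thread 1: grab webcam frames
        while True:
            ok, frame = cap.read()
            if ok:
                frames.put(frame)

    def estimate_loop():                 # thread 2: run pose estimation on each frame
        while True:
            humans = estimator.inference(frames.get())
            display_q.put(humans)
            publish_q.put(humans)

    def display_loop():                  # thread 3: draw and show the estimated pose
        while True:
            display_q.get()              # drawing/cv2.imshow code omitted in this sketch

    for loop in (capture_loop, estimate_loop, display_loop):
        t = threading.Thread(target=loop)
        t.daemon = True
        t.start()
    return publish_q                     # the main thread reads from here and publishes over ROS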

LSTM

The Wave! LSTM was made to take advantage of the sequential nature of camera feed data. One of the newer features of the PyTorch models (not in the TensorFlow models) is their adaptability and multiclass ability, thanks to data collection and loading that can accommodate as many data features as needed (default: position, score, movement vector components). If you want to add your own data features or modify the PyTorch models or inference, check out the PyTorch Modification portion of this README. The parameter centralization class, var, in var.py allows for tinkering with model hyperparameters, such as the number of classes, across all scripts that use them. As a result, it is easy to rapidly train and deploy Wave! PyTorch nets. At the moment, both of the PyTorch models are slightly less accurate than POPNN (Post OpenPose Neural Network) in TensorFlow. As mentioned before, feel free to run, modify, or reference them.

LSTM Network Architecture

The LSTM consists of a two-layer, stacked LSTM followed by 3 linear layers. Each linear layer is followed by a dropout layer (during training) and an activation function: ReLU after the first two layers, and a log sigmoid after the final layer to produce the output scores.
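A rough PyTorch sketch of that architecture (the layer widths below are placeholders, not the tuned values in var.py):

import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveLSTM(nn.Module):
    # Sketch only: hidden and linear layer sizes are placeholders, not the values in var.py.
    def __init__(self, input_size, hidden_size=64, num_classes=2):
        super(WaveLSTM, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, num_classes)
        self.drop = nn.Dropout(p=0.5)              # only active in training mode

    def forward(self, x):                          # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)                      # two-layer stacked LSTM
        out = out[:, -1, :]                        # take the last timestep
        out = F.relu(self.drop(self.fc1(out)))
        out = F.relu(self.drop(self.fc2(out)))
        return F.logsigmoid(self.fc3(out))         # log sigmoid on the final layer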

Note: Due to the increased complexity of the model, the LSTM runs slower than both POPNNs.

POPNN

The Wave! Post OpenPose Neural Network (POPNN) is the original and most accurate model (in its TensorFlow version). It is currently integrated with the var class, but does not allow for multiclass classification or more than one feature. The model itself trains much faster than the LSTM, with 1200 epochs finishing in around five minutes. As mentioned before, feel free to run, modify, or reference this code, and contribute back if you would like.

POPNN Network Architecture

Just like the TensorFlow version, the PyTorch POPNN consists of 4 linear layers. Each of the first three layers is followed by a dropout layer (active during training, to prevent overfitting) and a ReLU, and the final linear layer is followed by a softmax to squeeze the output between 0 and 1.
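A corresponding PyTorch sketch (again, layer widths are placeholders rather than the values in var.py):

import torch.nn as nn
import torch.nn.functional as F

class POPNN(nn.Module):
    # Sketch only: layer widths are placeholders, not the values in var.py.
    def __init__(self, input_size, num_classes=2):
        super(POPNN, self).__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 16)
        self.fc4 = nn.Linear(16, num_classes)
        self.drop = nn.Dropout(p=0.5)              # only active in training mode

    def forward(self, x):
        x = F.relu(self.drop(self.fc1(x)))
        x = F.relu(self.drop(self.fc2(x)))
        x = F.relu(self.drop(self.fc3(x)))
        return F.softmax(self.fc4(x), dim=1)       # squeeze the output between 0 and 1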

Var.py

var.py is the parameter hub of the repository. It contains the model hyperparameters and other values referenced by many other scripts, such as the classification classes and the number of data features. var.py offers control over the more nitpicky hyperparameters that the argparse flags do not, like hidden layer sizes, and lets you change them in only one place. Because so many files reference var.py, it is imperative that any modifications you make are reflected in var.py (this mainly applies to the data collection programs). For example, if you add a new feature to data collection that you want to train on, you must change the features variable in var.py accordingly.

Usage

var.py defines a class, var, that when instantiated in a program contains data relating to many different parameters.

To import and create an instance of the var class (if you need to):

from var import var
v = var(use_arm)

use_arm is a boolean which, if True, means that only 8 of the 18 joints (the arm joints and the nose) are used for data collection and training.

Var comes with several helpful methods to extract parameters from itself.

input_size = v.get_size()            # Get the input size of the data (number of joints being saved)
lstm_vars = v.get_LSTM()             # Get the full list of LSTM variables
popnn_vars = v.get_POPNN()           # Get the full list of POPNN variables
classes = v.get_classes()            # Get the classification classes
num_features = v.get_num_features()  # Get the number of data features (e.g. x position, y position, and score would be 3 features)
num_classes = v.get_num_classes()    # Get the number of classes

Data Collection

Wave! uses the difference in position of joints, as well as their confidence scores, to classify an image. During data collection, Wave! runs inference on OpenPose and saves the positions of bodyparts from each iteration. The differences between bodypart positions over 4 consecutive frames (this number can be changed in the data collection phase) are summed and saved as an .npz, along with the averages of the bodypart scores over the same frames. Every datapoint is saved in terms of percentage of the frame, in the GestureData directory.
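A simplified sketch of what one saved datapoint amounts to (the array and filename choices below are illustrative; generate_data_from_video.py's exact layout may differ):

import os
import numpy as np

num_frames = 4                                      # frames that movement is summed over
positions = np.random.rand(num_frames + 1, 18, 2)   # dummy per-joint (x, y), as % of frame
scores = np.random.rand(num_frames + 1, 18)         # dummy per-joint confidence scores

movement = np.diff(positions, axis=0).sum(axis=0)   # frame-to-frame differences, summed
score_avgs = scores.mean(axis=0)                    # scores averaged over the same frames

if not os.path.isdir("GestureData"):
    os.mkdir("GestureData")
np.savez("GestureData/gestureData0.npz", movement=movement, scores=score_avgs)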

Our data collection process consists of two steps: video generation and data generation. Because the videos are generated and saved separately, you can reuse them to extract different features with different argparse flags, as shown below.

Collecting Data via Video

This step records video and stores it in the .avi format; the recordings can then be used to create data for Wave!

To collect video data, run generate_video.py:

$ python generateVideo.py # --camera=<port number> --time=<recording seconds> --dir=<datafile>
# Uncomment the flags above to either manually set camera number (default=1), change video length or change save directory

When prompted, type in the data class ("wave", "no wave", etc.) that you want to collect data for.

To generate training data from these videos, run generate_data_from_video.py:

$ python generateDataFromVideo.py #--exit_fps=<fps> -f=<numFrames> -a -o -s=<startVideoNum> -e=<endVideoNum> 

Uncomment --exit_fps to set the desired FPS of the data collected (inference on the Jetson can be slower than inference on a PC). The default is 5. Uncomment -f to set the number of frames that movement is aggregated over. The default is 4. Add the -a flag to collect data on net distance and angles (vectors) instead of change in x and y. By default, this file generates data for every keypoint on every person's body; however, when recognizing waves, the network only cares about the shoulders, elbows, wrists, and nose, so to only generate data for these seven keypoints, add the -o flag. By default, generate_data_from_video.py generates data from all files in your data directory. To only generate data from some of the videos, specify a video number to start with using the -s flag and a video number to end on with the -e flag.

Smart Labeler

Just like POPNN, there is a smart labeler, which can detect failing edge cases and create valid data that you can transfer-learn from.

To collect data with the smart labeler, run this command:

$ bash smartLabeler.sh # --camera=<port number> --debug --wave
# Uncomment the flags above to either manually set camera number (default=1) or say that the user is waving (no wave without flag).
# If using all flags, write in the above order

If you are generating videos on multiple computers and are compiling all video data on git, you may need to rename your label and data files to avoid merge conflicts or overwritten data. For your convenience, we have created renamer.py, which you can use to change the numbering of files in the data/video/videos and data/video/labels folders, so data from different machines can be merged seamlessly. To rename the video files, run renamer.py:

$ python renamer.py -b=<buffer>

The -b flag should be an integer by which the program increments the filenames.
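Conceptually, the renumbering amounts to something like the sketch below (this is not renamer.py itself, just the idea):

import os
import glob

def shift_numbered_files(directory, ext, buffer):
    # Rename 0.ext, 1.ext, ... to (0 + buffer).ext, (1 + buffer).ext, ...
    paths = glob.glob(os.path.join(directory, "*" + ext))
    number = lambda p: int(os.path.splitext(os.path.basename(p))[0])
    # Rename the highest numbers first so no file is overwritten before it is moved.
    for path in sorted(paths, key=number, reverse=True):
        os.rename(path, os.path.join(directory, str(number(path) + buffer) + ext))

shift_numbered_files("data/video/videos", ".avi", 10)   # e.g. shift the numbering by 10
shift_numbered_files("data/video/labels", ".txt", 10)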

Training

Before training, ensure all your data is in the appropriate folders.

The best results are usually obtained when the amount of "no wave" data is close to the amount of "wave" data. To check how much of each type of data you have accumulated, run data_count.py as follows:

$ python data_count.py -f=<numFrames> 

Set the -f flag to the number of frames specified when creating data from videos.

LSTM

Run lstm.py in order to start training the LSTM (run popnn.py with the same arguments to train a POPNN model). You can adjust the learning rate, hidden layer size, and more to experiment with different net structures.

$ python lstm.py # -s=<save_fn> -t -c=<ckpt_fn> -lr=<learning_rate> -d=<learning_rate_decay> -e=<numEpochs> -f=<numFrames> -b=<batch_size> -o

To keep more than one checkpoint file, specify a filename for the generated checkpoint file. The default is lstm000.ckpt, saved in the lstmpts directory; if left unchanged, newly generated checkpoint files will overwrite one another. Checkpoint files are given the .ckpt filename extension in this project. The LSTM gives you control over the network's finer details, including the learning rate, number of epochs, epoch batch size, and the rate of decay of the learning rate (if not specified, the learning rate stays constant).

If you would like to transfer learn from one model to the next, uncomment -t in the command above. Specify a checkpoint filename after the -c flag to transfer learn from. The default is lstm307.ckpt, our best performing model. Also, if you only want to train on the data from the arm keypoints, uncomment the -o flag.

To save a model while it's training, press Ctrl + C.

POPNN (TensorFlow)

Before training, ensure all your data is in the appropriate folders.

To train the network, run popnn4.py:

$ python popnn4.py --dest_ckpt_name=<saveFileName> # -t -f=<numFrames> --debug --bad_data

The --dest_ckpt_name or -d flag is required and should be set to the filename the generated checkpoint file will be saved to. Checkpoint files are given the .ckpt filename extension in this project.

The program will print the loss after every epoch, as well as the training accuracy and validation accuracy.

For transfer learning, add the commented-out -t flag, and the program will prompt you for a checkpoint file to transfer-learn from. This loads the previously trained weights from the pretrained checkpoint file and continues training with them.

Uncomment the --debug flag for some helpful print statements for debugging purposes. Finally, uncomment the --bad_data or -bd flag to only use data in which humans in frame are not flickering in and out of frame.

Note: POPNN is tensorboard compatible. We have currently implemented some basic tensorboard summaries so you can visualize your graphs as specified in the TensorBoard API.

POPNN (PyTorch)

Graphs

After training a model, a graph of the checkpoint file will be generated (for PyTorch models), showing the loss, training accuracy, and validation accuracy for every 25 epochs. This can help you visualize how good your model is. Below, you can see an example of a generated graph from model 307.
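If you want to plot similar curves from your own logs, a minimal matplotlib sketch (with placeholder metrics standing in for values recorded every 25 epochs) is:

import matplotlib.pyplot as plt

epochs = range(0, 1200, 25)
# Placeholder metrics; substitute the loss and accuracy values you logged every 25 epochs.
losses = [1.0 / (i + 1) for i in range(len(epochs))]
train_accs = [min(0.99, 0.5 + 0.01 * i) for i in range(len(epochs))]
val_accs = [min(0.95, 0.5 + 0.009 * i) for i in range(len(epochs))]

for values, label in [(losses, "loss"), (train_accs, "training accuracy"), (val_accs, "validation accuracy")]:
    plt.plot(epochs, values, label=label)
plt.xlabel("epoch")
plt.legend()
plt.savefig("lstm307_training.png")                 # illustrative filename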

Inference

LSTM

To run inference on the LSTM, run the lstm_multi_inference.sh with the following command:

$ bash lstmMultiInference.sh  # <port number> <numFrames> -c=<checkpoint file number> -o -a

Like the other inference scripts, uncomment the flags to specify a camera port number or to debug the code, and keep the flags in the same order. Even if you are not using all of them, you must type out every flag before the ones you want to use (i.e. if you want to specify a checkpoint file, you must also specify a camera port number and the number of frames to aggregate over). Change the -c flag to the name of the checkpoint file you would like to run inference on. If you only want to run inference on the seven keypoints necessary to recognize a wave, simply uncomment the -o flag.

POPNN (TensorFlow)

Wave! can run inference on multiple people in a frame. To run inference, run the following command:

$ bash popnn_inference.sh

popnn_inference.sh launches the roscore, runs inference on Wave!, and sends data through ROS publishers. However, the data collection process in popnn_inference.sh is modified to return data of every human tf-pose-estimation detects in each frame. If the number of humans changes, the data collector looks at a new batch of 4 images, because we do not know who left or entered the image. To specify which camera port is being used, add the --camera=<port number> flag. To debug, add the --debug flag after the --camera flag.

POPNN (PyTorch)

To run inference on the PyTorch POPNN, run popnn_torch_inference.sh. As described in Project Pipeline, it launches thread_inf.py to collect and publish data and popnn_torch_inference.py to run inference on the model.

Other Notes

For all the networks we have made, there are a few naming conventions we use within the program, a few root files we have modified, and other short things to note for the user.

PyTorch Modification

As mentioned earlier, the PyTorch models, data collection methods, and inference code were written to make modification as easy as possible. Much of this is thanks to the var class, which allows all programs to refer to a single list of important parameters.

Data Collection

The primary data creation program is generate_data_from_video.py, which extracts certain data features, such as x and y position, score, and distance traveled. If you want to extract other features, or fewer features, modify the code loop to accumulate the data in a numpy array and add it to the features list in the following line.

    features = [xDifferences, yDifferences]

By default, the scoreAvgs variable is added to the end of features to allow for the multiplication of score and other features. Additionally, change the num_features variable in var.py to reflect the number of features you are using. After training the network, you must modify the post_n_pub method in the PosePub class to collect the same data during inference and publish it, as shown here. You can also follow the usage of publishers in our code. In order to do so, you must: create a rospy publisher, accumulate the data in a numpy array, and publish the data (most likely as a string).
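A minimal sketch of such a publisher (the node and topic names here are made up for illustration; mirror the existing publishers in post_n_pub for the real ones):

import rospy
import numpy as np
from std_msgs.msg import String

# Hypothetical node and topic names, for illustration only.
rospy.init_node("wave_feature_publisher")
pub = rospy.Publisher("wave_features", String, queue_size=10)

def publish_features(movement, score_avgs):
    # Accumulate the per-joint data in one numpy array, then publish it as a string.
    data = np.concatenate([movement.flatten(), score_avgs.flatten()])
    pub.publish(" ".join(str(v) for v in data))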

To collect data for different classes, add them to the classes dictionary in var.py in the following format.

classes = {previous classes, number : "class name"}
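For example, a hypothetical three-class setup (the class names and numbering here are purely illustrative) would look like:

classes = {0: "no wave", 1: "wave", 2: "x-pose"}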

When generating video using generate_video.py, you will be prompted for what data class you are recording data for.

Inference

If you add more features in the data collection phase, train the network on these features, and send these features through post_n_pub, you must change the inference scripts, lstm_inference.py and popnn_torch_inference.py, to accommodate the new data. First, create a rospy subscriber to listen for the features from post_n_pub, as shown here. You can also follow the usage of subscribers in our code.
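A matching subscriber sketch (again with made-up node and topic names; mirror the existing subscribers in the inference scripts):

import rospy
import numpy as np
from std_msgs.msg import String

def feature_callback(msg):
    # Parse the string published by post_n_pub back into a numpy array for the model.
    features = np.array([float(v) for v in msg.data.split()])
    # ... run the network on `features` here ...

# Hypothetical node and topic names, for illustration only.
rospy.init_node("wave_inference_listener")
rospy.Subscriber("wave_features", String, feature_callback)
rospy.spin()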

Data Collection Naming Conventions

To avoid confusion between the two POPNN models, the PyTorch POPNN models and their accompanying files have _torch in their names. All video data is saved in the form #.avi, where # is the number of the latest file stored. We encourage you to keep using this convention so that generate_data_from_video.py does not incur bugs.

For all processed data in npz format, we use the format gestureData#.npz, where # is the number of the latest file stored. Again, to avoid bugs, we encourage you to keep using this convention if you inject your own custom npz files. The standard dataloader.py handles data in this format well.

All labels are in the format #.txt, similar to the video data, for the same reasons.

Transfer Learning and Model Saving Conventions

After training, each model saves a checkpoint file which stores its weights and can be used both to run inference on that model and to transfer-learn for new models.

The naming convention for our checkpoint files is fooxyz.ckpt, where foo is the name of the network type (either lstm or popnn), x is the network architecture (e.g. currently 4 for the best stable version of popnn), y is the type of data collected (e.g. currently 3 for our 3rd major data change in version 4) and can also represent image frames per data instance (e.g. 7 for 7 frames per data instance and 10 for 10 frames per data instance), and z is the latest training iteration. For example, lstm307.ckpt is the LSTM with architecture 3, data version 0, and training iteration 7. .ckpt is the suffix used for the checkpoint files.

Some of our most stable popnn versions include 4.2.7, 4.3.4, and 4.3.11. There are gaps between version numbers due to bad training iterations and versions we felt were inferior to the others. The most stable lstm version is 3.0.7.

General File and Code Naming Conventions

When naming Python files, we use underscores to separate words rather than camel case. This helps with file readability.

When adding detailed comments to the code, the comments are in the format ''' comment ''', instead of using the format #comment.

Modifications to TF-Pose

If the tf_pose library has been giving you issues, refer to our setup portion above. tf_pose is a great TensorFlow pose-estimation library (the alternatives are written in Caffe, which is less general and less user-friendly), so it can be easily ported to other TensorFlow networks or models.

If you are modifying code in TF-Pose, you may need to see which numbers correspond to which joints in numJoints[]. Here is a list of numbers and their corresponding joints:

Joint Number Joint Name
0 Nose
1 Neck
2 Right Shoulder
3 Right Elbow
4 Right Wrist
5 Left Shoulder
6 Left Elbow
7 Left Wrist
8 Right Hip
9 Right Knee
10 Right Ankle
11 Left Hip
12 Left Knee
13 Left Ankle
14 Right Eye
15 Left Eye
16 Right Ear
17 Left Ear

GPU Configuration

tf_pose takes up a lot of compute and can sometimes eat up your memory if you are not careful about limiting how much memory it can take per operation. If you are getting GPU sync errors or fatal errors (NOT warnings) related to RAM or GPU RAM allocation, you can limit the memory used by tf_pose per process for inference in the wave_inference file as follows:

Line 93: s_tf_config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=n))

where n is any fraction between 0 and 1, set to 0.2 by default. Reduce this if you run into RAM allocation problems.

X-Pose, Y-Pose, and Dab Recognition

In addition to detecting "waves" and "no waves," we have built in recognition for other gestures: x-poses, y-poses, and dabs. To make an x-pose, cross your arms in front of your body to make an "x" shape. To make a y-pose, put your arms above your head in a "y" shape. To recognize these poses, we took two different approaches: one where we hard-coded recognition, and another where we trained a neural network to recognize the poses.

Hard Coded Recognition

When running lstm_wave_inference, we print whether someone in the frame is doing one of these gestures.

Neural Net Recognition

Bad Data Nuances

When looking through our code, you may see numerous references to 'bad data'. As the name suggests, bad data refers to mistakes that TF_Pose produces while preprocessing data for our network. While not all data mistakes can be accounted for, there are a few big ones that do affect our dataset and our inference, and we account for them explicitly.

Up and Down Arms

The first major data deficiency we need to account for is that when arms are down in a frame, we multiply the joint scores by -1 as a preprocessing step for our network. When that arm is then lifted to the point where the value becomes positive again, the network perceives a large shift in distance, even when there is none. If you look through thread_inf.py, you will see that we account for this by detecting changes from positive to negative arm position (and vice versa) so that the network does not receive the bad data; we send a null array instead using pub_null(). This is not exposed as a user-togglable flag because it is a very big deficiency, but if you are curious, do check it out in the code.
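Conceptually, the guard looks something like this simplified sketch (not the actual thread_inf.py code):

def guard_arm_flip(prev_arm_value, curr_arm_value, pub_null, publish_data):
    # If the sign of the arm value flipped between frames, the apparent jump is bad data.
    if prev_arm_value * curr_arm_value < 0:
        pub_null()        # send a null array instead of the misleading movement
    else:
        publish_data()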

Flashing Arms

This deficiency, while minor, occurs when TF_Pose cannot properly identify arms from frame to frame, so the arms are sometimes detected and sometimes nonexistent. As a result, we have added a flag that can be toggled to decide whether inference should account for this and send a null array when it happens. The upside of enabling the flag is less messy data; the downside is that inference can take a bit longer if the person is not sitting squarely in the frame for a while, and network sensitivity can decrease as well.

License

MIT, see LICENSE

Authors

Wave! was made by Maddie Waldie, Nikhil Suresh, Jackson Moffet, and Abhinav Ayalur, four NVIDIA High School Interns, during the summer of 2018.
