A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson

Jetson Introduction to Knowledge Distillation

This repository contains a tutorial introducing knowledge distillation, a helpful tool when deploying large models to the edge.

In this tutorial, we explore knowledge distillation by transferring the knowledge from the OpenCLIP vision-language model into a ResNet18 model for classification on the STL10 dataset. We'll explore how the data used for distillation, the distillation method, and the model architecture impact the final accuracy. We'll also discuss how you can profile and optimize models for real-time deployment on NVIDIA Jetson Orin Nano.

This tutorial can help you learn how to take large existing models and transfer their knowledge to architectures that are better suited for edge deployment because they have lower memory consumption, higher throughput, or better architecture support (e.g., running on the Jetson AGX Orin's Deep Learning Accelerator).

This tutorial is intended to introduce knowledge distillation conceptually and guide you through the process. We've included the code for reproducing the results, and we include some inline code for clarity in expressing the concepts. However, this tutorial assumes you have some familiarity with training neural networks, so we won't cover all of the code in detail.

If you're familiar with deep learning and model training, and are looking for ways to bring large models to the edge, this tutorial may be a helpful introduction for you!

Let's get started!

Check out our other project, clip-distillation, to see how you can use knowledge distillation to create your own custom image classifier without any labeled data!

What is knowledge distillation?

Knowledge distillation is a technique for transferring the knowledge from one neural network, the teacher, to another neural network, the student.

Image Credit: Knowledge Distillation: A Survey

This process can take a variety of forms, which may be classified as

  1. Response Knowledge Distillation: Training the student's output class probability distribution to match the teacher's probability distribution using a divergence loss (e.g., KL divergence).
  2. Feature Knowledge Distillation: Training the internal features of a student model to directly match the internal features of a teacher model (e.g., using Mean Squared Error).
  3. Relational Knowledge Distillation: Training the relative distribution of features in the student model to match the relative distribution of features in the teacher.

In this tutorial, we'll be exploring (1) and (2), given their simplicity compared to relational knowledge distillation. We're particularly interested in how we can use knowledge distillation to take a large transformer-based teacher model (OpenCLIP) and train a faster, lower-memory model (ResNet18) that is better suited for edge deployment. We'll explore this concept by aligning OpenCLIP as an image classifier targeting the STL10 classification dataset, and we'll see how the data and techniques used for distillation impact the final model accuracy.
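To make these two approaches concrete, here is a minimal sketch of the corresponding loss terms. The function names and the temperature value are illustrative, not the exact settings used in the accompanying code.

import torch.nn.functional as F

def response_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Response distillation: match the teacher's (softened) class probability
    # distribution using KL divergence.
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_logprobs, teacher_probs, reduction="batchmean") * temperature ** 2

def feature_distillation_loss(student_features, teacher_features):
    # Feature distillation: directly match features with Mean Squared Error.
    return F.mse_loss(student_features, teacher_features)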

For a more in-depth introduction to knowledge distillation, we recommend reviewing Knowledge Distillation: A Survey.

What is OpenCLIP?

The teacher model we'll be exploring in this tutorial is OpenCLIP. In case you're unfamiliar, OpenCLIP is an open-source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training). CLIP models are trained to match images with text. To be specific, the model is composed of

  1. An image encoder, which takes an image and produces an embedding that represents the image
  2. A text encoder, which takes a text prompt and produces an embedding that represents the text prompt

The model is trained so that the image and text embeddings of paired images and text are similar (low distance) and the image and text embeddings of non-paired images and text are very different (high distance).

Image Credit: https://github.com/openai/CLIP
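To give a rough idea of how this pairing objective works, below is a minimal sketch of a CLIP-style contrastive loss. This is not the exact OpenCLIP training code, and the temperature value is illustrative.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    # Normalize, then score every image against every text prompt in the batch.
    image_embeddings = F.normalize(image_embeddings, dim=-1)
    text_embeddings = F.normalize(text_embeddings, dim=-1)
    logits = image_embeddings @ text_embeddings.T / temperature
    # The matching pairs lie on the diagonal, so the targets are 0..N-1.
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_image = F.cross_entropy(logits, targets)    # image -> text
    loss_text = F.cross_entropy(logits.T, targets)   # text -> image
    return (loss_image + loss_text) / 2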

An interesting aspect of this model is that it's trained on a large amount of unstructured data and has learned features that can transfer to a wide variety of tasks. In fact, it can achieve good zero-shot accuracy on classification tasks by simply providing descriptions of the classes like:

  1. "A dog"
  2. "A cat"

However, the downside of this model is that it has relatively high latency and memory consumption compared to CNN-based architectures like ResNet. This begs the question: can we leverage the capabilities of OpenCLIP while obtaining lower memory consumption and latency? In this tutorial, we'll explore how we can use the OpenCLIP model as a teacher to train a CNN model for a classification task. We'll explore how the data used for training, the methods for aligning the model to the classification task, and the methods for distilling the model impact the final accuracy.

What is the STL10 Dataset?

To explore how we can use OpenCLIP for classification and knowledge transfer, we'll use the STL10 dataset from Stanford. The STL10 dataset is a classification dataset with 10 classes:

  1. Airplane
  2. Bird
  3. Car
  4. Cat
  5. Deer
  6. Dog
  7. Horse
  8. Monkey
  9. Ship
  10. Truck

Image Credit: https://cs.stanford.edu/~acoates/stl10/

We chose this dataset over other classification datasets like MNIST and CIFAR10 because:

  1. Compared to MNIST, it contains natural images, which are better suited to OpenCLIP.
  2. Compared to CIFAR10, the images are 96x96 resolution rather than 32x32, which is closer to the training resolution of OpenCLIP.
  3. It contains a large number of unlabeled images (100,000), which allows us to explore the benefit of using unlabeled data during training.

As a disclaimer, because this tutorial only explores the STL10 dataset, some results may be dependent on this particular data distribution and task. While we can't guarantee the results will translate to other tasks and data sources, we hope that you will find this tutorial helpful as an introduction to this subject so you can explore knowledge distillation with your own data.

Evaluating OpenCLIP for image classification on the STL10 dataset

Now that we've introduced knowledge distillation, the teacher model we're using, and the dataset and task we're targeting, let's get started!

Before we use our teacher model to train the student, it's good to have an initial sense of how well the teacher works for the task at hand. This roughly sets an upper bound on the accuracy we can hope to achieve with our student model.

Because OpenCLIP isn't directly trained on the STL10 dataset, there are a few ways we can align this model to perform classification. Let's explore these.

Using text prompts for classification

The first, and simplest, way we can use OpenCLIP for classification on the STL10 dataset is to define the classes using text prompts, run the prompts through the text encoder, and then compare each image embedding against the encoded prompts to pick the most similar class.

For the STL10 dataset, we can generate the text embeddings as follows

import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", 
    pretrained="laion2b_s34b_b79k"
)

tokenizer = open_clip.get_tokenizer("ViT-B-32")

labels = [
    "an airplane",
    "a bird",
    "a car",
    "a cat",
    "a deer",
    "a dog",
    "a horse",
    "a monkey",
    "a ship",
    "a truck"
]

text = tokenizer(labels)
text_embeddings = model.encode_text(text)

Now, text_embeddings contains a 512-dimensional vector for each text prompt. This vector has the same size as the vision encoder output. The dot product of a text embedding with the vision features indicates their similarity, so we can compute the class probabilities for our dataset as follows

import torch.nn.functional as F

def embeddings_to_class_probs(vision_embeddings, text_embeddings):
    vision_embeddings = vision_embeddings / vision_embeddings.norm(dim=-1, keepdim=True)
    text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)
    logits = vision_embeddings @ text_embeddings.T
    class_probs = F.softmax(100. * logits, dim=-1)
    return class_probs

Now that we have the text embeddings for our target task, and a method to compare them against the image embeddings, all that's left to do is run the STL10 test set through the OpenCLIP vision encoder, compute the output class probabilities, and compare the results against the ground truth labels.

import torch
import tqdm
from torchvision.datasets import STL10

dataset = STL10(
    root=dataset_path,
    download=True,
    split="test"
)

num_correct = 0

for image, label in tqdm.tqdm(dataset):
    input_tensor = preprocess(image).unsqueeze(0)
    vision_embeddings = model.encode_image(input_tensor)
    output_class_probs = embeddings_to_class_probs(vision_embeddings, text_embeddings)
    output_label = torch.argmax(output_class_probs, dim=-1)
    num_correct += int(torch.count_nonzero(output_label == label))

accuracy = 100. * num_correct / len(dataset)

And with that, out of the box, the OpenCLIP encoder, without any additional training, gets 96.68% accuracy on the STL10 test dataset! With no tricks, we achieved fairly competitive accuracy on the STL10 dataset; for comparison, you can see other competitive results on the STL10 dataset here

Using linear head for classification

As shown, using the text prompts as class labels, we were able to achieve pretty good accuracy on the STL10 dataset without any training or ground truth labels. But what if we have ground truth labels available? Can we use this to improve the accuracy?

In this approach, we'll use some ground truth data to train a tiny logistic regression layer (a linear layer followed by softmax) on top of the OpenCLIP image embeddings, and see if this improves the accuracy.

To do this, we define our linear layer as follows

import torch.nn as nn

linear_probe = nn.Linear(512, len(labels))

We then need to train our model. This involves

  1. Read a batch from the dataset
  2. Run the OpenCLIP vision encoder (no gradients)
  3. Run the linear layer on the output of OpenCLIP
  4. Compute cross entropy between the linear layer output and ground truth label
  5. Update the linear layer

optimizer = torch.optim.Adam(linear_probe.parameters(), lr=3e-4)

for epoch in range(num_epochs):
    for image, label in iter(train_loader):
        # ... run the OpenCLIP vision encoder (inside torch.no_grad()) to get vision_embeddings
        optimizer.zero_grad()
        output_logits = linear_probe(vision_embeddings)
        output_logprob = F.log_softmax(output_logits, dim=-1)
        loss = F.nll_loss(output_logprob, label)
        loss.backward()
        optimizer.step()

After training the linear probe, we evaluate it on the STL10 test dataset, similar to before, and our accuracy is now 98.57%!

Great! By using some labeled data, we were able to train a small logistic regression layer that improves the accuracy of OpenCLIP on the STL10 dataset by nearly +2%!

This improvement is likely because our text prompts, like "an airplane", might not perfectly match the labels as they appear in the STL10 dataset. But by seeing a few examples for each label, we can learn reference embeddings that more accurately represent the class labels.

Training a student model to mimic OpenCLIP

We've now seen that using the large OpenCLIP model, we can achieve competitive results on the STL10 image classification dataset with little effort. But OpenCLIP is large, and is likely to have high memory consumption and latency compared to other model architectures. In addition, as a vision transformer model, OpenCLIP is less able to exploit the Deep Learning Accelerator (DLA) on Jetson AGX Orin, given the matrix multiplications in the attention layers. CNN models like resnet18, on the other hand, are highly optimized on both the GPU and DLA on Jetson, and allow us to run models at higher throughput with less memory.

However, knowledge distillation can impact the accuracy of the model, so we'd like to better understand which factors matter most. To do this, we've run a set of experiments that seek to answer the following questions:

  1. How does distillation compare to training with ground truth labels?
  2. How does the data distribution used for training impact model accuracy?
  3. How does the distillation method impact model accuracy? Is it better to train on the class probabilities or internal features?
  4. How does the student model architecture impact model accuracy? Will resnet50 obtain higher accuracy than resnet18?

How does distillation compare to training with ground truth labels?

For this set of experiments, we train resnet18 from scratch using ground truth labels, and compare this to training resnet18 on the output probabilities of OpenCLIP (with both the text-prompt labels and the linear probe method).

In each experiment, we use only the STL10 train dataset (5,000 images) to provide a fair comparison against the baseline training.
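For intuition, a single response-distillation training step might look roughly like the following. This is a minimal sketch: train_loader, which is assumed to yield images paired with precomputed OpenCLIP class probabilities, is a hypothetical helper and not part of the stl10 module used below.

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

student = resnet18(num_classes=10)
optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

for image, teacher_probs in iter(train_loader):
    optimizer.zero_grad()
    # Response distillation: the student's class distribution should match the teacher's.
    student_logprobs = F.log_softmax(student(image), dim=-1)
    loss = F.kl_div(student_logprobs, teacher_probs, reduction="batchmean")
    loss.backward()
    optimizer.step()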

Experiment Name        Student Model  Teacher Model    Data Used    Test Accuracy
resnet18_from_scratch  resnet18       None             stl10_train  57.93
resnet18_text_train    resnet18       openclip_text    stl10_train  44.04
resnet18_linear_train  resnet18       openclip_linear  stl10_train  60.65

To reproduce these results, you can create a Python file and call the following.

from stl10 import (
    precompute_clip_stl10_train_image_embeddings,
    precompute_clip_stl10_test_image_embeddings,
    precompute_clip_stl10_text_embeddings,
    train_resnet18_from_scratch,
    train_resnet18_zero_shot_train_only,
    train_resnet18_linear_probe_train_only
)

precompute_clip_stl10_train_image_embeddings()
precompute_clip_stl10_test_image_embeddings()
precompute_clip_stl10_text_embeddings()
train_resnet18_from_scratch()
train_resnet18_zero_shot_train_only()
train_resnet18_linear_probe_train_only()

As we can see, with equal data used for training, the distilled model using the linear probe head achieves higher accuracy than training on the ground truth labels directly, even though each method has access to the same data and labels. However, the text-prompt method was unable to reach the accuracy of training resnet18 from scratch.

This shows that knowledge distillation has the capability to improve model accuracy, even under the same data distribution. However, we'll see in the next section how we can take this much further by utilizing unlabeled data during the distillation process. In fact, using unlabeled data, the text-prompt method (which doesn't require any ground truth labels) is able to exceed training the model from scratch by a huge margin!

How does the data used for distillation impact accuracy?

Now we've seen knowledge distillation has the capability of improving the model accuracy. But our best student model (60.65% accuracy) is still far below the accuracy of our teacher model (98.57% accuracy).

Why is this? Does the student model, resnet18, lack the capacity to mimic the teacher model? Or is it something else, perhaps the data that we used for distillation?

To help answer this question, we ran a series of experiments that perform knowledge distillation using not only the 5,000 images in the STL10 train dataset, but also the 100,000 unlabeled images provided in the supplementary STL10 unlabeled split. The results of these experiments are summarized in the table below.

Experiment Name                  Student Model  Teacher Model    Data Used                      Test Accuracy
resnet18_text_train_unlabeled    resnet18       openclip_text    stl10_train + stl10_unlabeled  94.32
resnet18_linear_train_unlabeled  resnet18       openclip_linear  stl10_train + stl10_unlabeled  96.88
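Because the teacher provides the training targets, pooling in the unlabeled split requires no extra labeling effort. A minimal sketch of combining the two splits (variable names are illustrative, and dataset_path is whatever directory you use for STL10):

from torch.utils.data import ConcatDataset
from torchvision.datasets import STL10

train_split = STL10(root=dataset_path, split="train", download=True)
unlabeled_split = STL10(root=dataset_path, split="unlabeled", download=True)

# The teacher's outputs (text-prompt or linear-probe class probabilities) serve
# as the targets, so both splits can be used for distillation.
distillation_dataset = ConcatDataset([train_split, unlabeled_split])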

To reproduce these results, call the following in Python

from stl10 import (
    precompute_clip_stl10_train_image_embeddings,
    precompute_clip_stl10_unlabeled_image_embeddings,
    precompute_clip_stl10_test_image_embeddings,
    precompute_clip_stl10_text_embeddings,
    train_resnet18_zero_shot,
    train_resnet18_linear_probe
)

precompute_clip_stl10_train_image_embeddings()
precompute_clip_stl10_unlabeled_image_embeddings()
precompute_clip_stl10_test_image_embeddings()
precompute_clip_stl10_text_embeddings()
train_resnet18_zero_shot()
train_resnet18_linear_probe()

Wow! Simply by including a large number of images from the STL10 unlabeled dataset split during the distillation process, the accuracy of the model improved substantially! Now the resnet18 model with text-prompt labels far exceeds the accuracy of the resnet18 trained with ground-truth labels, and falls just ~2% below the original OpenCLIP model with text-prompt labels. The distilled model with a linear classification head goes even further, at 96.88%, exceeding the original OpenCLIP model with text prompting and falling less than 2% below our best OpenCLIP variant with a linear classification head.

So in summary,

  1. With no labels, just a large amount of unlabeled data, we can achieve 94.32% accuracy with a resnet18 model by distilling a text-prompt OpenCLIP classifier
  2. With some labels and a large amount of unlabeled data, we can achieve 96.88% accuracy with a resnet18 model by distilling an OpenCLIP classifier with a linear classification head

The lesson learned is that the data used for distillation is very important for obtaining good accuracy.

But what about our student model architecture? Could we achieve better results using a higher capacity model like resnet50?

How does the student model architecture impact model accuracy?

To explore the impact of the student model architecture on final accuracy, we ran a series of experiments where we used our best distillation configuration with three different model architectures: resnet18, resnet34, and resnet50. The results of these experiments are summarized below.

Experiment Name                  Student Model  Teacher Model    Data Used                      Test Accuracy
resnet18_linear_train_unlabeled  resnet18       openclip_linear  stl10_train + stl10_unlabeled  96.88
resnet34_linear_train_unlabeled  resnet34       openclip_linear  stl10_train + stl10_unlabeled  96.69
resnet50_linear_train_unlabeled  resnet50       openclip_linear  stl10_train + stl10_unlabeled  96.76

To reproduce these results, call the following in Python

from stl10 import (
    precompute_clip_stl10_train_image_embeddings,
    precompute_clip_stl10_unlabeled_image_embeddings,
    precompute_clip_stl10_test_image_embeddings,
    precompute_clip_stl10_text_embeddings,
    train_resnet18_linear_probe,
    train_resnet34_linear_probe,
    train_resnet50_linear_probe
)

precompute_clip_stl10_train_image_embeddings()
precompute_clip_stl10_unlabeled_image_embeddings()
precompute_clip_stl10_test_image_embeddings()
precompute_clip_stl10_text_embeddings()
train_resnet18_linear_probe()
train_resnet34_linear_probe()
train_resnet50_linear_probe()

As seen above, we saw a relatively negligible difference when switching between student model architectures. This means that, at least for this task (STL10 classification), the data used for distillation was much more important than the student model architecture.

It's quite possible that for other tasks this may not be the case, but we wanted to share this finding, at least for this scenario, so you can understand and prioritize which factors to explore first.

While for STL10 we didn't see a substantial change by switching model architectures, there is still one question we wanted to explore for this tutorial. That is, how does the distillation method impact model accuracy?

How does the distillation method impact model accuracy?

So far we've seen that the data used for training has a large impact on model accuracy, but what about the method used for distillation? As mentioned previously, knowledge distillation can be performed in a few ways

  1. Response distillation (done above): Fit a model to learn the output class probabilities
  2. Feature distillation: Fit a model to learn the internal features

To explore the impact of this decision, we ran a couple of experiments where we trained resnet18 to learn the 512-dimensional vision feature embedding output by OpenCLIP, rather than the class probabilities. We then feed these embeddings into the text-prompt or linear-probe head, just as we do with the original OpenCLIP model. The results are detailed below.

Experiment Name                            Student Model  Teacher Model       Data Used                      Test Accuracy
resnet18_embedding_text_train_unlabeled    resnet18       openclip_embedding  stl10_train + stl10_unlabeled  94.575
resnet18_embedding_linear_train_unlabeled  resnet18       openclip_embedding  stl10_train + stl10_unlabeled  96.912
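Conceptually, the feature-distillation setup replaces the student's classification head with a 512-dimensional embedding output and trains it with Mean Squared Error against OpenCLIP's image embeddings. A minimal sketch follows; embedding_loader (assumed to yield images paired with precomputed OpenCLIP image embeddings) and the other names are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

# Resize the student's final layer to emit a 512-dim embedding like OpenCLIP.
student = resnet18()
student.fc = nn.Linear(student.fc.in_features, 512)

optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

for image, clip_embedding in iter(embedding_loader):
    optimizer.zero_grad()
    # Feature distillation: match OpenCLIP's image embedding with MSE.
    loss = F.mse_loss(student(image), clip_embedding)
    loss.backward()
    optimizer.step()

# At inference time, the student's embedding is classified just like the teacher's,
# e.g. class_probs = embeddings_to_class_probs(student(image), text_embeddings)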

To reproduce these results, call the following in Python

from stl10 import (
    precompute_clip_stl10_train_image_embeddings,
    precompute_clip_stl10_unlabeled_image_embeddings,
    precompute_clip_stl10_test_image_embeddings,
    precompute_clip_stl10_text_embeddings,
    train_resnet18_embedding_text,
    eval_resnet18_embedding_linear
)

precompute_clip_stl10_train_image_embeddings()
precompute_clip_stl10_unlabeled_image_embeddings()
precompute_clip_stl10_test_image_embeddings()
precompute_clip_stl10_text_embeddings()
train_resnet18_embedding_text()
eval_resnet18_embedding_linear()

As seen in the table above, we achieve slightly higher accuracy by training on the features (using Mean Squared Error loss) than we do by training on the output class probabilities (using KL divergence).

To summarize,

  1. By training on features, our text-prompt student accuracy increases by 0.25%
  2. By training on features, our linear-classification-head student accuracy increases by 0.032%

While these changes are not very significant, it is interesting that training on the embeddings does not adversely impact the model accuracy. This is interesting because the embeddings are not explicitly tied to the STL10 task, so the student could potentially be re-purposed like the original OpenCLIP model, by simply changing the text prompts used for classification or re-training the linear head.
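For example, re-targeting a feature-distilled student could, in principle, be as simple as encoding a new set of prompts and reusing the helper defined earlier. This is a hypothetical sketch; we have not measured accuracy on new classes here.

new_labels = ["a bicycle", "a motorcycle"]
new_text = tokenizer(new_labels)
new_text_embeddings = model.encode_text(new_text)

# The distilled embedding student would then be classified against the new prompts:
# class_probs = embeddings_to_class_probs(student(image), new_text_embeddings)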

However, in this tutorial we have not yet distilled OpenCLIP in a generic fashion. Exploring this possibility is an interesting next step. But for now, we've got a pretty good classification model for our target task. Let's discuss how we can optimize our student model for deployment, and see how the latency and memory consumption compare to the original OpenCLIP model.

Optimizing the models with TensorRT and comparing the performance

Above we've shown how we can train a resnet CNN model to mimic the large OpenCLIP model. Now, let's see why this effort was worth it. What performance gain can we expect by using our student model?

To understand the performance we should expect from each model, we optimize the models with NVIDIA TensorRT and measure the throughput and memory consumption on NVIDIA Jetson. To do this, we exported the models to ONNX and optimized them with NVIDIA TensorRT (a rough sketch of this flow is shown at the end of this section). Here we'll show the performance of OpenCLIP and resnet18 running on Jetson Orin Nano at 224x224 resolution with batch size 8.

Model            Image Size  Batch Size  Precision  Throughput (FPS)  Latency (ms)  Memory (MB)
openclip_vitb32  224x224     8           FP16       335.816           23.82         1087
resnet18         224x224     8           FP16       1420.2            5.97          315

As we can see, after optimizing each model with TensorRT, our resnet18 model is 4.2x faster than the original open_clip model while using 3.45x less memory.

To measure memory, we used tegrastats. We record the system memory before and after running the model with trtexec. The memory reported in the table above is the change in system memory while the model is running.
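For reference, the export-and-profile flow described above might look roughly like the following. File names and trtexec flags are illustrative; the exact options depend on your TensorRT version.

import torch
from torchvision.models import resnet18

# Export the distilled student to ONNX with a fixed 8x3x224x224 input.
model = resnet18(num_classes=10).eval()
dummy_input = torch.randn(8, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx",
                  input_names=["input"], output_names=["output"])

# Then, on the Jetson, build and profile a TensorRT engine, for example:
#   trtexec --onnx=resnet18.onnx --fp16 --saveEngine=resnet18.engine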

For this particular image classification task, Jetson Orin Nano is plenty capable of running the original OpenCLIP model. However, if you want to run models at higher resolution, these differences in throughput and memory consumption may become critical.

Next steps

That's it for this tutorial! In summary, we explored using knowledge distillation to train a resnet18 classifier on the STL10 classification dataset. We achieved accuracy comparable to the original OpenCLIP model for this task, while significantly reducing the runtime and memory consumption. We hope this tutorial introduced you to ways you can explore knowledge distillation for bringing large models to the edge.

In addition to this introduction, we've created a companion project, clip-distillation, that enables you to easily create a zero-label image classifier for your own custom task!

It includes,

  1. Scripts to download relevant clip-filtered images to use for distillation
  2. Scripts to distill an efficient CNN model to mimic an OpenCLIP transformer model
    • Includes quantization aware training and structured sparsity as options during training.
  3. Scripts to run inference with NVIDIA TensorRT.

To get started head to clip-distillation.

References

  1. Learning Transferable Visual Models From Natural Language Supervision [Paper]
  2. Knowledge Distillation: A Survey [Paper]
  3. Distilling the Knowledge in a Neural Network [Paper]
  4. Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks [Paper]
  5. Feature Relational Distillation [Paper]
  6. Relational Knowledge Distillation [Paper]
