  • Language
    Python
  • License
    Apache License 2.0


A distilled Segment Anything (SAM) model capable of running in real time with NVIDIA TensorRT

NanoSAM

👍 Usage - ⏱️ Performance - 🛠️ Setup - 🤸 Examples - 🏋️ Training - 🧐 Evaluation - 👏 Acknowledgement - 🔗 See also

NanoSAM is a Segment Anything (SAM) model variant that is capable of running in 🔥 real-time 🔥 on NVIDIA Jetson Orin Platforms with NVIDIA TensorRT.

NanoSAM is trained by distilling the MobileSAM image encoder on unlabeled images. For an introduction to knowledge distillation, we recommend checking out this tutorial.

👍 Usage

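Using NanoSAM from Python looks like this:

import numpy as np
import PIL.Image

from nanosam.utils.predictor import Predictor

# Load the TensorRT engines built in the Setup section
predictor = Predictor(
    image_encoder="data/resnet18_image_encoder.engine",
    mask_decoder="data/mobile_sam_mask_decoder.engine"
)

image = PIL.Image.open("dog.jpg")

# Run the image encoder on the image
predictor.set_image(image)

# x, y: pixel coordinates of a foreground point (placeholder values)
x, y = 100, 100

# Prompt with a single foreground point (label 1)
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))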

Notes: The point labels may be

Point Label   Description
0             Background point
1             Foreground point
2             Bounding box top-left
3             Bounding box bottom-right
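
As a sketch of how the labels above combine, a bounding box prompt can be written as two points: the top-left corner with label 2 and the bottom-right corner with label 3. The helper name `box_to_prompt` and the box coordinates are hypothetical and not part of NanoSAM; the helper only builds arrays in the form the usage example passes to `predictor.predict`.

```python
import numpy as np

# Point label values from the table above.
BACKGROUND, FOREGROUND, BOX_TOP_LEFT, BOX_BOTTOM_RIGHT = 0, 1, 2, 3


def box_to_prompt(x0, y0, x1, y1):
    """Hypothetical helper: encode a box as two labeled corner points."""
    points = np.array([[x0, y0], [x1, y1]])
    labels = np.array([BOX_TOP_LEFT, BOX_BOTTOM_RIGHT])
    return points, labels


# Made-up box coordinates, in pixels.
points, labels = box_to_prompt(100, 100, 850, 759)

# With a predictor built as in the usage example, the prompt would be passed as:
# mask, _, _ = predictor.predict(points, labels)
```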

Follow the instructions below for how to build the engine files.

⏱️ Performance

NanoSAM runs in real time on Jetson Orin Nano.

Model †             ⏱️ Jetson Orin Nano (ms)         ⏱️ Jetson AGX Orin (ms)          🎯 Accuracy (mIoU) ‡
                    Image Encoder  Full Pipeline    Image Encoder  Full Pipeline    All    Small  Medium  Large
MobileSAM           TBD            146              35             39               0.728  0.658  0.759   0.804
NanoSAM (ResNet18)  TBD            27               4.2            8.1              0.706  0.624  0.738   0.796

Notes

† The MobileSAM image encoder is optimized with FP32 precision because it produced erroneous results when built for FP16 precision with TensorRT. The NanoSAM image encoder is built with FP16 precision, as we did not notice a significant accuracy degradation. Both pipelines use the same mask decoder, which is built with FP32 precision. For all models, the accuracy reported uses the same model configuration used to measure latency.

‡ Accuracy is computed by prompting SAM with ground-truth object bounding box annotations from the COCO 2017 validation dataset. The IoU is then computed between the mask output of the SAM model for the object and the ground-truth COCO segmentation mask for the object. The mIoU is the average IoU over all objects in the COCO 2017 validation set matching the target object size (small, medium, large).
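
To make the metric concrete, here is a minimal NumPy sketch of per-object IoU and its average. The function names are illustrative and this is not the evaluation code from `nanosam.tools.eval_coco`; how an empty union is handled is an assumption.

```python
import numpy as np


def mask_iou(pred_mask, gt_mask):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; treating this as IoU 0 is an assumption.
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(ious):
    """mIoU: the plain average of per-object IoU values."""
    return float(np.mean(ious))


# Toy 4x4 masks: the prediction covers 2 columns, the ground truth covers 3.
pred = np.zeros((4, 4), dtype=bool)
pred[:, :2] = True
gt = np.zeros((4, 4), dtype=bool)
gt[:, :3] = True

iou = mask_iou(pred, gt)  # intersection is 8 pixels, union is 12 pixels
```

Restricting the average to objects of one size class (small, medium, or large) gives the per-size columns in the table.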

🛠️ Setup

NanoSAM is fairly easy to get started with.

  1. Install the dependencies

    1. Install PyTorch

    2. Install torch2trt

    3. Install NVIDIA TensorRT

    4. (optional) Install TRTPose - For the pose example.

      git clone https://github.com/NVIDIA-AI-IOT/trt_pose
      cd trt_pose
      python3 setup.py develop --user
    5. (optional) Install the Transformers library - For the OWL ViT example.

      python3 -m pip install transformers
  2. Install the NanoSAM Python package

    git clone https://github.com/NVIDIA-AI-IOT/nanosam
    cd nanosam
    python3 setup.py develop --user
  3. Build the TensorRT engine for the mask decoder

    1. Export the MobileSAM mask decoder ONNX file (or download directly from here)

      python3 -m nanosam.tools.export_sam_mask_decoder_onnx \
          --model-type=vit_t \
          --checkpoint=assets/mobile_sam.pt \
          --output=data/mobile_sam_mask_decoder.onnx
    2. Build the TensorRT engine

      trtexec \
          --onnx=data/mobile_sam_mask_decoder.onnx \
          --saveEngine=data/mobile_sam_mask_decoder.engine \
          --minShapes=point_coords:1x1x2,point_labels:1x1 \
          --optShapes=point_coords:1x1x2,point_labels:1x1 \
          --maxShapes=point_coords:1x10x2,point_labels:1x10

      This assumes the mask decoder ONNX file is downloaded to data/mobile_sam_mask_decoder.onnx.

      Notes: This command builds the engine to support up to 10 keypoints. You can increase this limit as needed by specifying a different max shape.
  4. Build the TensorRT engine for the NanoSAM image encoder

    1. Download the image encoder: resnet18_image_encoder.onnx

    2. Build the TensorRT engine

      trtexec \
          --onnx=data/resnet18_image_encoder.onnx \
          --saveEngine=data/resnet18_image_encoder.engine \
          --fp16
  5. Run the basic usage example

    python3 examples/basic_usage.py \
        --image_encoder=data/resnet18_image_encoder.engine \
        --mask_decoder=data/mobile_sam_mask_decoder.engine
    

    This outputs a result to data/basic_usage_out.jpg

That's it! From here, you can read the example code to see how to use NanoSAM from Python, or try running the more advanced examples below.
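
The mask decoder engine built in step 3 accepts between 1 and 10 points per prompt, following the min and max shapes passed to trtexec. A small check before calling the predictor can make that limit explicit. This is only a sketch: `check_prompt` and `MAX_POINTS` are hypothetical names, not part of NanoSAM.

```python
import numpy as np

MAX_POINTS = 10  # matches --maxShapes=point_coords:1x10x2 in step 3


def check_prompt(points, labels, max_points=MAX_POINTS):
    """Hypothetical check that a prompt fits the engine's shape range."""
    points = np.asarray(points)
    labels = np.asarray(labels)
    if points.ndim != 2 or points.shape[1] != 2:
        raise ValueError("points must have shape (N, 2)")
    if labels.shape != (points.shape[0],):
        raise ValueError("labels must have shape (N,)")
    if not 1 <= points.shape[0] <= max_points:
        raise ValueError(
            f"expected 1 to {max_points} points, got {points.shape[0]}"
        )
    return points, labels


# A single foreground point passes the check.
points, labels = check_prompt([[320, 240]], [1])
```

If you rebuild the engine with a larger max shape, pass the new limit as `max_points`.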

🤸 Examples

NanoSAM can be applied in many creative ways.

Example 1 - Segment with bounding box

This example uses a known image with a fixed bounding box to control NanoSAM segmentation.

To run the example, call

python3 examples/basic_usage.py \
    --image_encoder="data/resnet18_image_encoder.engine" \
    --mask_decoder="data/mobile_sam_mask_decoder.engine"

Example 2 - Segment with bounding box (using OWL-ViT detections)

This example demonstrates using OWL-ViT to detect objects from one or more text prompts, and then segmenting these objects using NanoSAM.

To run the example, call

python3 examples/segment_from_owl.py \
    --prompt="A tree" \
    --image_encoder="data/resnet18_image_encoder.engine" \
    --mask_decoder="data/mobile_sam_mask_decoder.engine"

Notes: While OWL-ViT does not run in real time on Jetson Orin Nano (3 sec/img), it is useful for experimentation because it can detect a wide variety of objects. You could substitute any real-time pre-trained object detector to take full advantage of NanoSAM's speed.

Example 3 - Segment with keypoints (offline using TRTPose detections)

This example demonstrates how to use human pose keypoints from TRTPose to control NanoSAM segmentation.

To run the example, call

python3 examples/segment_from_pose.py

This will save an output figure to data/segment_from_pose_out.png.
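
One way to picture keypoint prompting: keypoints that lie on the region you want become foreground points (label 1), and keypoints that lie elsewhere become background points (label 0). The keypoint names, coordinates, and the helper `keypoints_to_prompt` below are made up for illustration; they are not TRTPose output and not taken from the example script.

```python
import numpy as np

# Hypothetical pose: keypoint name -> (x, y) pixel coordinates.
pose = {
    "nose": (320, 80),
    "left_shoulder": (260, 180),
    "right_shoulder": (380, 180),
    "left_knee": (280, 520),
}


def keypoints_to_prompt(pose, foreground, background):
    """Build point and label arrays from named keypoints (illustrative)."""
    names = list(foreground) + list(background)
    points = np.array([pose[name] for name in names])
    labels = np.array([1] * len(foreground) + [0] * len(background))
    return points, labels


# Ask for the torso: shoulders are foreground, nose and knee are background.
points, labels = keypoints_to_prompt(
    pose,
    foreground=["left_shoulder", "right_shoulder"],
    background=["nose", "left_knee"],
)

# With a predictor built as in the Usage section:
# mask, _, _ = predictor.predict(points, labels)
```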

Example 4 - Segment with keypoints (online using TRTPose detections)

This example demonstrates how to use human pose to control segmentation on a live camera feed. This example requires an attached display and camera.

To run the example, call

python3 examples/demo_pose_tshirt.py

Example 5 - Segment and track (experimental)

This example demonstrates a rudimentary segmentation tracking with NanoSAM. This example requires an attached display and camera.

To run the example, call

python3 examples/demo_click_segment_track.py <image_encoder_engine> <mask_decoder_engine>

Once the example is running, double-click an object you want to track.

Notes: This tracking method is very simple and can lose the object easily. It is intended to demonstrate creative ways you can use NanoSAM, and would likely improve with more work.
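
To illustrate why a simple tracker is fragile, one naive approach is to take the centroid of the previous frame's mask and reuse it as the foreground point for the next frame. This is an assumption made for illustration and may not be how demo_click_segment_track.py works; `mask_centroid` is a hypothetical helper.

```python
import numpy as np


def mask_centroid(mask):
    """Return the (x, y) centroid of a binary mask, or None if it is empty."""
    ys, xs = np.nonzero(mask)  # row indices are y, column indices are x
    if xs.size == 0:
        return None  # the object was lost; a real tracker needs a recovery step
    return float(xs.mean()), float(ys.mean())


# Toy mask: a 2x2 blob at rows 2-3, columns 4-5.
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 4:6] = True

point = mask_centroid(mask)  # (4.5, 2.5)

# The next frame would then be prompted with this point, for example:
# predictor.set_image(next_frame)
# mask, _, _ = predictor.predict(np.array([point]), np.array([1]))
```

If the mask drifts onto the background for even one frame, the centroid follows it, which is one reason an approach like this loses objects easily.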

🏋️ Training

You can train NanoSAM on a single GPU

  1. Download and extract the COCO 2017 train images

    mkdir -p data/coco  # no-op if the directory already exists
    cd data/coco
    wget http://images.cocodataset.org/zips/train2017.zip
    unzip train2017.zip
    cd ../..
  2. Build the MobileSAM image encoder (used as teacher model)

    1. Export to ONNX

      python3 -m nanosam.tools.export_sam_image_encoder_onnx \
          --checkpoint="assets/mobile_sam.pt" \
          --output="data/mobile_sam_image_encoder_bs16.onnx" \
          --model_type=vit_t \
          --batch_size=16
    2. Build the TensorRT engine with batch size 16

      trtexec \
          --onnx=data/mobile_sam_image_encoder_bs16.onnx \
          --shapes=image:16x3x1024x1024 \
          --saveEngine=data/mobile_sam_image_encoder_bs16.engine
  3. Train the NanoSAM image encoder by distilling MobileSAM

    python3 -m nanosam.tools.train \
        --images=data/coco/train2017 \
        --output_dir=data/models/resnet18 \
        --model_name=resnet18 \
        --teacher_image_encoder_engine=data/mobile_sam_image_encoder_bs16.engine \
        --batch_size=16

    Notes: Once training begins, visualizations of progress and checkpoints are saved to the specified output directory. You can stop training and resume from the last saved checkpoint if needed.

    For a list of arguments, you can type

    python3 -m nanosam.tools.train --help
  4. Export the trained NanoSAM image encoder to ONNX

    python3 -m nanosam.tools.export_image_encoder_onnx \
        --model_name=resnet18 \
        --checkpoint="data/models/resnet18/checkpoint.pth" \
        --output="data/resnet18_image_encoder.onnx"

You can then build the TensorRT engine as described in step 4 of the Setup section.
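
Conceptually, distillation here means the student image encoder is trained to reproduce the teacher's image embeddings on the same unlabeled images. The sketch below computes a Huber loss between two feature arrays with NumPy. The choice of Huber loss, the `delta` value, and the feature shape are assumptions for illustration; `python3 -m nanosam.tools.train --help` shows what the training tool actually supports.

```python
import numpy as np


def huber_loss(student, teacher, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(np.asarray(student) - np.asarray(teacher))
    quadratic = 0.5 * err**2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))


rng = np.random.default_rng(0)

# Assumed embedding shape (batch, channels, height, width), for illustration.
teacher_features = rng.normal(size=(2, 256, 64, 64))
student_features = teacher_features + 0.1  # student is off by a constant 0.1

# Every error is about 0.1, so the loss is about 0.5 * 0.1**2 = 0.005.
loss = huber_loss(student_features, teacher_features)
```

In training, a loss like this would be minimized with respect to the student's weights while the teacher engine stays fixed.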

🧐 Evaluation

You can reproduce the accuracy results above by evaluating against COCO ground truth masks

  1. Download and extract the COCO 2017 validation set.

    mkdir -p data/coco  # no-op if the directory already exists
    cd data/coco
    wget http://images.cocodataset.org/zips/val2017.zip
    wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    unzip val2017.zip
    unzip annotations_trainval2017.zip
    cd ../..
  2. Compute the IoU of NanoSAM mask predictions against the ground truth COCO mask annotation.

    python3 -m nanosam.tools.eval_coco \
        --coco_root=data/coco/val2017 \
        --coco_ann=data/coco/annotations/instances_val2017.json \
        --image_encoder=data/resnet18_image_encoder.engine \
        --mask_decoder=data/mobile_sam_mask_decoder.engine \
        --output=data/resnet18_coco_results.json

    This uses the COCO ground-truth bounding boxes as inputs to NanoSAM.

  3. Compute the average IoU over a selected category or size

    python3 -m nanosam.tools.compute_eval_coco_metrics \
        data/resnet18_coco_results.json \
        --size="all"

    Notes: For all options, type python3 -m nanosam.tools.compute_eval_coco_metrics --help.

    To compute the mIoU for a specific category ID:

    python3 -m nanosam.tools.compute_eval_coco_metrics \
        data/resnet18_coco_results.json \
        --category_id=1

👏 Acknowledgement

This project is enabled by the great projects below.

  • SAM - The original Segment Anything model.
  • MobileSAM - The distilled Tiny ViT Segment Anything model.

🔗 See also
