
Repository Details

Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.

Triton Model Navigator

Model optimization plays a crucial role in unlocking the maximum performance capabilities of the underlying hardware. By applying various transformation techniques, models can be optimized to fully utilize the specific features offered by the hardware architecture, improving inference performance and cost. Furthermore, these techniques in many cases allow models to be serialized, separating them from the source code. The serialization process enhances portability, allowing the models to be seamlessly deployed in production environments. The decoupling of models from the source code also facilitates maintenance, updates, and collaboration among developers. However, this process comprises multiple steps and offers various potential paths, making manual execution complicated and time-consuming.

The Triton Model Navigator offers a user-friendly and automated solution for optimizing and deploying machine learning models. It provides a single entry point for the supported frameworks, allowing users to start the search for the best deployment option with a single call to the dedicated optimize function. Model Navigator handles model export, conversion, correctness testing, and profiling to select the optimal model format and saves the generated artifacts for inference deployment on PyTriton or Triton Inference Server.

The high-level flowchart below illustrates the process of moving models from source code to deployment optimized formats with the support of the Model Navigator:

[Overview flowchart]

Documentation

The full documentation about optimizing models, using the Navigator Package, and deploying models in PyTriton and/or Triton Inference Server can be found in the documentation.

Support Matrix

The Model Navigator generates multiple optimized and production-ready models. The table below illustrates the model formats that can be obtained by using the Model Navigator with various frameworks.

Table: Supported conversion target formats per each supported Python framework or file.

  • PyTorch: Torch Compile, TorchScript Trace, TorchScript Script, Torch-TensorRT, ONNX, TensorRT
  • TensorFlow 2: SavedModel, TensorRT in TensorFlow, ONNX, TensorRT
  • JAX: SavedModel, TensorRT in TensorFlow, ONNX, TensorRT
  • ONNX: TensorRT

Note: The Model Navigator can also accept any Python function as input. In that case, however, its role is limited to profiling the function, without generating any serialized models.

The Model Navigator stores all artifacts within the navigator_workspace. Additionally, it provides an option to save a portable and transferable Navigator Package - an artifact that includes only the models with minimal latency and maximal throughput. This package also includes base formats that can be used to regenerate the TensorRT plan on the target hardware.

Table: Model formats that can be generated from saved Navigator Package and from model sources.

  • From model source: SavedModel, TensorFlowTensorRT, TorchScript Trace, TorchScript Script, Torch 2 Compile, TorchTensorRT, ONNX, TensorRT
  • From Navigator Package: TorchTensorRT, TensorRT in TensorFlow, ONNX, TensorRT

Installation

The following prerequisites must be fulfilled to use Triton Model Navigator:

  • Installed Python 3.8+
  • Installed NVIDIA TensorRT, required for exporting TensorRT models.

We recommend using the NGC Containers for PyTorch and TensorFlow, which provide all the necessary dependencies.

The package can be installed from pypi.org using an extra index URL:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[<extras,>]

or with nvidia-pyindex:

pip install nvidia-pyindex
pip install -U triton-model-navigator[<extras,>]

To install Triton Model Navigator from source, use the following pip command:

$ pip install --extra-index-url https://pypi.ngc.nvidia.com .[<extras,>]

Extras:

  • tensorflow - Model Navigator with dependencies for TensorFlow2
  • jax - Model Navigator with dependencies for JAX

No extras are needed when using PyTorch.
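For example, installing Model Navigator together with the tensorflow extra listed above would look like:

pip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[tensorflow]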

Quick Start

This section describes the basic steps for optimizing a model for serving inference on PyTriton or Triton Inference Server, as well as saving a Navigator Package for distribution.

Optimize Model

Optimizing models using Model Navigator is as simple as calling the optimize function. The optimization process requires at least:

  • model - a Python object, callable, or file path with the model to optimize.
  • dataloader - a method or class generating input data. The data is utilized to determine the maximum and minimum shapes of the model inputs and create output samples that are used during the optimization process.

Here is an example of running optimize on Torch Hub ResNet50 model:

import torch
import model_navigator as nav

package = nav.torch.optimize(
    model=torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True).eval(),
    dataloader=[torch.randn(1, 3, 256, 256) for _ in range(10)],
)

Once the model has been optimized, the created artifacts are stored in navigator_workspace and a Package object is returned from the function. Read more about optimize in the documentation.

Deploy model in PyTriton

PyTriton can be used to serve inference for any of the optimized formats. Model Navigator provides a dedicated PyTritonAdapter to retrieve the runner and other information required to bind a model for serving inference. The runner is an abstraction that connects the model checkpoint with its runtime, making the inference process more accessible and straightforward.

Following that, you can initialize the PyTriton server using the adapter information:

import model_navigator as nav
from pytriton.decorators import batch
from pytriton.triton import Triton

pytriton_adapter = nav.pytriton.PyTritonAdapter(package=package, strategy=nav.MaxThroughputStrategy())
runner = pytriton_adapter.runner

runner.activate()


@batch
def infer_func(**inputs):
    return runner.infer(inputs)


with Triton() as triton:
    triton.bind(
        model_name="resnet50",
        infer_func=infer_func,
        inputs=pytriton_adapter.inputs,
        outputs=pytriton_adapter.outputs,
        config=pytriton_adapter.config,
    )
    triton.serve()
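Once the server is running, inference can be exercised with PyTriton's client. Below is a minimal client-side sketch, assuming the pytriton.client.ModelClient API, that the server above runs locally, and that the model expects a single image-like input matching the dataloader used during optimization (the actual input names and shapes are reported by pytriton_adapter.inputs):

import numpy as np
from pytriton.client import ModelClient

# Send one random batch to the locally running "resnet50" model.
with ModelClient("localhost", "resnet50") as client:
    batch = np.random.randn(1, 3, 256, 256).astype(np.float32)
    outputs = client.infer_batch(batch)  # inputs passed positionally, in the order the model defines
    print({name: value.shape for name, value in outputs.items()})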

Read more about deploying models on PyTriton in the documentation.

Deploy model in Triton Inference Server

The optimized model can also be used for serving inference on Triton Inference Server once a serialized format has been created. Model Navigator provides functionality to generate a model deployment configuration directly inside a Triton model_repository. The following code selects the model format with the highest throughput and creates the Triton deployment under the defined path to the model repository:

import pathlib

import model_navigator as nav

nav.triton.model_repository.add_model_from_package(
    model_repository_path=pathlib.Path("model_repository"),
    model_name="resnet50",
    package=package,
    strategy=nav.MaxThroughputStrategy(),
)

Once the entry is created, you can simply start the Triton Inference Server, mounting the defined model_repository_path. Read more about deploying models on Triton Inference Server in the documentation.
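As a sketch, assuming the model_repository directory was created in the current working directory and the NGC Triton container is used (the image tag is a placeholder), the server could be started with:

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
  tritonserver --model-repository=/models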

Using Navigator Package

The Navigator Package is an artifact that can be produced at the end of the optimization process. The package is a simple Zip file that contains the optimization details, model metadata, and serialized formats, and it can be saved using:

nav.package.save(
    package=package,
    path="/path/to/package.nav"
)

The package can be easily loaded on other machines and used to re-run the optimization process or profile the model. Read more about using the package in the documentation.
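As an illustration, a saved package might be loaded back and re-optimized on the target machine; the sketch below assumes the nav.package.load and nav.package.optimize functions:

import model_navigator as nav

# Load a package that was saved with nav.package.save and transferred to this machine.
package = nav.package.load(path="/path/to/package.nav")

# Re-run the optimization from the base formats stored in the package,
# e.g. to rebuild the TensorRT plan on the target hardware.
optimized_package = nav.package.optimize(package=package)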

Examples

We provide step-by-step examples that demonstrate how to use various features of Model Navigator. For the sake of readability and accessibility, we use a simple torch.nn.Linear model in these examples. They illustrate how to optimize, test, and deploy the model on PyTriton and Triton Inference Server.
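For orientation, here is a minimal sketch in the spirit of those examples, reusing the nav.torch.optimize call shown earlier on a simple torch.nn.Linear model:

import torch
import model_navigator as nav

# A trivial linear model and a dataloader of random input batches.
model = torch.nn.Linear(in_features=16, out_features=8).eval()
dataloader = [torch.randn(2, 16) for _ in range(10)]

package = nav.torch.optimize(model=model, dataloader=dataloader)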

Useful Links

More Repositories

1. server (Python, 8,180 stars): The Triton Inference Server provides an optimized cloud and edge inferencing solution.
2. pytriton (Python, 725 stars): PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
3. tensorrtllm_backend (Python, 692 stars): The Triton TensorRT-LLM Backend.
4. client (C++, 543 stars): Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
5. tutorials (Python, 540 stars): This repository contains tutorials and examples for Triton Inference Server.
6. python_backend (C++, 508 stars): Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.
7. model_analyzer (Python, 423 stars): Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.
8. fastertransformer_backend (Python, 412 stars)
9. backend (C++, 274 stars): Common source, scripts and utilities for creating Triton backends.
10. vllm_backend (Python, 155 stars)
11. dali_backend (C++, 122 stars): The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
12. onnxruntime_backend (C++, 120 stars): The Triton backend for the ONNX Runtime.
13. pytorch_backend (C++, 113 stars): The Triton backend for the PyTorch TorchScript models.
14. core (C++, 101 stars): The core library and APIs implementing the Triton Inference Server.
15. fil_backend (Jupyter Notebook, 71 stars): FIL backend for the Triton Inference Server.
16. common (C++, 61 stars): Common source, scripts and utilities shared across all Triton repositories.
17. tensorrt_backend (C++, 58 stars): The Triton backend for TensorRT.
18. hugectr_backend (Jupyter Notebook, 50 stars)
19. triton_cli (Python, 45 stars): Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.
20. tensorflow_backend (C++, 42 stars): The Triton backend for TensorFlow.
21. paddlepaddle_backend (C++, 32 stars)
22. openvino_backend (C++, 27 stars): OpenVINO backend for Triton.
23. developer_tools (C++, 18 stars)
24. stateful_backend (C++, 13 stars): Triton backend for managing the model state tensors automatically in the sequence batcher.
25. redis_cache (C++, 11 stars): TRITONCACHE implementation of a Redis cache.
26. checksum_repository_agent (C++, 8 stars): The Triton repository agent that verifies model checksums.
27. contrib (Python, 8 stars): Community contributions to Triton that are not officially supported or maintained by the Triton project.
28. third_party (C, 7 stars): Third-party source packages that are modified for use in Triton.
29. identity_backend (C++, 6 stars): Example Triton backend that demonstrates most of the Triton Backend API.
30. repeat_backend (C++, 5 stars): An example Triton backend that demonstrates sending zero, one, or multiple responses for each request.
31. local_cache (C++, 4 stars): Implementation of a local in-memory cache for Triton Inference Server's TRITONCACHE API.
32. square_backend (C++, 2 stars): Simple Triton backend used for testing.