NVIDIA/TensorRT-Model-Optimizer

Stars: 513
Rank: 86,178 (Top 2%)
Language: Python
License: Other
Created: 7 months ago
Updated: 29 days ago

NVIDIA

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques, including quantization, pruning, and distillation. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM and TensorRT to speed up inference on NVIDIA GPUs.
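
To give a rough sense of what quantization means in the description above, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is not TensorRT Model Optimizer's API: the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, chosen only for illustration, and real libraries typically add calibration and per-channel scales on top of this idea.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to int8 codes (illustrative only)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, used only to exercise the two helpers.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Worst-case reconstruction error; bounded by about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as a small integer plus one shared scale, which is where the memory and bandwidth savings come from; the price is a rounding error of at most about half a step (`scale / 2`) per value.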

nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs

open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

FastPhotoStyle

Style transfer, deep learning, feature transform

TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Megatron-LM

Ongoing research training transformer models at scale

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

vid2vid

PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

pix2pixHD

Synthesizing and manipulating 2048x1024 images with conditional GANs

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

cutlass

CUDA Templates for Linear Algebra Subroutines

FasterTransformer

Transformer-related optimization, including BERT and GPT

DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

Jupyter Notebook

warp

A Python framework for high performance GPU simulation and graphics

DIGITS

Deep Learning GPU Training System

NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

nccl

Optimized primitives for collective multi-GPU communication

flownet2-pytorch

PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

ChatRTX

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

k8s-device-plugin

NVIDIA device plugin for Kubernetes

libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

GenerativeAIExamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs

waveglow

A Flow-based Generative Network for Speech Synthesis

MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Stable-Diffusion-WebUI-TensorRT

TensorRT Extension for Stable Diffusion Web UI

semantic-segmentation

NVIDIA Semantic Segmentation monorepo

gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

DeepRecommender

Deep learning for recommender systems

stdexec

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

CUDALibrarySamples

CUDA Library Samples

VideoProcessingFramework

Set of Python bindings to C++ libraries that provide full HW acceleration for video decoding, encoding, and GPU-accelerated color space and pixel format conversions

deepops

Tools for building GPU clusters

open-gpu-doc

Documentation of NVIDIA chip/hardware interfaces

aistore

AIStore: scalable storage for AI applications

Q2RTX

NVIDIA’s implementation of RTX ray-tracing in Quake II

trt-samples-for-hackathon-cn

Simple samples for TensorRT programming

cccl

CUDA Core Compute Libraries

MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

partialconv

A New Padding Scheme: Partial Convolution based Padding

sentiment-discovery

Unsupervised Language Modeling at scale for robust sentiment classification

nvidia-container-runtime

NVIDIA container runtime

modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods

gpu-monitoring-tools

Tools for monitoring NVIDIA GPUs on Linux

jetson-gpio

A Python library that enables the use of Jetson's GPIOs

dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

retinanet-examples

Fast and accurate object detection with end-to-end GPU optimization

flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer

Jupyter Notebook

nccl-tests

cuda-python

CUDA Python Low-level Bindings

mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data

Jupyter Notebook

gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

libnvidia-container

NVIDIA container runtime library

BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

nv-wavenet

Reference implementation of real-time autoregressive wavenet inference

DLSS

NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games

tensorflow

An Open Source Machine Learning Framework for Everyone

gvdb-voxels

Sparse volume compute and rendering on NVIDIA GPUs

MAXINE-AR-SDK

NVIDIA AR SDK - API headers and sample applications

nvvl

A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training

runx

Deep Learning Experiment Management

NVFlare

NVIDIA Federated Learning Application Runtime Environment

NeMo-Aligner

Scalable toolkit for efficient model alignment

nvcomp

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Dataset_Synthesizer

NVIDIA Deep Learning Dataset Synthesizer (NDDS)

jitify

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).

nvbench

CUDA Kernel Benchmarking Library

libglvnd

The GL Vendor-Neutral Dispatch library

NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

Jupyter Notebook

cuda-quantum

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

AMGX

Distributed multigrid linear solver library on GPU

cuCollections

enroot

A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes.

NeMo-Framework-Launcher

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

hpc-container-maker

HPC Container Maker

MDL-SDK

NVIDIA Material Definition Language SDK

PyProf

A GPU performance profiling tool for PyTorch models

framework-reproducibility

Providing reproducibility in deep learning frameworks

gpu-rest-engine

A REST API for Caffe using Docker and Go

DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

NvPipe

NVIDIA-accelerated zero latency video compression library for interactive remoting applications

torch-harmonics

Differentiable signal processing on the sphere for PyTorch

Jupyter Notebook

cuQuantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples

Jupyter Notebook

data-science-stack

NVIDIA Data Science stack tools

ai-assisted-annotation-client

Client side integration example source code and libraries for AI-Assisted Annotation SDK

video-sdk-samples

Samples demonstrating how to use various APIs of NVIDIA Video Codec SDK

egl-wayland

The EGLStream-based Wayland external platform

nvidia-settings

NVIDIA driver control panel

NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

go-nvml

Go Bindings for the NVIDIA Management Library (NVML)

gpu-feature-discovery

GPU plugin to the node feature discovery for Kubernetes