NVIDIA/video-sdk-samples

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

13,339

NeMo

Style transfer, deep learning, feature transform

12,016

FastPhotoStyle

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

11,020

TensorRT

Ongoing research training transformer models at scale

10,618

Megatron-LM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

10,332

TensorRT-LLM

PyTorch implementation of our method for high-resolution (e.g., 2048x1024) photorealistic video-to-video translation.

8,542

vid2vid

A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch

8,482

apex

Synthesizing and manipulating 2048x1024 images with conditional GANs

8,239

pix2pixHD

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

6,488

cuda-samples

CUDA Templates for Linear Algebra Subroutines

6,119

cutlass

FasterTransformer

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

DALI

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

5,048

thrust

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

4,914

tacotron2

A Python framework for high performance GPU simulation and graphics

4,562

warp

Deep Learning GPU Training System

4,206

DIGITS

HTML

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

4,105

NeMo-Guardrails

Optimized primitives for collective multi-GPU communication

4,064

nccl

PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

3,187

flownet2-pytorch

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

2,938

ChatRTX

TypeScript

NVIDIA device plugin for Kubernetes

2,635

k8s-device-plugin

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

2,481

libcudacxx

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

2,294

GenerativeAIExamples

Build and run containers leveraging NVIDIA GPUs

2,192

nvidia-container-toolkit

A Flow-based Generative Network for Speech Synthesis

2,171

waveglow

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors

2,133

MinkowskiEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

2,007

TransformerEngine

TensorRT Extension for Stable Diffusion Web UI

1,917

Stable-Diffusion-WebUI-TensorRT

NVIDIA semantic segmentation monorepo

1,886

semantic-segmentation

NVIDIA GPU Operator creates, configures, and manages GPUs on Kubernetes

1,763

gpu-operator

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

1,735

cub

Deep learning for recommender systems

1,679

DeepRecommender

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

1,662

stdexec

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

1,554

OpenSeq2Seq

CUDALibrarySamples

A set of Python bindings to C++ libraries that provides full hardware acceleration for video decoding and encoding, as well as GPU-accelerated color space and pixel format conversions

VideoProcessingFramework

Tools for building GPU clusters

1,303

deepops

Shell

Documentation of NVIDIA chip/hardware interfaces

1,252

open-gpu-doc

AIStore: scalable storage for AI applications

1,243

aistore

NVIDIA’s implementation of RTX ray-tracing in Quake II

1,233

Q2RTX

Simple samples for TensorRT programming

1,217

trt-samples-for-hackathon-cn

CUDA Core Compute Libraries

1,211

cccl

An efficient C++17 GPU numerical computing library with Python-like syntax

1,200

MatX

A New Padding Scheme: Partial Convolution based Padding

1,187

partialconv

Unsupervised Language Modeling at scale for robust sentiment classification

1,145

sentiment-discovery

NVIDIA container runtime

1,055

nvidia-container-runtime

Makefile

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods

1,035

modulus

Tools for monitoring NVIDIA GPUs on Linux

991

gpu-monitoring-tools

A Python library that enables the use of Jetson's GPIOs

974

jetson-gpio

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

898

dcgm-exporter

Fast and accurate object detection with end-to-end GPU optimization

886

retinanet-examples

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer

885

flowtron

nccl-tests

CUDA Python Low-level Bindings

cuda-python

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data

859

mellotron

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

852

gdrcopy

NVIDIA container runtime library

832

libnvidia-container

Official PyTorch implementation of BigVGAN (ICLR 2023)

818

BigVGAN

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

806

spark-rapids

Scala

Reference implementation of real-time autoregressive wavenet inference

800

nv-wavenet

NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games

728

DLSS

An Open Source Machine Learning Framework for Everyone

727

tensorflow

Sparse volume compute and rendering on NVIDIA GPUs

719

gvdb-voxels

NVIDIA AR SDK - API headers and sample applications

674

MAXINE-AR-SDK

A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training

671

nvvl

Deep Learning Experiment Management

665

runx

NVIDIA Federated Learning Application Runtime Environment

633

NVFlare

Scalable toolkit for efficient model alignment

630

NeMo-Aligner

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

564

nvcomp

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

545

multi-gpu-programming-models

NVIDIA Deep learning Dataset Synthesizer (NDDS)

535

Dataset_Synthesizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

530

TensorRT-Model-Optimizer

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).

513

jitify

CUDA Kernel Benchmarking Library

512

nvbench

The GL Vendor-Neutral Dispatch library

501

libglvnd

Scalable data preprocessing and curation toolkit for LLMs

501

NeMo-Curator

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

500

cuda-quantum

Distributed multigrid linear solver library on GPU

496

AMGX

cuCollections

A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes.

enroot

Shell

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

459

NeMo-Framework-Launcher

hpc-container-maker

NVIDIA Material Definition Language SDK

MDL-SDK

A GPU performance profiling tool for PyTorch models

438

PyProf

Providing reproducibility in deep learning frameworks

437

framework-reproducibility

A REST API for Caffe using Docker and Go

424

gpu-rest-engine

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

421

DCGM

NVIDIA-accelerated zero-latency video compression library for interactive remoting applications

394

NvPipe

Differentiable signal processing on the sphere for PyTorch

390

torch-harmonics

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples

386

cuQuantum

NVIDIA Data Science stack tools

344

data-science-stack

Shell

Client side integration example source code and libraries for AI-Assisted Annotation SDK

317

ai-assisted-annotation-client