nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs
open-gpu-kernel-modules: NVIDIA Linux open GPU kernel module source
DeepLearningExamples: State-of-the-art deep learning scripts organized by model - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure
FastPhotoStyle: Style transfer, deep learning, feature transform
NeMo: A toolkit for conversational AI
vid2vid: PyTorch implementation of high-resolution (e.g. 2048x1024) photorealistic video-to-video translation
TensorRT: NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications
apex: A PyTorch extension with tools for easy mixed-precision and distributed training in PyTorch
Megatron-LM: Ongoing research training transformer models at scale
pix2pixHD: Synthesizing and manipulating 2048x1024 images with conditional GANs
thrust: [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
tacotron2: Tacotron 2 - PyTorch implementation with faster-than-realtime inference
DALI: A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing, accelerating deep learning training and inference applications
DIGITS: Deep Learning GPU Training System
FasterTransformer: Transformer-related optimizations, including BERT and GPT
cuda-samples: Samples for CUDA developers demonstrating features of the CUDA Toolkit
cutlass: CUDA Templates for Linear Algebra Subroutines
TensorRT-LLM: An easy-to-use Python API to define large language models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs, plus components to create Python and C++ runtimes that execute those engines
flownet2-pytorch: PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
nccl: Optimized primitives for collective multi-GPU communication
NeMo-Guardrails: An open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems
libcudacxx: [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
waveglow: A flow-based generative network for speech synthesis
MinkowskiEngine: An auto-diff neural network library for high-dimensional sparse tensors
k8s-device-plugin: NVIDIA device plugin for Kubernetes
semantic-segmentation: NVIDIA semantic segmentation monorepo
DeepRecommender: Deep learning for recommender systems
cub: [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
OpenSeq2Seq: Toolkit for efficient experimentation with speech recognition, text-to-speech, and NLP
warp: A Python framework for high-performance GPU simulation and graphics
Q2RTX: NVIDIA's implementation of RTX ray tracing in Quake II
open-gpu-doc: Documentation of NVIDIA chip/hardware interfaces
partialconv: A New Padding Scheme: Partial Convolution based Padding
deepops: Tools for building GPU clusters
VideoProcessingFramework: A set of Python bindings to C++ libraries that provide full hardware acceleration for video decoding, encoding, and GPU-accelerated color space and pixel format conversions
TransformerEngine: A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference
sentiment-discovery: Unsupervised language modeling at scale for robust sentiment classification
nvidia-container-runtime: NVIDIA container runtime
gpu-monitoring-tools: Tools for monitoring NVIDIA GPUs on Linux
aistore: AIStore - scalable storage for AI applications
Stable-Diffusion-WebUI-TensorRT: TensorRT extension for Stable Diffusion Web UI
trt-samples-for-hackathon-cn: Simple samples for TensorRT programming
retinanet-examples: Fast and accurate object detection with end-to-end GPU optimization
flowtron: An auto-regressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer
gpu-operator: NVIDIA GPU Operator creates, configures, and manages GPUs atop Kubernetes
mellotron: A multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
CUDALibrarySamples: CUDA Library Samples
MatX: An efficient C++17 GPU numerical computing library with Python-like syntax
stdexec: `std::execution`, the proposed C++ framework for asynchronous and parallel programming
jetson-gpio: A Python library that enables the use of Jetson's GPIOs
nv-wavenet: Reference implementation of real-time autoregressive WaveNet inference
tensorflow: An Open Source Machine Learning Framework for Everyone
gdrcopy: A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
MAXINE-AR-SDK: NVIDIA AR SDK - API headers and sample applications
nvvl: A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training
cuda-python: CUDA Python low-level bindings
gvdb-voxels: Sparse volume compute and rendering on NVIDIA GPUs
libnvidia-container: NVIDIA container runtime library
runx: Deep learning experiment management
spark-rapids: Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Dataset_Synthesizer: NVIDIA Deep Learning Dataset Synthesizer (NDDS)
BigVGAN: Official PyTorch implementation of BigVGAN (ICLR 2023)
jitify: A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC)
dcgm-exporter: NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
enroot: A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes
DLSS: NVIDIA DLSS is a deep learning neural network that boosts frame rates and generates sharp images for your games
NVFlare: NVIDIA Federated Learning Application Runtime Environment
libglvnd: The GL Vendor-Neutral Dispatch library
nvcomp: Docs and examples for nvCOMP, a library for fast lossless compression/decompression on the GPU, available from https://developer.nvidia.com/nvcomp
MDL-SDK: NVIDIA Material Definition Language SDK
PyProf: A GPU performance profiling tool for PyTorch models
modulus: Open-source framework for building, training, and fine-tuning deep learning models using state-of-the-art physics-ML methods
gpu-rest-engine: A REST API for Caffe using Docker and Go
AMGX: Distributed multigrid linear solver library on GPUs
framework-reproducibility: Providing reproducibility in deep learning frameworks
hpc-container-maker: HPC Container Maker
multi-gpu-programming-models: Examples demonstrating available options to program multiple GPUs in a single node or a cluster
cccl: CUDA C++ Core Libraries
NvPipe: NVIDIA-accelerated zero-latency video compression library for interactive remoting applications
nvidia-container-toolkit: Build and run containers leveraging NVIDIA GPUs
data-science-stack: NVIDIA Data Science stack tools
nvbench: CUDA kernel benchmarking library
video-sdk-samples: Samples demonstrating how to use various APIs of the NVIDIA Video Codec SDK
ai-assisted-annotation-client: Client-side integration example source code and libraries for the AI-Assisted Annotation SDK
cnmem: A simple memory manager for CUDA designed to help deep learning frameworks manage memory
nvidia-settings: NVIDIA driver control panel
cuQuantum: Home for cuQuantum Python and NVIDIA cuQuantum SDK C++ samples
fsi-samples: A collection of open-source GPU-accelerated Python tools and examples for quantitative-analyst tasks, leveraging the RAPIDS AI project, Numba, cuDF, and Dask
cuda-quantum: C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
tensorrt-laboratory: Explore the capabilities of the TensorRT platform
NeMo-Megatron-Launcher: NeMo Megatron launcher and tools
DCGM: NVIDIA Data Center GPU Manager (DCGM), a project for gathering telemetry and measuring the health of NVIDIA GPUs
egl-wayland: The EGLStream-based Wayland external platform
radtts: Training, inference, and voice-conversion recipes for RADTTS and RADTTS++: flow-based TTS models with robust alignment learning, diverse synthesis, and fine-grained generative control over low-dimensional (F0 and energy) speech attributes
gpu-feature-discovery: GPU plugin to the node feature discovery for Kubernetes
VisRTX: NVIDIA RTX-based implementation of ANARI
Falcor: Real-time rendering research framework
transformer-ls: Official PyTorch implementation of Long-Short Transformer (NeurIPS 2021)