nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs
open-gpu-kernel-modules: NVIDIA Linux open GPU kernel module source
DeepLearningExamples: State-of-the-art deep learning scripts organized by model - easy to train and deploy, with reproducible accuracy and performance on enterprise-grade infrastructure
FastPhotoStyle: Style transfer, deep learning, feature transform
NeMo: A toolkit for conversational AI
vid2vid: PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation
TensorRT: NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications
apex: A PyTorch extension: tools for easy mixed precision and distributed training in PyTorch
Megatron-LM: Ongoing research training transformer models at scale
pix2pixHD: Synthesizing and manipulating 2048x1024 images with conditional GANs
thrust: [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
tacotron2: Tacotron 2 - PyTorch implementation with faster-than-realtime inference
DALI: A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing, to accelerate deep learning training and inference applications
DIGITS: Deep Learning GPU Training System
FasterTransformer: Transformer-related optimization, including BERT and GPT
cuda-samples: Samples for CUDA developers demonstrating features in the CUDA Toolkit
cutlass: CUDA Templates for Linear Algebra Subroutines
TensorRT-LLM: An easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs, plus components to create Python and C++ runtimes that execute those engines
flownet2-pytorch: PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
nccl: Optimized primitives for collective multi-GPU communication
NeMo-Guardrails: An open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems
libcudacxx: [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
waveglow: A flow-based generative network for speech synthesis
MinkowskiEngine: An auto-diff neural network library for high-dimensional sparse tensors
k8s-device-plugin: NVIDIA device plugin for Kubernetes
semantic-segmentation: NVIDIA semantic segmentation monorepo
DeepRecommender: Deep learning for recommender systems
cub: [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
OpenSeq2Seq: Toolkit for efficient experimentation with speech recognition, text-to-speech, and NLP
warp: A Python framework for high-performance GPU simulation and graphics
Q2RTX: NVIDIA's implementation of RTX ray tracing in Quake II
open-gpu-doc: Documentation of NVIDIA chip/hardware interfaces
partialconv: A new padding scheme: partial-convolution-based padding
deepops: Tools for building GPU clusters
VideoProcessingFramework: Python bindings to C++ libraries providing full hardware acceleration for video decoding and encoding, plus GPU-accelerated color space and pixel format conversions
TransformerEngine: A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance with lower memory utilization in both training and inference
sentiment-discovery: Unsupervised language modeling at scale for robust sentiment classification
nvidia-container-runtime: NVIDIA container runtime
gpu-monitoring-tools: Tools for monitoring NVIDIA GPUs on Linux
aistore: AIStore, scalable storage for AI applications
Stable-Diffusion-WebUI-TensorRT: TensorRT extension for Stable Diffusion Web UI
trt-samples-for-hackathon-cn: Simple samples for TensorRT programming
retinanet-examples: Fast and accurate object detection with end-to-end GPU optimization
flowtron: An auto-regressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer
gpu-operator: NVIDIA GPU Operator creates, configures, and manages GPUs atop Kubernetes
mellotron: A multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
CUDALibrarySamples: CUDA Library Samples
MatX: An efficient C++17 GPU numerical computing library with Python-like syntax
stdexec: `std::execution`, the proposed C++ framework for asynchronous and parallel programming
jetson-gpio: A Python library that enables the use of Jetson's GPIOs
nv-wavenet: Reference implementation of real-time autoregressive WaveNet inference
tensorflow: An open-source machine learning framework for everyone
gdrcopy: A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
MAXINE-AR-SDK: NVIDIA AR SDK - API headers and sample applications
nvvl: A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training
cuda-python: CUDA Python low-level bindings
gvdb-voxels: Sparse volume compute and rendering on NVIDIA GPUs
libnvidia-container: NVIDIA container runtime library
runx: Deep learning experiment management
spark-rapids: Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Dataset_Synthesizer: NVIDIA Deep Learning Dataset Synthesizer (NDDS)
nccl-tests: NCCL tests
BigVGAN: Official PyTorch implementation of BigVGAN (ICLR 2023)
jitify: A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC)
dcgm-exporter: NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
enroot: A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes
DLSS: NVIDIA DLSS, a deep learning neural network that boosts frame rates and generates sharp images for games
NVFlare: NVIDIA Federated Learning Application Runtime Environment
libglvnd: The GL Vendor-Neutral Dispatch library
nvcomp: Repository for nvCOMP docs and examples; nvCOMP is a library for fast lossless compression/decompression on the GPU, available from https://developer.nvidia.com/nvcomp
MDL-SDK: NVIDIA Material Definition Language SDK
PyProf: A GPU performance profiling tool for PyTorch models
modulus: Open-source deep learning framework for building, training, and fine-tuning models using state-of-the-art Physics-ML methods
gpu-rest-engine: A REST API for Caffe using Docker and Go
AMGX: Distributed multigrid linear solver library on GPU
framework-reproducibility: Providing reproducibility in deep learning frameworks
hpc-container-maker: HPC Container Maker
multi-gpu-programming-models: Examples demonstrating available options to program multiple GPUs in a single node or a cluster
cccl: CUDA C++ Core Libraries
NvPipe: NVIDIA-accelerated zero-latency video compression library for interactive remoting applications
cuCollections
nvidia-container-toolkit: Build and run containers leveraging NVIDIA GPUs
data-science-stack: NVIDIA Data Science stack tools
nvbench: CUDA kernel benchmarking library
video-sdk-samples: Samples demonstrating how to use various APIs of the NVIDIA Video Codec SDK
ai-assisted-annotation-client: Client-side integration example source code and libraries for the AI-Assisted Annotation SDK
cnmem: A simple memory manager for CUDA designed to help deep learning frameworks manage memory
nvidia-settings: NVIDIA driver control panel
cuQuantum: Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
fsi-samples: A collection of open-source GPU-accelerated Python tools and examples for quantitative analyst tasks, leveraging RAPIDS, Numba, cuDF, and Dask
cuda-quantum: C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
tensorrt-laboratory: Explore the capabilities of the TensorRT platform
NeMo-Megatron-Launcher: NeMo Megatron launcher and tools
DCGM: NVIDIA Data Center GPU Manager (DCGM), a project for gathering telemetry and measuring the health of NVIDIA GPUs
egl-wayland: The EGLStream-based Wayland external platform
radtts: Training, inference, and voice conversion recipes for RADTTS and RADTTS++: flow-based TTS models with robust alignment learning, diverse synthesis, and generative modeling with fine-grained control over low-dimensional (F0 and energy) speech attributes
gpu-feature-discovery: GPU plugin to Node Feature Discovery for Kubernetes
VisRTX: NVIDIA RTX-based implementation of ANARI
Falcor: Real-time rendering research framework
transformer-ls: Official PyTorch implementation of Long-Short Transformer (NeurIPS 2021)