nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
FastPhotoStyle
Style transfer, deep learning, feature transform
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Megatron-LM
Ongoing research training transformer models at scale
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
vid2vid
PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
pix2pixHD
Synthesizing and manipulating 2048x1024 images with conditional GANs
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
cutlass
CUDA Templates for Linear Algebra Subroutines
FasterTransformer
Transformer-related optimization, including BERT, GPT
DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
tacotron2
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
warp
A Python framework for high performance GPU simulation and graphics
DIGITS
Deep Learning GPU Training System
NeMo-Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
nccl
Optimized primitives for collective multi-GPU communication
flownet2-pytorch
PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
ChatRTX
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
k8s-device-plugin
NVIDIA device plugin for Kubernetes
libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
GenerativeAIExamples
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
nvidia-container-toolkit
Build and run containers leveraging NVIDIA GPUs
waveglow
A Flow-based Generative Network for Speech Synthesis
MinkowskiEngine
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Stable-Diffusion-WebUI-TensorRT
TensorRT Extension for Stable Diffusion Web UI
semantic-segmentation
Nvidia Semantic Segmentation monorepo
gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
DeepRecommender
Deep learning for recommender systems
stdexec
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
OpenSeq2Seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
CUDALibrarySamples
CUDA Library Samples
VideoProcessingFramework
Set of Python bindings to C++ libraries which provides full HW acceleration for video decoding, encoding and GPU-accelerated color space and pixel format conversions
deepops
Tools for building GPU clusters
open-gpu-doc
Documentation of NVIDIA chip/hardware interfaces
aistore
AIStore: scalable storage for AI applications
Q2RTX
NVIDIA’s implementation of RTX ray-tracing in Quake II
trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
cccl
CUDA Core Compute Libraries
MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
partialconv
A New Padding Scheme: Partial Convolution based Padding
sentiment-discovery
Unsupervised Language Modeling at scale for robust sentiment classification
nvidia-container-runtime
NVIDIA container runtime
modulus
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
gpu-monitoring-tools
Tools for monitoring NVIDIA GPUs on Linux
jetson-gpio
A Python library that enables the use of Jetson's GPIOs
dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
retinanet-examples
Fast and accurate object detection with end-to-end GPU optimization
flowtron
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
nccl-tests
NCCL Tests
cuda-python
CUDA Python Low-level Bindings
mellotron
Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
libnvidia-container
NVIDIA container runtime library
BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
spark-rapids
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
nv-wavenet
Reference implementation of real-time autoregressive wavenet inference
DLSS
NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games
tensorflow
An Open Source Machine Learning Framework for Everyone
gvdb-voxels
Sparse volume compute and rendering on NVIDIA GPUs
MAXINE-AR-SDK
NVIDIA AR SDK - API headers and sample applications
nvvl
A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training
runx
Deep Learning Experiment Management
NVFlare
NVIDIA Federated Learning Application Runtime Environment
NeMo-Aligner
Scalable toolkit for efficient model alignment
nvcomp
Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Dataset_Synthesizer
NVIDIA Deep learning Dataset Synthesizer (NDDS)
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
nvbench
CUDA Kernel Benchmarking Library
libglvnd
The GL Vendor-Neutral Dispatch library
NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
cuda-quantum
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
AMGX
Distributed multigrid linear solver library on GPU
cuCollections
enroot
A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes.
NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
hpc-container-maker
HPC Container Maker
MDL-SDK
NVIDIA Material Definition Language SDK
PyProf
A GPU performance profiling tool for PyTorch models
framework-reproducibility
Providing reproducibility in deep learning frameworks
gpu-rest-engine
A REST API for Caffe using Docker and Go
DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
NvPipe
NVIDIA-accelerated zero latency video compression library for interactive remoting applications
torch-harmonics
Differentiable signal processing on the sphere for PyTorch
cuQuantum
Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
data-science-stack
NVIDIA Data Science stack tools
ai-assisted-annotation-client
Client side integration example source code and libraries for AI-Assisted Annotation SDK
video-sdk-samples
Samples demonstrating how to use various APIs of NVIDIA Video Codec SDK
egl-wayland
The EGLStream-based Wayland external platform
nvidia-settings
NVIDIA driver control panel
NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
go-nvml
Go Bindings for the NVIDIA Management Library (NVML)