NVIDIA Corporation (@NVIDIA)

Top repositories

1

nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
16,896
star
2

open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source
C
13,385
star
3

DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Jupyter Notebook
11,727
star
4

FastPhotoStyle

Style transfer, deep learning, feature transform
Python
11,020
star
5

NeMo

NeMo: a toolkit for conversational AI
Python
8,596
star
6

vid2vid

PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.
Python
8,349
star
7

TensorRT

NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
C++
8,140
star
8

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Python
7,702
star
9

Megatron-LM

Ongoing research training transformer models at scale
Python
6,894
star
10

pix2pixHD

Synthesizing and manipulating 2048x1024 images with conditional GANs
Python
6,321
star
11

thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
C++
4,785
star
12

tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Jupyter Notebook
4,562
star
13

DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
C++
4,531
star
14

DIGITS

Deep Learning GPU Training System
HTML
4,105
star
15

FasterTransformer

Transformer-related optimization, including BERT and GPT
C++
4,105
star
16

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit
C
3,819
star
17

cutlass

CUDA Templates for Linear Algebra Subroutines
C++
3,531
star
18

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
C++
3,442
star
19

flownet2-pytorch

PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
Python
2,938
star
20

nccl

Optimized primitives for collective multi-GPU communication
C++
2,493
star
21

NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
Python
2,470
star
22

libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
C++
2,289
star
23

waveglow

A Flow-based Generative Network for Speech Synthesis
Python
2,133
star
24

MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
Python
2,007
star
25

k8s-device-plugin

NVIDIA device plugin for Kubernetes
Go
1,936
star
26

semantic-segmentation

NVIDIA Semantic Segmentation monorepo
Python
1,725
star
27

DeepRecommender

Deep learning for recommender systems
Python
1,662
star
28

cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Cuda
1,605
star
29

OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Python
1,511
star
30

warp

A Python framework for high performance GPU simulation and graphics
HTML
1,398
star
31

Q2RTX

NVIDIA’s implementation of RTX ray-tracing in Quake II
C
1,171
star
32

open-gpu-doc

Documentation of NVIDIA chip/hardware interfaces
C
1,156
star
33

partialconv

A New Padding Scheme: Partial Convolution based Padding
Python
1,145
star
34

deepops

Tools for building GPU clusters
Shell
1,090
star
35

VideoProcessingFramework

A set of Python bindings to C++ libraries that provide full hardware acceleration for video decoding and encoding, plus GPU-accelerated color space and pixel format conversions
C++
1,089
star
36

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Python
1,059
star
37

sentiment-discovery

Unsupervised Language Modeling at scale for robust sentiment classification
Python
1,049
star
38

nvidia-container-runtime

NVIDIA container runtime
Makefile
1,035
star
39

gpu-monitoring-tools

Tools for monitoring NVIDIA GPUs on Linux
C
974
star
40

aistore

AIStore: scalable storage for AI applications
Go
950
star
41

Stable-Diffusion-WebUI-TensorRT

TensorRT Extension for Stable Diffusion Web UI
Python
941
star
42

trt-samples-for-hackathon-cn

Simple samples for TensorRT programming
Python
914
star
43

retinanet-examples

Fast and accurate object detection with end-to-end GPU optimization
Python
857
star
44

flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
Jupyter Notebook
846
star
45

gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Go
834
star
46

mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
Jupyter Notebook
813
star
47

CUDALibrarySamples

CUDA Library Samples
Cuda
804
star
48

MatX

An efficient C++17 GPU numerical computing library with Python-like syntax
C++
750
star
49

stdexec

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
C++
749
star
50

jetson-gpio

A Python library that enables the use of Jetson's GPIOs
Python
734
star
51

nv-wavenet

Reference implementation of real-time autoregressive wavenet inference
Cuda
728
star
52

tensorflow

An Open Source Machine Learning Framework for Everyone
C++
719
star
53

gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
C++
711
star
54

MAXINE-AR-SDK

NVIDIA AR SDK - API headers and sample applications
C
671
star
55

nvvl

A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training
C++
665
star
56

cuda-python

CUDA Python Low-level Bindings
Python
644
star
57

gvdb-voxels

Sparse volume compute and rendering on NVIDIA GPUs
C
636
star
58

libnvidia-container

NVIDIA container runtime library
C
634
star
59

runx

Deep Learning Experiment Management
Python
629
star
60

spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Scala
592
star
61

Dataset_Synthesizer

NVIDIA Deep learning Dataset Synthesizer (NDDS)
C++
530
star
62

nccl-tests

NCCL Tests
Cuda
530
star
63

BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)
Python
520
star
64

jitify

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
C++
485
star
65

dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Go
475
star
66

enroot

A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes.
Shell
459
star
67

DLSS

NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games
C
455
star
68

NVFlare

NVIDIA Federated Learning Application Runtime Environment
Python
454
star
69

libglvnd

The GL Vendor-Neutral Dispatch library
C
453
star
70

nvcomp

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
C++
447
star
71

MDL-SDK

NVIDIA Material Definition Language SDK
C++
438
star
72

PyProf

A GPU performance profiling tool for PyTorch models
Python
437
star
73

modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
Python
430
star
74

gpu-rest-engine

A REST API for Caffe using Docker and Go
C++
421
star
75

AMGX

Distributed multigrid linear solver library on GPU
Cuda
417
star
76

framework-reproducibility

Providing reproducibility in deep learning frameworks
Python
408
star
77

hpc-container-maker

HPC Container Maker
Python
404
star
78

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Cuda
400
star
79

cccl

CUDA C++ Core Libraries
C++
387
star
80

NvPipe

NVIDIA-accelerated zero latency video compression library for interactive remoting applications
Cuda
383
star
81

cuCollections

C++
349
star
82

nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Go
326
star
83

data-science-stack

NVIDIA Data Science stack tools
Shell
317
star
84

nvbench

CUDA Kernel Benchmarking Library
Cuda
301
star
85

video-sdk-samples

Samples demonstrating how to use various APIs of NVIDIA Video Codec SDK
C++
301
star
86

ai-assisted-annotation-client

Client side integration example source code and libraries for AI-Assisted Annotation SDK
C++
300
star
87

cnmem

A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
C++
276
star
88

nvidia-settings

NVIDIA driver control panel
C
275
star
89

cuQuantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
Jupyter Notebook
272
star
90

fsi-samples

A collection of open-source, GPU-accelerated Python tools and examples for quantitative analyst tasks, leveraging the RAPIDS AI project, Numba, cuDF, and Dask.
Jupyter Notebook
262
star
91

cuda-quantum

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
C++
261
star
92

tensorrt-laboratory

Explore the Capabilities of the TensorRT Platform
C++
259
star
93

NeMo-Megatron-Launcher

NeMo Megatron launcher and tools
Python
258
star
94

DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
C++
256
star
95

egl-wayland

The EGLStream-based Wayland external platform
C
234
star
96

radtts

Provides training, inference, and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.
Roff
230
star
97

gpu-feature-discovery

GPU plugin to the node feature discovery for Kubernetes
Go
228
star
98

VisRTX

NVIDIA RTX based implementation of ANARI
C++
220
star
99

Falcor

Real-time rendering research framework
219
star
100

transformer-ls

Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
Python
212
star