  • Language
    Shell
  • License
    Apache License 2.0
  • Created almost 5 years ago
  • Updated almost 2 years ago

Repository Details

NVIDIA Data Science stack tools

NVIDIA Data Science Stack

NVIDIA Data Science Stack is a tool that makes it easy to set up a machine and manage the software stacks for GPU-accelerated data science. This includes laptops, desktops, workstations, and cloud virtual machines.

Users can work with containers, or in a local environment.

Quick Start

For usage and command documentation, run ./data-science-stack help at any time.

Note: The script is designed to run as the user and asks for the sudo password when needed. Do not run it with sudo ...

On Ubuntu 18.04, 20.04, or Red Hat Enterprise Linux (RHEL) 8.x:

git clone https://github.com/NVIDIA/data-science-stack
cd data-science-stack
./data-science-stack setup-system

On RHEL Workstation 7.x:

git clone https://github.com/NVIDIA/data-science-stack
cd data-science-stack
./data-science-stack setup-system
# script will stop, manually install driver ... (instructions below)
./data-science-stack setup-system

On Windows Subsystem for Linux (WSL):

Note: This functionality is alpha only (and containers only) until WSL v2 becomes production ready.

Follow the install instructions to install WSL v2 with CUDA support. Then create an Ubuntu or RHEL VM, open a terminal, and follow the OS-specific instructions above.

Next, users have a choice to use containers or a local Conda environment:

Option 1 - In a Container (Recommended for container users)

./data-science-stack list
./data-science-stack build-container
./data-science-stack run-container

This creates and runs Jupyter in the container. Users can then connect to the Jupyter notebook running at http://localhost:8888/. Press Control-C to exit.

To mount data or code into your container, see How do I mount data into containers? below.

The reverse of build-container is purge-container.

For information about Docker refer to https://docs.docker.com/

Option 2 - In a Local Conda Environment (Recommended for initial development work)

./data-science-stack list
./data-science-stack build-conda-env
./data-science-stack run-jupyter

This creates the local environment and runs Jupyter. Users can then connect to the Jupyter notebook at the address and token output by Jupyter. Press Control-C to exit.

The reverse of build-conda-env is purge-conda-env.

For information about Conda environments refer to https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

Multiple Users

To set up multiple users on the machine, each additional user needs access to Docker and a Conda setup in their own account:

# As the additional user
./data-science-stack setup-user
# ... use container or conda commands above

Upgrading

The script is designed to detect old versions of dependencies and upgrade them, and create new environments/containers.

To upgrade automatically:

./data-science-stack upgrade

If a newer version of data science stack is available, the script will retrieve it and perform the upgrade.

To upgrade manually, get the new version of the script and environment configs with git pull or with a new release .zip, and run the install steps again - most likely setup-system and one of the build-... commands.

New environments and containers will be tagged with the version of the script, so the old ones will not be modified.

Environments and containers are large. To clean up old ones, use:

  • Containers - docker images and docker rmi ...
  • Local Conda environments - conda env list and conda env remove ...

Testing

Once Jupyter is up and running (with run-container or run-jupyter), navigate in the left panel to any of the sample notebooks and run them. The sample notebooks come from the RAPIDS notebooks repo: https://github.com/rapidsai/notebooks

From the command line in your environment, or inside the container, the run-notebook <notebook-file> command can also be used. Expect warnings since the notebooks can depend on functions only available when using Jupyter's web UI.

Local Tools

Version 2.7.0 introduced the install-tools command (paired with purge-tools), which extends the functionality of the stack. Currently, the list includes:

  • jupyter-repo2docker: Point it to a GitHub repository and it will create a Docker container and launch a Jupyter notebook inside it.
  • NVIDIA GPU Cloud CLI: This is perhaps the easiest way to interact with NVIDIA assets.
  • Kaggle CLI: Allows users to sync and manage Kaggle kernels, datasets, etc. locally.
  • AWS CLI: Allows users to remotely manage resources in AWS. The stack supports it via Docker, so make sure you have Docker installed.

Creating Custom Stacks

Creating custom environments is covered in the Custom Data Science Stack Environments README.

Minimum Hardware and Software

  • NVIDIA GPU - Pascal, Volta, or Turing family GPU(s) including:
    • Quadro P, GV, and RTX series
    • Tesla P, V and T series
    • GeForce 10xx and 20xx
  • Operating System:
    • Ubuntu 18.04 or 20.04
    • Red Hat Enterprise Linux Workstation 7.5+ or 8.0+ (requires license)
    • Other Linux distributions are NOT supported, but may work as long as the driver and Docker work.
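As a rough illustration of the operating system requirements above, the following sketch classifies a machine from its os-release file. The function name check_supported_os is hypothetical and is not part of data-science-stack. It assumes the standard ID and VERSION_ID fields of /etc/os-release and that RHEL reports ID=rhel.

```shell
# Hypothetical helper (not part of data-science-stack): reports whether the
# distribution described by an os-release file is on the supported list above.
check_supported_os() (
    # The body runs in a subshell so the sourced variables do not leak.
    . "${1:-/etc/os-release}"
    case "$ID:$VERSION_ID" in
        ubuntu:18.04|ubuntu:20.04)      echo supported ;;
        rhel:7.[5-9]|rhel:8|rhel:8.*)   echo supported ;;
        *)                              echo unsupported ;;
    esac
)

# Usage (reads /etc/os-release when no file is given):
#   check_supported_os
```

A result of "unsupported" only means the distribution is outside the tested list. As noted above, other distributions may still work if the driver and Docker work.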

Operating System Setup

Disable "Secure Boot" in the system BIOS/UEFI before installing Linux.

Ubuntu

The Data Science stacks are supported on Ubuntu LTS 18.04.1+ or 20.04 with the 4.15+ kernel. Ubuntu can be downloaded from https://www.ubuntu.com/download/desktop

Red Hat Enterprise Linux Workstation (RHEL)

The Data Science stacks are supported on Red Hat Enterprise Linux Workstation (RHEL) version 7.5+ or 8.x. The RHEL ISO image can be downloaded by following the instructions at: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/installation_guide/chap-download-red-hat-enterprise-linux

Red Hat Subscriptions

A Red Hat subscription will be needed to install and use Red Hat Enterprise Linux. A subscription also lets the system obtain update packages and additional packages for Red Hat Enterprise Linux. Either purchase a subscription or obtain a free evaluation subscription from the Red Hat Software & Download Center - https://access.redhat.com/downloads

Register the system with the Red Hat Customer Portal to complete the initial setup. See the How to Register and Subscribe a system to the Red Hat Customer Portal using Red Hat Subscription-Manager for further information - https://access.redhat.com/solutions/253273

Windows Subsystem for Linux (WSL v2)

Note: This functionality is alpha only (and containers only) until WSL v2 becomes production ready

Follow the install instructions for WSL v2 with CUDA support. Then create an Ubuntu or RHEL VM, open a terminal, and follow the OS-specific instructions above.

Note: WSL v2 currently requires CUDA 11.0, while data science stack 2.9.0 is based on CUDA 11.2. Therefore, WSL v2 is supported via containers only.

Installing the NVIDIA GPU Driver

It is important that updated NVIDIA drivers are installed on the system. The minimum version of the NVIDIA driver supported is 460.39. More recent drivers may be available, but may not have been tested with the data science stacks.
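One way to check an installed driver against the minimum version is sketched below. The helper name version_ge is hypothetical and not part of data-science-stack, and the comparison assumes GNU sort with the -V (version sort) option is available.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_DRIVER="460.39"   # minimum driver version stated above

# Usage on a machine where the driver (and therefore nvidia-smi) is installed:
#   installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   if version_ge "$installed" "$MIN_DRIVER"; then echo "driver OK"; else echo "driver too old"; fi
```

Version sort is used instead of a plain numeric comparison because driver versions have multiple dot-separated fields.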

Ubuntu or RHEL v8.x Driver Install

Driver install for Ubuntu is handled by data-science-stack setup-system so no manual install should be required.

If the driver is too old or the script is having problems, the driver can be removed (this may have side effects, read the warnings) and reinstalled:

./data-science-stack purge-driver
# reboot
./data-science-stack setup-system
# reboot

RHEL v7.x Driver Install

Before attempting to install the driver, check that the system does not have /usr/bin/nvidia-uninstall, which is left behind by an old driver .run file. If it exists, run it with sudo /usr/bin/nvidia-uninstall to remove the old driver first.

Install the base dependencies:

./data-science-stack setup-system
# this will stop once prerequisites are installed

Upgrade the kernel and reboot:

sudo yum upgrade -y kernel
sudo reboot

Note: On a fresh install, the yum lock may be held by the PackageKit process (/usr/share/PackageKit/helpers/yum/yumBackend.py). To free the lock, kill the PackageKit process:

ps aux | grep yum
kill <PackageKit_ProcessID>

Now you should be able to run yum upgrade kernel. Then install the packages needed to build the driver:

sudo yum install -y kernel-devel kernel-headers gcc dkms acpid libglvnd

Next, disable nouveau and reboot:

cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF

sudo cp /etc/sysconfig/grub /etc/sysconfig/grub.bak
sudo vim /etc/sysconfig/grub

While editing the grub file, change the line containing GRUB_CMDLINE_LINUX="crashkernel=auto ... quiet" to GRUB_CMDLINE_LINUX="crashkernel=auto ... quiet rd.driver.blacklist=nouveau". Save and close vim (with ":wq").
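As a non-interactive alternative to editing the file in vim, the same change can be sketched with sed. The function name append_kernel_arg is hypothetical and not part of data-science-stack. The sketch assumes GNU sed (for -i), a single double-quoted GRUB_CMDLINE_LINUX line, and an argument that contains no "/" characters.

```shell
# Hypothetical helper: appends a kernel argument to the GRUB_CMDLINE_LINUX
# line of a grub defaults file, unless the line already contains it.
#   $1 = path to the grub defaults file
#   $2 = kernel argument to append (must not contain "/")
append_kernel_arg() (
    file="$1"
    arg="$2"
    # Fixed-string substring check on the GRUB_CMDLINE_LINUX line only.
    if ! grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$arg"; then
        # Insert the argument just before the closing double quote.
        sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 $arg\"/" "$file"
    fi
)

# Usage (try it on a copy first, then apply with root privileges):
#   cp /etc/sysconfig/grub /tmp/grub.test
#   append_kernel_arg /tmp/grub.test '<kernel argument from the step above>'
#   diff /etc/sysconfig/grub /tmp/grub.test
```

Running the helper twice leaves the file unchanged the second time, so it is safe to re-run.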

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r)
sudo reboot

Once nouveau has been disabled, change to runlevel 3:

sudo telinit 3

Note: If the screen is stuck on a blinking cursor after the runlevel change, press Ctrl + Alt + F3.

Check that nouveau is not loaded:

lsmod | grep nouveau
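The command above should print nothing when nouveau is disabled. For use in a script, a check with an explicit exit status can be sketched as follows. The function name module_loaded is hypothetical. It reads /proc/modules, the same kernel module list that lsmod displays, and matches the module name exactly rather than as a substring.

```shell
# Hypothetical helper: succeeds when the named kernel module is loaded.
#   $1 = module name
#   $2 = module list file (defaults to /proc/modules)
module_loaded() (
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
)

# Usage:
#   if module_loaded nouveau; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is not loaded"
#   fi
```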

Download and install the driver:

# Check for the latest before using - https://www.nvidia.com/Download/index.aspx
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/460.56/NVIDIA-Linux-x86_64-460.56.run
sudo sh ./NVIDIA-Linux-x86_64-460.56.run

Note: In some cases the following prompts will occur:

  • If prompted to add to DKMS select YES.
  • If prompted that "The distribution-provided pre-install script failed! Are you sure you want to continue", select Continue.
  • If prompted to install the 32-bit compatibility libraries, select YES.
  • If prompted to update or overwrite existing libglvnd installation, select DO NOT Overwrite.

One of the last installation steps will offer to update the X configuration file. Either accept that offer (suggested), edit the X configuration file manually so that the NVIDIA X driver will be used, or run nvidia-xconfig.

Once the NVIDIA driver install has completed, reboot.

sudo reboot

Windows Subsystem for Linux (WSL v2) Driver Install

There is no need to install the driver inside WSL VMs as they use the driver installed in Windows. Data Science Stack scripts will detect WSL and not install the driver again.

Installing NVIDIA Container SELinux Policy

Note: This section is only for systems that will use SELinux AND Containers

NVIDIA publishes an SELinux policy that enables using GPUs within containers on NVIDIA DGX Servers on GitHub at: https://github.com/NVIDIA/dgx-selinux

This policy has been validated on NVIDIA DGX servers running RHEL 7.5 and 7.6. It is expected that users/admins will use the DGX SELinux policy as a reference and will modify it as needed to fit their servers.

Actions performed by the script below:

  • Install the dependencies required to build the DGX SELinux policy
  • Clone the DGX SELinux policy git project
  • << CUSTOMIZE THE POLICY >>
  • Build the SELinux policy
  • Install the SELinux policy

Note: To accommodate SELinux, nvidia-container-selinux is required to allow containers to use NVIDIA GPUs. The --security-opt option in the command sets the label type that is created by the package so that the specified container uses the NVIDIA GPUs. If SELinux is removed or disabled, then the --security-opt option is not needed.

sudo yum install -y git selinux-policy selinux-policy-devel \
  selinux-policy-base libselinux-utils policycoreutils policycoreutils-python
git clone https://github.com/NVIDIA/dgx-selinux.git
cd dgx-selinux/src/nvidia-container-selinux

<<< CUSTOMIZE YOUR SELINUX POLICY >>>

make -f /usr/share/selinux/devel/Makefile
sudo semodule -i nvidia-container.pp
sudo reboot

Note: You may encounter error messages while building the SELinux policy such as “/usr/share/selinux/devel/include/contrib/container.if:33: Error: duplicate definition of container_runtime_exec(). Original definition on 60.” These may be safely ignored if the nvidia-container.pp file was generated and installed successfully. For reference, see https://bugzilla.redhat.com/show_bug.cgi?id=1567980

Laptop Power and Integrated GPU Configuration

On laptop systems, GPU selection and power settings may need additional configuration.

Note: On some systems, the external display connectors are driven by the NVIDIA GPU, so restricting graphics to the Intel IGP will prevent the use of external displays. If the use of external displays is desired on such a system, the NVIDIA GPU will need to be shared between graphics and compute tasks.

For best performance, it is recommended that X be configured to use the Intel integrated graphics processor (IGP) to drive the display. This allows the full resources of the NVIDIA GPU to be dedicated to running compute workloads. For optimal power savings, it is recommended that the GPU be powered off when not in use.

Intel IGP on Ubuntu or RHEL 8.x

When the NVIDIA driver is installed, Ubuntu and RHEL 8 will automatically configure the NVIDIA GPU to render the desktop environment, and offload the graphics rendered by the NVIDIA GPU for display on the Intel IGP using PRIME display offloading. For systems which drive external displays through the NVIDIA GPU, where use of external displays is desired, no further configuration is needed. For other systems, the X server will require additional configuration in order to dedicate the NVIDIA GPU for compute tasks only.

In order to configure the X server properly, determine the PCI bus ID of the Intel IGP. Run the command:

lspci -d 8086::0300

to list all Intel VGA devices, which should display a line like:

00:02.0 VGA compatible controller: Intel Corporation Device 3e9b (rev 02)

Make a note of the bus ID that appears at the beginning of the line ("00:02.0" in this example), and adapt this bus ID in order to use it in an X configuration file. lspci lists bus IDs using hexadecimal numbers in the form [<domain>:]<bus>:<device>.<function>. On systems where the only PCI domain is domain 0, the domain will typically be omitted. The X configuration file accepts bus IDs using decimal numbers in the form PCI:<bus>[@<domain>]:<device>:<function>. If the PCI domain is 0, the domain may be omitted.

As an example, the lspci bus ID of 00:02.0 listed above would be written as PCI:0:2:0 in an X configuration file. As an additional example showing the domain field populated, unique values for each field, and numbers that are different in decimal versus hexadecimal, the lspci bus ID 0010:0f:e.d would be written as PCI:15@16:14:13 in an X configuration file.
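The conversion described above can be sketched as a small shell function. The name lspci_to_xorg_busid is hypothetical and not part of data-science-stack. The sketch relies on shell arithmetic accepting hexadecimal constants of the form 0x..., which POSIX shells such as bash and dash do.

```shell
# Hypothetical helper: converts an lspci bus ID
# ([<domain>:]<bus>:<device>.<function>, hexadecimal) into the decimal
# PCI:<bus>[@<domain>]:<device>:<function> form used in an X configuration file.
lspci_to_xorg_busid() (
    id="$1"
    fn="${id##*.}"      # text after the final dot
    rest="${id%.*}"     # [domain:]bus:device
    dev="${rest##*:}"   # text after the last colon
    rest="${rest%:*}"   # [domain:]bus
    case "$rest" in
        *:*) domain="${rest%%:*}"; bus="${rest##*:}" ;;
        *)   domain=0;             bus="$rest" ;;
    esac
    # $((0x..)) converts each hexadecimal field to decimal.
    if [ "$((0x$domain))" -eq 0 ]; then
        echo "PCI:$((0x$bus)):$((0x$dev)):$((0x$fn))"
    else
        echo "PCI:$((0x$bus))@$((0x$domain)):$((0x$dev)):$((0x$fn))"
    fi
)

lspci_to_xorg_busid 00:02.0        # prints PCI:0:2:0
lspci_to_xorg_busid 0010:0f:e.d    # prints PCI:15@16:14:13
```

The two calls reproduce the examples given in the text. The domain is omitted from the output whenever it is 0.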

Once the correct PCI bus ID is determined, populate the file /etc/X11/xorg.conf with the following contents, creating it if necessary:

Section "Device"
    Identifier "Device0"
    BusID "<correctly formatted PCI bus ID for Intel IGP>"
    Driver "modesetting"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
EndSection

Replace the text <correctly formatted PCI bus ID for Intel IGP> with the bus ID string formed previously.

Note: While Ubuntu provides tools for simplifying switching between the default NVIDIA+Intel PRIME display offloading behavior and an Intel-only configuration, the Intel-only profile prevents the use of the NVIDIA driver for non-graphical purposes in addition to disabling its use for graphics, necessitating the manual X configuration.

Intel IGP on RHEL 7.x

On versions of RHEL before RHEL 8, the X server will be configured to use the Intel IGP only for graphics by default, and no further configuration is needed to ensure that the NVIDIA GPU’s resources remain dedicated for compute purposes. On systems with the external displays driven by the NVIDIA GPU, where use of external displays is desired, PRIME display offloading will need to be manually configured. Manual configuration of PRIME display offloading is beyond the scope of this documentation.

Laptop GPU Power Management

The NVIDIA GPU driver supports runtime Power Management (PM). By default, GPU runtime power management is disabled. To enable GPU runtime PM, please install an NVIDIA PM udev rules file. This udev file:

  • Removes function 2 (USB xHCI Host controller) and function 3 (USB Type-C UCSI controller) of the GPU, if present. Linux kernel versions before 5.3 do not have full-fledged support for these functions, which will prevent GPU runtime PM.
  • Sets 'auto' in the sysfs runtime PM entries for function 0 (VGA display controller) and function 1 (Audio controller).

Installing the NVIDIA PM udev rules on laptops

Create the file 80-nvidia-pm.rules with the following contents:

# udev rules for Enabling Runtime Power Management for NVIDIA GPU.
#
# The NVIDIA Turing GPU is a multi-function PCI device
# which has the following four functions:
#
#     Function 0 : VGA display controller
#     Function 1 : Audio controller
#     Function 2 : USB xHCI Host controller
#     Function 3 : USB Type-C UCSI controller
#
# The NVIDIA GPU driver only manages function 0.
# The remaining functions are managed by other drivers.
# The drivers for function 2 and function 3 in this kernel version
# lack full support for runtime PM, which prevents proper runtime
# PM functionality for function 0.
#
# This udev rules script will remove these functions during
# boot and will allow runtime PM to work for the GPU. It won't
# impact normal USB functionality, which is managed by the
# integrated USB xHCI Host controller.

# Remove NVIDIA USB xHCI Host Controller devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{remove}="1"

# Remove NVIDIA USB Type-C UCSI devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{remove}="1"

# Enable runtime PM for NVIDIA VGA controller devices
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"

# Enable runtime PM for NVIDIA Audio controller devices
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", TEST=="power/control", ATTR{power/control}="auto"

Copy the file to /lib/udev/rules.d/:

sudo cp 80-nvidia-pm.rules /lib/udev/rules.d/

Reboot the system

sudo reboot

To check that functions 2 and 3 have been removed, run the following commands, which should not produce any output:

lspci -d '10de::0c03'
lspci -d '10de::0c80'
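The second effect of the rules file, setting power/control to auto for the remaining NVIDIA functions, can also be inspected through sysfs. The following is a minimal sketch; the function name nvidia_runtime_pm_report is hypothetical. It assumes the standard sysfs layout, in which each PCI device directory under /sys/bus/pci/devices contains a vendor file and a power/control file.

```shell
# Hypothetical helper: prints "<pci-address> <power/control value>" for every
# PCI device whose vendor ID is NVIDIA (0x10de).
#   $1 = PCI devices directory (defaults to /sys/bus/pci/devices)
nvidia_runtime_pm_report() (
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        if [ -r "$dev/power/control" ]; then
            echo "$(basename "$dev") $(cat "$dev/power/control")"
        fi
    done
)

# Usage:
#   nvidia_runtime_pm_report
```

With the rules installed, each listed device is expected to show auto. A value of on means runtime PM is not enabled for that function.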

Uninstalling the NVIDIA PM udev rules on laptops

Remove the NVIDIA PM rules file

sudo rm /lib/udev/rules.d/80-nvidia-pm.rules

Reboot the system

sudo reboot

Troubleshooting and FAQ

The driver does not install correctly

Try using purge-driver followed by install-driver, then check with diagnostics. If the driver was previously installed with a .run file, the script will let you know how to remove the old driver.

How much disk space is needed?

About 50GB free should be enough. A lot of space is needed during environment/container creation since Conda has a package cache.
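A quick way to check free space before building is sketched below. The function name has_free_gb is hypothetical, and which directory to check is an assumption: pick the filesystem that holds your Conda environments or Docker storage.

```shell
# Hypothetical helper: succeeds when the filesystem holding directory $1
# has at least $2 GB available.
has_free_gb() (
    dir="$1"
    need_gb="$2"
    # df -P prints POSIX-format output; column 4 of the data row is the
    # available space, in 1024-byte blocks because of -k.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kb" -ge "$((need_gb * 1024 * 1024))" ]
)

# Usage:
#   if has_free_gb "$HOME" 50; then
#       echo "enough space"
#   else
#       echo "less than 50GB free"
#   fi
```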

The script is failing after it cannot reach URLs or download files

To set up the Data Science Stack, the script needs to update the OS and other installed packages, install software from NVIDIA, set up Docker and pull containers, download Conda packages, clone repos from GitHub, and perform other tasks. If the network is down during this process, or if OS or IT firewalls are blocking any of those hosts, errors will occur. Retrying the command will work in most cases once the problem or block is resolved.

How do I mount data into containers?

To mount code or data directories into your running container, add additional -v "/host/path/:/mount/location" parameters to the docker run ... command. The latest command to run the container is displayed by ./data-science-stack run-container when it runs.

For example, to mount the ~/notebooks and /data directories as the /notebooks and /data volumes, the Docker command would begin with:

docker run -v ~/notebooks:/notebooks -v /data:/data ...

For information about Docker mounts refer to https://docs.docker.com/storage/bind-mounts/

More Information
