• This repository has been archived on 25/Sep/2023
  • Stars
    star
    703
  • Rank 62,044 (Top 2 %)
  • Language
    Python
  • License
    Other
  • Created over 4 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

cuSignal - RAPIDS Signal Processing Library

ย cuSignal

cuSignal is a GPU-accelerated signal processing library in Python that is both based on and extends the SciPy Signal API. Notably, cuSignal:

  • Delivers orders-of-magnitude speedups over CPU with a familiar API
  • Supports a zero-copy connection to popular Deep Learning frameworks like PyTorch, Tensorflow, and Jax
  • Runs on any CUDA-capable GPU of Maxwell architecture or newer, including the Jetson Nano
  • Optimizes streaming, real-time applications via zero-copy memory buffer between CPU and GPU
  • Is fully built within the GPU Python Ecosystem, where both core functionality and optimized kernels are dependent on the CuPy and Numba projects

If you're interested in the above concepts but prefer to program in C++ rather than Python, please consider MatX. MatX is an efficient C++17 GPU Numerical Computing library with a Pythonic Syntax.

Table of Contents

Quick Start

A polyphase resampler changes the sample rate of an incoming signal while using polyphase filter banks to preserve the overall shape of the original signal. The following example shows how cuSignal serves as a drop-in replacement for SciPy Signal's polyphase resampler and how cuSignal interacts with data generated on GPU with CuPy, a drop-in replacement for the numerical computing library NumPy.

Scipy Signal and NumPy (CPU)

import numpy as np
from scipy import signal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)

%%timeit
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on 2x Xeon E5-2600 in 2.36 sec.

cuSignal and CuPy (GPU)

import cupy as cp
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

gx = cp.linspace(start, stop, num_samps, endpoint=False)
gy = cp.cos(-gx**2/6.0)

%%timeit
gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 13.8 ms, a 170x increase over SciPy Signal. On an A100, this same code completes in 4.69 ms; 500x faster than CPU.

Next, we'll show that cuSignal can be used to access data that isn't explicitly generated on GPU. In this case, we use cusignal.get_shared_mem to allocate a buffer of memory that's been addressed by both the GPU and CPU. This process allows cuSignal to process data online.

cuSignal with Data Generated on the CPU with Mapped, Pinned (zero-copy) Memory

import cupy as cp
import numpy as np
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

# Generate Data on CPU with NumPy
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)

# Create shared memory between CPU and GPU and load with CPU signal (cy)
gpu_signal = cusignal.get_shared_mem(num_samps, dtype=np.float64)

%%time
# Move data to GPU/CPU shared buffer and run polyphase resampler
gpu_signal[:] = cy
gf = cusignal.resample_poly(gpu_signal, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 174 ms.

Finally, the example below shows that cuSignal can access data that's been generated elsewhere and moved to the GPU via cp.asarray. While this approach is fine for prototyping and algorithm development, it should be avoided for online signal processing.

cuSignal with Data Generated on the CPU and Copied to GPU

import cupy as cp
import numpy as np
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)

%%time
gf = cusignal.resample_poly(cp.asarray(cy), resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 637 ms.

Installation

cuSignal has been tested on and supports all modern GPUs - from Maxwell to Ampere. While Anaconda is the preferred installation mechanism for cuSignal, developers and Jetson users should follow the source build instructions below; there isn't presently a conda aarch64 package for cuSignal.

Conda, Linux OS (Preferred)

cuSignal can be installed with (Miniconda or the full Anaconda distribution) from the rapidsai channel. If you're using a Jetson GPU, please follow the build instructions below

conda install -c rapidsai -c conda-forge -c nvidia \
    cusignal

# To specify a certain CUDA or Python version (e.g. 11.8 and 3.9, respectively)
conda install -c rapidsai -c conda-forge -c nvidia \
    cusignal python=3.9 cudatoolkit=11.8

For the nightly verison of cusignal, which includes pre-release features:

conda install -c rapidsai-nightly -c conda-forge -c nvidia \
    cusignal

# To specify a certain CUDA or Python version (e.g. 11.8 and 3.9, respectively)
conda install -c rapidsai-nightly -c conda-forge -c nvidia \
    cusignal python=3.9 cudatoolkit=11.8

While only CUDA versions >= 11.2 are officially supported, cuSignal has been confirmed to work with CUDA version 10.2 and above. If you run into any issues with the conda install, please follow the source installation instructions, below.

For more OS and version information, please visit the RAPIDS version picker.

Source, aarch64 (Jetson Nano, TK1, TX2, Xavier, AGX Clara DevKit), Linux OS

Since the Jetson platform is based on the arm chipset, we need to use an aarch64 supported Anaconda environment. While there are multiple options here, we recommend miniforge. Further, it's assumed that your Jetson device is running a current (>= 4.3) edition of JetPack and contains the CUDA Toolkit.

Please note, prior to installing cuSignal, ensure that your PATH environment variables are set to find the CUDA Toolkit. This can be done with:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  1. Clone the cuSignal repository

    # Set the location to cuSignal in an environment variable CUSIGNAL_HOME
    export CUSIGNAL_HOME=$(pwd)/cusignal
    
    # Download the cuSignal repo
    git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME
  2. Install miniforge and create the cuSignal conda environment:

    cd $CUSIGNAL_HOME
    conda env create -f conda/environments/cusignal_jetson_base.yml

    Note: Compilation and installation of CuPy can be quite lengthy (~30+ mins), particularly on the Jetson Nano. Please consider setting the CUPY_NVCC_GENERATE_CODE environment variable to decrease the CuPy dependency install time:

    export CUPY_NVCC_GENERATE_CODE="arch=compute_XX,code=sm_XX"

    where XX is your GPU's compute capability. If you'd like to compile to multiple architectures (e.g Nano and Xavier), concatenate the arch=... string with semicolins.

  3. Activate created conda environment

    conda activate cusignal-dev

  4. Install cuSignal module

    cd $CUSIGNAL_HOME
    ./build.sh  # install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
                # run ./build.sh -h to print the supported command line options.
  5. Once installed, periodically update environment

    cd $CUSIGNAL_HOME
    conda env update -f conda/environments/cusignal_jetson_base.yml
  6. Optional: Confirm unit testing via PyTest

    cd $CUSIGNAL_HOME/python
    pytest -v  # for verbose mode
    pytest -v -k <function name>  # for more select testing

Source, Linux OS

  1. Clone the cuSignal repository

    # Set the location to cuSignal in an environment variable CUSIGNAL_HOME
    export CUSIGNAL_HOME=$(pwd)/cusignal
    
    # Download the cuSignal repo
    git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME
  2. Download and install Anaconda or Miniconda then create the cuSignal conda environment:

    Base environment (core dependencies for cuSignal)

    cd $CUSIGNAL_HOME
    conda env create -f conda/environments/cusignal_base.yml

    Full environment (including RAPIDS's cuDF, cuML, cuGraph, and PyTorch)

    cd $CUSIGNAL_HOME
    conda env create -f conda/environments/cusignal_full.yml
  3. Activate created conda environment

    conda activate cusignal-dev

  4. Install cuSignal module

    cd $CUSIGNAL_HOME
    ./build.sh  # install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
                # run ./build.sh -h to print the supported command line options.
  5. Once installed, periodically update environment

    cd $CUSIGNAL_HOME
    conda env update -f conda/environments/cusignal_base.yml
  6. Optional: Confirm unit testing via PyTest

    cd $CUSIGNAL_HOME/python
    pytest -v  # for verbose mode
    pytest -v -k <function name>  # for more select testing

Source, Windows OS

We have confirmed that cuSignal successfully builds and runs on Windows by using CUDA on WSL. Please follow the instructions in the link to install WSL 2 and the associated CUDA drivers. You can then proceed to follow the cuSignal source build instructions, below.

  1. Download and install Anaconda for Windows. In an Anaconda Prompt, navigate to your checkout of cuSignal.

  2. Create cuSignal conda environment

    conda create --name cusignal-dev

  3. Activate conda environment

    conda activate cusignal-dev

  4. Install cuSignal Core Dependencies

    conda install numpy numba scipy cudatoolkit pip
    pip install cupy-cudaXXX
    

    Where XXX is the version of the CUDA toolkit you have installed. 11.5, for example is cupy-cuda115. See the CuPy Documentation for information on getting wheels for other versions of CUDA.

  5. Install cuSignal module

    ./build.sh
    
  6. Optional: Confirm unit testing via PyTest In the cuSignal top level directory:

    pip install pytest pytest-benchmark
    pytest
    

Docker - All RAPIDS Libraries, including cuSignal

cuSignal is part of the general RAPIDS docker container but can also be built using the included Dockerfile and the below instructions to build and run the container. Please note, <image> and <tag> are user specified, for example docker build -t cusignal:cusignal-22.12 docker/..

docker build -t <image>:<tag> docker/.
docker run --gpus all --rm -it <image>:<tag> /bin/bash

Please see the RAPIDS Release Selector for more information on supported Python, Linux, and CUDA versions and for the specific command to pull the generic RAPIDS container.

Documentation

The complete cuSignal API documentation including a complete list of functionality and examples can be found for both the Stable and Nightly (Experimental) releases. cuSignal has about 75% coverage of the SciPy Signal API and includes added functionality, particularly for phased array systems and speech analysis. Please search the documentation for your function of interest and file an issue if you see a gap.

cuSignal (Stable) | cuSignal (Nightly)

Notebooks and Examples

cuSignal strives for 100% coverage between features and notebook examples. While we stress GPU performance, our guiding phisolophy is based on user productivity, and it's always such a bummer when you can't quickly figure out how to use exciting new features.

Core API examples are shown in the api_guide of our Notebooks folder. We also provide some example online and offline streaming software-defined radio examples in the srd part of the Notebooks. See SDR Integration for more information, too.

In addition to learning about how the API works, these notebooks provide rough benchmarking metrics for user-defined parameters like window length, signal size, and datatype.

SDR Integration

SoapySDR is a "vendor neutral and platform independent" library for software-defined radio usage. When used in conjunction with device (SDR) specific modules, SoapySDR allows for easy command-and-control of radios from Python or C++. To install SoapySDR into an existing cuSignal Conda environment, run:

conda install -c conda-forge soapysdr

A full list of subsequent modules, specific to your SDR are listed here, but some common ones:

  • rtlsdr: conda install -c conda-forge soapysdr-module-rtlsdr
  • Pluto SDR: conda install -c conda-forge soapysdr-module-plutosdr
  • UHD: conda install -c conda-forge soapysdr-module-uhd

Another popular SDR library, specific to the rtl-sdr, is pyrtlsdr.

For examples using SoapySDR, pyrtlsdr, and cuSignal, please see the notebooks/sdr directory.

Please note, for most rtlsdr devices, you'll need to blacklist the libdvb driver in Linux. To do this, run sudo vi /etc/modprobe.d/blacklist.conf and add blacklist dvb_usb_rtl28xxu to the end of the file. Restart your computer upon completion.

If you have a SDR that isn't listed above (like the LimeSDR), don't worry! You can symbolically link the system-wide Python bindings installed via apt-get to the local conda environment. Further, check conda-forge for any packages before installing something from source. Please file an issue if you run into any problems.

Benchmarking

cuSignal uses pytest-benchmark to compare performance between CPU and GPU signal processing implementations. To run cuSignal's benchmark suite, navigate to the topmost python directory ($CUSIGNAL_HOME/python) and run:

pytest --benchmark-enable --benchmark-gpu-disable

Benchmarks are disabled by default in setup.cfg providing only test correctness checks.

As with the standard pytest tool, the user can use the -v and -k flags for verbose mode and to select a specific benchmark to run. When intrepreting the output, we recommend comparing the mean execution time reported.

To reduce columns in benchmark result's table, add --benchmark-columns=LABELS, like --benchmark-columns=min,max,mean. For more information on pytest-benchmark please visit the Usage Guide.

Parameter --benchmark-gpu-disable is to disable memory checks from Rapids GPU benchmark tool. Doing so speeds up benchmarking.

If you wish to skip benchmarks of SciPy functions add -m "not cpu"

Lastly, benchmarks will be executed on local files. Therefore to test recent changes made to source, rebuild cuSignal.

Example

pytest -k upfirdn2d -m "not cpu" --benchmark-enable --benchmark-gpu-disable --benchmark-columns=mean

Output

cusignal/test/test_filtering.py ..................                                                                                                                                                                                                                                   [100%]


---------- benchmark 'UpFirDn2d': 18 tests -----------
Name (time in us, mem in bytes)         Mean
------------------------------------------------------
test_upfirdn2d_gpu[-1-1-3-256]      195.2299 (1.0)
test_upfirdn2d_gpu[-1-9-3-256]      196.1766 (1.00)
test_upfirdn2d_gpu[-1-1-7-256]      196.2881 (1.01)
test_upfirdn2d_gpu[0-2-3-256]       196.9984 (1.01)
test_upfirdn2d_gpu[0-9-3-256]       197.5675 (1.01)
test_upfirdn2d_gpu[0-1-7-256]       197.9015 (1.01)
test_upfirdn2d_gpu[-1-9-7-256]      198.0923 (1.01)
test_upfirdn2d_gpu[-1-2-7-256]      198.3325 (1.02)
test_upfirdn2d_gpu[0-2-7-256]       198.4676 (1.02)
test_upfirdn2d_gpu[0-9-7-256]       198.6437 (1.02)
test_upfirdn2d_gpu[0-1-3-256]       198.7477 (1.02)
test_upfirdn2d_gpu[-1-2-3-256]      200.1589 (1.03)
test_upfirdn2d_gpu[-1-2-2-256]      213.0316 (1.09)
test_upfirdn2d_gpu[0-1-2-256]       213.0944 (1.09)
test_upfirdn2d_gpu[-1-9-2-256]      214.6168 (1.10)
test_upfirdn2d_gpu[0-2-2-256]       214.6975 (1.10)
test_upfirdn2d_gpu[-1-1-2-256]      216.4033 (1.11)
test_upfirdn2d_gpu[0-9-2-256]       217.1675 (1.11)
------------------------------------------------------

Contributing Guide

Review the CONTRIBUTING.md file for information on how to contribute code and issues to the project. The TL;DR, as applicable to cuSignal, is to fork our repository to your own project space, implement a feature, and submit a PR against cuSignal's main branch from your fork.

If you notice something broken with cuSignal or have a feature request -- whether for a new function to be added or for additional performance, please file an issue. We love to hear feedback, whether positive or negative.

cuSignal Blogs and Talks

More Repositories

1

cudf

cuDF - GPU DataFrame Library
C++
7,248
star
2

cuml

cuML - RAPIDS Machine Learning Library
C++
3,864
star
3

cugraph

cuGraph - RAPIDS Graph Analytics Library
Cuda
1,559
star
4

raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Cuda
586
star
5

notebooks

RAPIDS Sample Notebooks
Shell
577
star
6

jupyterlab-nvdashboard

A JupyterLab extension for displaying dashboards of GPU usage.
TypeScript
548
star
7

cuspatial

CUDA-accelerated GIS and spatiotemporal algorithms
Jupyter Notebook
543
star
8

rmm

RAPIDS Memory Manager
C++
420
star
9

deeplearning

Jupyter Notebook
336
star
10

cucim

cuCIM - RAPIDS GPU-accelerated image processing library
Jupyter Notebook
302
star
11

dask-cuda

Utilities for Dask and CUDA interactions
Python
266
star
12

cuxfilter

GPU accelerated cross filtering with cuDF.
Python
261
star
13

node

GPU-accelerated data science and visualization in node
TypeScript
170
star
14

clx

A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
Jupyter Notebook
167
star
15

libgdf

[ARCHIVED] C GPU DataFrame Library
Cuda
138
star
16

dask-cudf

[ARCHIVED] Dask support for distributed GDF object --> Moved to cudf
Python
136
star
17

cloud-ml-examples

A collection of Machine Learning examples to get started with deploying RAPIDS in the Cloud
Jupyter Notebook
134
star
18

ucx-py

Python bindings for UCX
Python
112
star
19

gpu-bdb

RAPIDS GPU-BDB
Python
103
star
20

kvikio

KvikIO - High Performance File IO
Python
100
star
21

plotly-dash-rapids-census-demo

Jupyter Notebook
92
star
22

gputreeshap

C++
83
star
23

frigate

Frigate is a tool for automatically generating documentation for your Helm charts
Python
76
star
24

wholegraph

WholeGraph - large scale Graph Neural Networks
Cuda
75
star
25

spark-examples

[ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples
Jupyter Notebook
70
star
26

docker

Dockerfile templates for creating RAPIDS Docker Images
Shell
62
star
27

cuvs

cuVS - a library for vector search and clustering on the GPU
Jupyter Notebook
57
star
28

custrings

[ARCHIVED] GPU String Manipulation --> Moved to cudf
Cuda
46
star
29

docs

RAPIDS Documentation Site
HTML
34
star
30

cudf-alpha

[ARCHIVED] cuDF [alpha] - RAPIDS Merge of GoAi into cuDF
34
star
31

rapids-examples

Jupyter Notebook
31
star
32

nvgraph

C++
26
star
33

rapids-cmake

CMake
24
star
34

cuhornet

Cuda
24
star
35

cuDataShader

Jupyter Notebook
22
star
36

gpuci-build-environment

Common build environment used by gpuCI for building RAPIDS
Dockerfile
19
star
37

distributed-join

C++
19
star
38

dask-cuml

[ARCHIVED] Dask support for multi-GPU machine learning algorithms --> Moved to cuml
Python
16
star
39

integration

RAPIDS - combined conda package & integration tests for all of RAPIDS libraries
Shell
15
star
40

devcontainers

Shell
15
star
41

xgboost-conda

Conda recipes for xgboost
Jupyter Notebook
12
star
42

ucxx

C++
11
star
43

benchmark

Python
10
star
44

dependency-file-generator

Python
10
star
45

helm-chart

Shell
9
star
46

deployment

RAPIDS Deployment Documentation
Jupyter Notebook
9
star
47

miniforge-cuda

Dockerfile
9
star
48

asvdb

Python
8
star
49

ci-imgs

Dockerfile
7
star
50

dask-cugraph

Python
7
star
51

rapids.ai

rapids.ai web site
HTML
7
star
52

ptxcompiler

Python
6
star
53

GaaS

Python
5
star
54

rvc

Go
4
star
55

scikit-learn-nv

Python
4
star
56

ops-bot

A Probot application used by the Ops team for automation.
TypeScript
4
star
57

workflows

Shell
4
star
58

rapids-triton

C++
4
star
59

dask-build-environment

Build environments for various dask related projects on gpuCI
Dockerfile
3
star
60

roc

GitHub utilities for the RAPIDS Ops team
Go
3
star
61

multi-gpu-tools

Shell
3
star
62

detect-weak-linking

Python
3
star
63

dask-cuda-benchmarks

Python
2
star
64

rapids_triton_pca_example

C++
2
star
65

shared-workflows

Reusable GitHub Actions workflows for RAPIDS CI
Shell
2
star
66

dgl-cugraph-build-environment

Dockerfile
2
star
67

cugunrock

Cuda
2
star
68

projects

Jupyter Notebook
2
star
69

gpuci-mgmt

Mangement scripts for gpuCI
Shell
1
star
70

ansible-roles

1
star
71

code-share

C++
1
star
72

build-metrics-reporter

Python
1
star
73

cibuildwheel-imgs

Dockerfile
1
star
74

gpuci-tools

User tools for use within the gpuCI environment
Shell
1
star
75

pynvjitlink

Python
1
star
76

rapids-dask-dependency

Shell
1
star
77

sphinx-theme

This repository contains a Sphinx theme used for RAPIDS documentation
CSS
1
star