  • Stars: 1,275
  • Rank: 36,921 (Top 0.8%)
  • Language: Python
  • License: Apache License 2.0
  • Created: over 3 years ago
  • Updated: over 3 years ago

Repository Details

Several simple examples for popular neural network toolkits calling custom CUDA operators.

Neural Network CUDA Example

Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators.

We provide several ways to compile the CUDA kernels and their C++ wrappers: JIT, setuptools, and CMake.

We also provide Python scripts that call the CUDA kernels, covering kernel timing and model training.

For more accurate timing, run the code under nvprof or nsys.
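
As a rough illustration of timing a kernel from Python, the sketch below shows a warm-up phase followed by timed runs. It is not the repository's time.py: the helper name `time_fn` and its parameters are hypothetical. CUDA kernel launches are asynchronous, so when timing GPU work a blocking `sync` callback (for example `torch.cuda.synchronize`) should be passed; without it the loop would mostly measure launch overhead.

```python
import time
from statistics import mean


def time_fn(fn, sync=None, n_warmup=3, n_runs=10):
    """Return the per-call wall-clock times of fn in microseconds.

    fn   : zero-argument callable to measure (hypothetical example helper).
    sync : optional zero-argument callable that blocks until pending work
           has finished, e.g. torch.cuda.synchronize for CUDA kernels.
    """
    sync = sync or (lambda: None)

    # Warm-up runs absorb one-off costs such as JIT compilation.
    for _ in range(n_warmup):
        fn()
    sync()

    times_us = []
    for _ in range(n_runs):
        sync()                      # make sure nothing is still in flight
        start = time.perf_counter()
        fn()
        sync()                      # wait for the work launched by fn
        times_us.append((time.perf_counter() - start) * 1e6)
    return times_us


# CPU-only stand-in so the sketch runs anywhere.
times = time_fn(lambda: sum(range(10_000)))
print(f"mean: {mean(times):.1f} us over {len(times)} runs")
```

Host-side timers like this include Python and launch overhead, which is one reason a profiler gives more accurate per-kernel numbers.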

Environments

  • NVIDIA Driver: 418.116.00
  • CUDA: 11.0
  • Python: 3.7.3
  • PyTorch: 1.7.0+cu110
  • TensorFlow: 2.4.1
  • CMake: 3.16.3
  • Ninja: 1.10.0
  • GCC: 8.3.0

Successful runs in other environments are not guaranteed.

Code structure

├── include
│   └── add2.h # header file of add2 cuda kernel
├── kernel
│   └── add2_kernel.cu # add2 cuda kernel
├── pytorch
│   ├── add2_ops.cpp # torch wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and torch
│   ├── train.py # training using custom cuda kernel
│   ├── setup.py
│   └── CMakeLists.txt
├── tensorflow
│   ├── add2_ops.cpp # tensorflow wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and tensorflow
│   ├── train.py # training using custom cuda kernel
│   └── CMakeLists.txt
├── LICENSE
└── README.md

PyTorch

Compile cpp and cuda

JIT
Run the Python code directly; no separate build step is needed.
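
For the JIT route, PyTorch's `torch.utils.cpp_extension.load` compiles the listed sources when it is called and returns the built module. The sketch below assumes the file layout shown under Code structure and that paths are resolved relative to the repository root. The helper names `add2_jit_kwargs` and `load_add2` are hypothetical and are not taken from the repository's scripts.

```python
import os


def add2_jit_kwargs(repo_root="."):
    """Build keyword arguments for torch.utils.cpp_extension.load.

    Paths follow the layout in "Code structure"; repo_root is an assumption.
    """
    return {
        "name": "add2",
        "sources": [
            os.path.join(repo_root, "pytorch", "add2_ops.cpp"),
            os.path.join(repo_root, "kernel", "add2_kernel.cu"),
        ],
        "extra_include_paths": [os.path.join(repo_root, "include")],
        "verbose": True,  # log the build steps
    }


def load_add2(repo_root="."):
    """Compile (or reuse a cached build of) the extension and return it."""
    # Imported lazily so the helper above works without PyTorch installed.
    from torch.utils.cpp_extension import load

    return load(**add2_jit_kwargs(repo_root))
```

Calling `load_add2()` needs PyTorch, a CUDA toolchain, and Ninja. The first call is slow because it compiles; later calls can reuse the cached build.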

Setuptools

python3 pytorch/setup.py install

CMake

mkdir build
cd build
cmake ../pytorch
make

Run python

Compare kernel running time

python3 pytorch/time.py --compiler jit
python3 pytorch/time.py --compiler setup
python3 pytorch/time.py --compiler cmake

Train model

python3 pytorch/train.py --compiler jit
python3 pytorch/train.py --compiler setup
python3 pytorch/train.py --compiler cmake

TensorFlow

Compile cpp and cuda

CMake

mkdir build
cd build
cmake ../tensorflow
make

Run python

Compare kernel running time

python3 tensorflow/time.py --compiler cmake

Train model

python3 tensorflow/train.py --compiler cmake

Implementation details (in Chinese)

PyTorch่‡ชๅฎšไน‰CUDA็ฎ—ๅญๆ•™็จ‹ไธŽ่ฟ่กŒๆ—ถ้—ดๅˆ†ๆž
่ฏฆ่งฃPyTorch็ผ–่ฏ‘ๅนถ่ฐƒ็”จ่‡ชๅฎšไน‰CUDA็ฎ—ๅญ็š„ไธ‰็งๆ–นๅผ
ไธ‰ๅˆ†้’Ÿๆ•™ไฝ ๅฆ‚ไฝ•PyTorch่‡ชๅฎšไน‰ๅๅ‘ไผ ๆ’ญ

F.A.Q

Q. ImportError: libc10.so: cannot open shared object file: No such file or directory
A. Import torch before importing add2.
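
The error message suggests that `add2` depends on PyTorch's shared libraries such as `libc10.so`, which become available to the process once `torch` has been imported. One way to make the order hard to get wrong is a small wrapper that imports the prerequisite first. The helper name `import_extension` is hypothetical, and the demo uses standard-library modules as stand-ins so that it runs without PyTorch.

```python
import importlib
import sys


def import_extension(name="add2", requires="torch"):
    """Import `requires` first (if not already loaded), then import `name`."""
    if requires not in sys.modules:
        importlib.import_module(requires)
    return importlib.import_module(name)


# Stand-in demo with standard-library modules; in the repository's setting
# the call would be import_extension("add2", requires="torch").
module = import_extension("json", requires="math")
print(module.__name__)
```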

Q. tensorflow.python.framework.errors_impl.NotFoundError: build/libadd2.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
A. Check whether ${TF_LFLAGS} in CMakeLists.txt is correct.
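
The missing symbol demangles to the typeinfo for `tensorflow::OpKernel`, which usually points to the op library not being linked against the TensorFlow framework library. Assuming `${TF_LFLAGS}` is meant to hold TensorFlow's link flags, one way to check it is to print the flags that the installed TensorFlow reports through `tf.sysconfig` and compare them with what CMake used. The helper name `tensorflow_build_flags` is hypothetical.

```python
def tensorflow_build_flags():
    """Return (compile_flags, link_flags) from the installed TensorFlow.

    Returns None when TensorFlow cannot be imported in this interpreter.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.sysconfig.get_compile_flags(), tf.sysconfig.get_link_flags()


flags = tensorflow_build_flags()
if flags is None:
    print("TensorFlow is not installed in this interpreter")
else:
    compile_flags, link_flags = flags
    print("compile flags:", " ".join(compile_flags))
    print("link flags:   ", " ".join(link_flags))
```

Run it with the same Python interpreter that CMake picks up, since a different TensorFlow installation would report different paths.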