Neural Network CUDA Example
Simple examples of calling custom CUDA operators from neural network toolkits (PyTorch, TensorFlow, etc.).
We provide several ways to compile the CUDA kernels and their C++ wrappers: JIT, setuptools, and CMake.
We also provide Python scripts that call the CUDA kernels, covering kernel timing and model training.
For more accurate timing, profile the code with nvprof or nsys.
Environments
- NVIDIA Driver: 418.116.00
- CUDA: 11.0
- Python: 3.7.3
- PyTorch: 1.7.0+cu110
- TensorFlow: 2.4.1
- CMake: 3.16.3
- Ninja: 1.10.0
- GCC: 8.3.0
The examples are not guaranteed to run in other environments.
Code structure
├── include
│   └── add2.h # header file of add2 cuda kernel
├── kernel
│   └── add2_kernel.cu # add2 cuda kernel
├── pytorch
│   ├── add2_ops.cpp # torch wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and torch
│   ├── train.py # training using custom cuda kernel
│   ├── setup.py
│   └── CMakeLists.txt
├── tensorflow
│   ├── add2_ops.cpp # tensorflow wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and tensorflow
│   ├── train.py # training using custom cuda kernel
│   └── CMakeLists.txt
├── LICENSE
└── README.md
PyTorch
Compile cpp and cuda
JIT
No separate build step is needed. Run the Python code directly and the extension is compiled on first use.
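As a rough illustration of what the JIT route involves, the sketch below uses torch.utils.cpp_extension.load to compile the wrapper and the kernel when the module is first requested. The source paths follow the code structure above. The extension name add2 and the helper names add2_sources and load_add2_jit are assumptions made for illustration and may differ from what time.py and train.py actually do.

```python
from pathlib import Path


def add2_sources(repo_root="."):
    """Return (sources, include_dirs) for the add2 extension."""
    root = Path(repo_root)
    sources = [
        str(root / "pytorch" / "add2_ops.cpp"),   # torch wrapper
        str(root / "kernel" / "add2_kernel.cu"),  # CUDA kernel
    ]
    include_dirs = [str(root / "include")]        # contains add2.h
    return sources, include_dirs


def load_add2_jit(repo_root="."):
    """Compile and import the extension (needs PyTorch, nvcc, and Ninja)."""
    # Imported lazily so add2_sources() works without PyTorch installed.
    from torch.utils.cpp_extension import load

    sources, include_dirs = add2_sources(repo_root)
    return load(
        name="add2",  # assumed module name
        sources=sources,
        extra_include_paths=include_dirs,
        verbose=True,  # print the compile commands
    )
```

Calling load_add2_jit() from the repository root compiles the sources the first time. Later calls reuse the cached build as long as the sources have not changed.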
Setuptools
python3 pytorch/setup.py install
CMake
mkdir build
cd build
cmake ../pytorch
make
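Once make finishes, the resulting shared library still has to be loaded from Python. The sketch below shows one way to do that with torch.ops.load_library. The library name libadd2.so and the add2 operator namespace are assumptions: they depend on the target name in CMakeLists.txt and on how add2_ops.cpp registers the operator. find_built_library and load_add2_cmake are hypothetical helper names.

```python
from pathlib import Path


def find_built_library(build_dir="build", name="libadd2.so"):
    """Return the path of the CMake-built library, or raise if it is missing."""
    path = Path(build_dir) / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; run cmake and make in {build_dir} first"
        )
    return path


def load_add2_cmake(build_dir="build"):
    """Load the library into PyTorch and return the (assumed) op namespace."""
    # Imported lazily so find_built_library() works without PyTorch installed.
    import torch

    torch.ops.load_library(str(find_built_library(build_dir)))
    return torch.ops.add2  # assumed namespace
```

Checking for the file first turns a missing build into a clear error instead of a loader failure.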
Run python
Compare kernel running time
python3 pytorch/time.py --compiler jit
python3 pytorch/time.py --compiler setup
python3 pytorch/time.py --compiler cmake
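CUDA kernels are launched asynchronously, so a host-side timer only measures the kernel itself if the device is synchronized before the clock starts and again before it stops. A minimal sketch of that pattern follows. time_op and its parameters are hypothetical names, and this is not necessarily how time.py measures time.

```python
import time


def time_op(fn, sync=None, warmup=10, repeat=20):
    """Return a list of per-call wall-clock times for fn(), in microseconds.

    sync is a callable that blocks until queued GPU work has finished.
    If it is omitted, no synchronization is performed.
    """
    if sync is None:
        sync = lambda: None

    # Warm-up calls absorb one-time costs such as context creation.
    for _ in range(warmup):
        fn()

    times = []
    for _ in range(repeat):
        sync()  # finish earlier work before starting the clock
        start = time.perf_counter()
        fn()
        sync()  # wait for the work launched by fn()
        times.append((time.perf_counter() - start) * 1e6)
    return times
```

With PyTorch, one would pass sync=torch.cuda.synchronize, for example time_op(lambda: torch.add(a, b), sync=torch.cuda.synchronize) for CUDA tensors a and b. Host-side timing still includes launch overhead, which is why nvprof or nsys gives more accurate numbers.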
Train model
python3 pytorch/train.py --compiler jit
python3 pytorch/train.py --compiler setup
python3 pytorch/train.py --compiler cmake
TensorFlow
Compile cpp and cuda
CMake
mkdir build
cd build
cmake ../tensorflow
make
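On the TensorFlow side, a custom op library is loaded with tf.load_op_library. The sketch below assumes the build produces build/libadd2.so, the path that appears in the F.A.Q error message. op_library_path and load_add2_op are hypothetical helper names.

```python
from pathlib import Path


def op_library_path(build_dir="build", name="libadd2.so"):
    """Return an absolute path to the built op library."""
    # An absolute path keeps loading independent of the working directory.
    return str(Path(build_dir).resolve() / name)


def load_add2_op(build_dir="build"):
    """Load the custom op library and return the generated Python module."""
    # Imported lazily so op_library_path() works without TensorFlow installed.
    import tensorflow as tf

    return tf.load_op_library(op_library_path(build_dir))
```

The returned module exposes one Python function per op registered in the library. The function names depend on how add2_ops.cpp registers the op.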
Run python
Compare kernel running time
python3 tensorflow/time.py --compiler cmake
Train model
python3 tensorflow/train.py --compiler cmake
Implementation details (in Chinese)
PyTorch่ชๅฎไนCUDA็ฎๅญๆ็จไธ่ฟ่กๆถ้ดๅๆ
่ฏฆ่งฃPyTorch็ผ่ฏๅนถ่ฐ็จ่ชๅฎไนCUDA็ฎๅญ็ไธ็งๆนๅผ
ไธๅ้ๆไฝ ๅฆไฝPyTorch่ชๅฎไนๅๅไผ ๆญ
F.A.Q
Q. ImportError: libc10.so: cannot open shared object file: No such file or directory
A. You must import torch before importing add2.
Q. tensorflow.python.framework.errors_impl.NotFoundError: build/libadd2.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
A. Check whether ${TF_LFLAGS} in CMakeLists.txt is correct.
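The missing symbol _ZTIN10tensorflow8OpKernelE is the typeinfo for tensorflow::OpKernel, which usually means the TensorFlow framework library was not linked. One way to check is to compare ${TF_LFLAGS} with the flags that the installed TensorFlow reports through tf.sysconfig.get_link_flags(). The sketch below does that. tensorflow_link_flags and describe_link_flags are hypothetical helper names.

```python
def describe_link_flags(link_flags):
    """Split linker flags into library search directories and library names."""
    dirs = [f[2:] for f in link_flags if f.startswith("-L")]
    libs = [f[2:] for f in link_flags if f.startswith("-l")]
    return dirs, libs


def tensorflow_link_flags():
    """Return the linker flags reported by the installed TensorFlow."""
    # Imported lazily so describe_link_flags() works without TensorFlow.
    import tensorflow as tf

    return tf.sysconfig.get_link_flags()
```

Running describe_link_flags(tensorflow_link_flags()) shows which directory and library name the linker should receive. If ${TF_LFLAGS} expands to something else, or to nothing, the error above is the likely result.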