• Stars: 274
• Rank: 150,274 (Top 3%)
• Language: Python
• License: MIT License
• Created over 7 years ago
• Updated almost 7 years ago

Repository Details

CuPy fused PyTorch neural networks ops

PyINN

CuPy implementations of fused PyTorch ops.

PyTorch version of imagine-nn

The purpose of this package is to contain CUDA ops written in Python with CuPy, which is not a PyTorch dependency.

An alternative to CuPy would be https://github.com/pytorch/extension-ffi, but it requires a lot of wrapping code, as in https://github.com/sniklaus/pytorch-extension, so it is not well suited to quick prototyping.

Another advantage of CuPy over C code is that the dimensions of each op are known at JIT-ing time, so the compiled kernels can potentially be faster. The first version of the package was written in PyCUDA, but PyCUDA does not work with PyTorch multi-GPU.
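To illustrate what "dimensions known at JIT-ing time" can mean in practice, here is a minimal sketch that bakes tensor dimensions into CUDA source text before compilation. The kernel `scale_channels` and the helper `specialize` are hypothetical names made up for this example, not part of pyinn, and the compile step (for example with CuPy) is left out because it needs a GPU.

```python
from string import Template

# Hypothetical kernel source. ${n}, ${spatial} and ${channels} are filled in
# from the tensor shape before compilation, so the compiler sees constants
# instead of runtime arguments.
KERNEL_TEMPLATE = Template("""
extern "C" __global__
void scale_channels(const float* x, const float* s, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < ${n}) {
        y[i] = x[i] * s[(i / ${spatial}) % ${channels}];
    }
}
""")

def specialize(shape):
    """Return CUDA source specialized for one (N, C, H, W) shape."""
    n, c, h, w = shape
    return KERNEL_TEMPLATE.substitute(n=n * c * h * w, spatial=h * w, channels=c)

src = specialize((1, 4, 5, 5))
print(src)
```

A new shape produces new source text, which is why such kernels are typically cached per shape.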

~~On Maxwell Titan X pyinn.conv2d_depthwise MobileNets are ~2.6x faster than F.conv2d~~ benchmark.py

This is no longer the case: with its new kernels, PyTorch 0.3.0 is now ~20% faster than pyinn.

Installation

pip install git+https://github.com/szagoruyko/pyinn.git@master

Example

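import torch
from torch.autograd import Variable
import pyinn as P

# Input: batch of 1, 4 channels, 5x5 spatial size (on the GPU)
x = Variable(torch.randn(1, 4, 5, 5).cuda())
# Weight: one 3x3 filter per input channel
w = Variable(torch.randn(4, 1, 3, 3).cuda())
# padding=1 keeps the 5x5 spatial size
y = P.conv2d_depthwise(x, w, padding=1)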

or with the modules interface:

from pyinn.modules import Conv2dDepthwise
module = Conv2dDepthwise(channels=4, kernel_size=3, padding=1).cuda()
y = module(x)

Documentation

conv2d_depthwise

Implements depthwise convolution as in https://arxiv.org/abs/1704.04861 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

CUDA kernels from BVLC/caffe#5665

On the CPU, the op is computed by F.conv2d.

Equivalent to:

F.conv2d(input, weight, groups=input.size(1))

Inputs and arguments are the same as for F.conv2d.
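For readers who want to check outputs without a GPU, the following is a naive NumPy sketch of what a depthwise convolution computes (stride 1 only). The function name `conv2d_depthwise_ref` is an assumption made for this example; it is a reference for the semantics, not pyinn's implementation.

```python
import numpy as np

def conv2d_depthwise_ref(x, w, padding=0):
    """Naive depthwise convolution (stride 1), written for clarity, not speed.

    x: (N, C, H, W) input; w: (C, 1, kH, kW) weight, one filter per channel.
    """
    n, c, h, wd = x.shape
    kh, kw = w.shape[2], w.shape[3]
    # Zero-pad the two spatial axes only.
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    oh = h + 2 * padding - kh + 1
    ow = wd + 2 * padding - kw + 1
    y = np.zeros((n, c, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Each channel is scaled by its own filter tap; channels never mix.
            y += xp[:, :, i:i + oh, j:j + ow] * w[:, 0, i, j][None, :, None, None]
    return y

x = np.random.randn(1, 4, 5, 5)
w = np.random.randn(4, 1, 3, 3)
y = conv2d_depthwise_ref(x, w, padding=1)
print(y.shape)  # (1, 4, 5, 5)
```

The shapes mirror the example above: 4 input channels, a weight of shape (4, 1, 3, 3), and padding=1 so the spatial size is preserved.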

dgmm

Multiplication with a diagonal matrix.

Uses the CUDA dgmm function, which is sometimes faster than expand.

In terms of torch functions, it computes input.mm(x.diag()). Both left and right multiplications are supported.

Args:
    input: 2D tensor
    x: 1D tensor
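The identity behind this op can be checked on the CPU. The NumPy sketch below shows that multiplying by a diagonal matrix is the same as scaling columns (right multiplication) or rows (left multiplication), which is why the dense diagonal matrix never needs to be built. The variable names are chosen for this example only.

```python
import numpy as np

inp = np.arange(6, dtype=np.float64).reshape(2, 3)  # 2D "input"
x_right = np.array([1.0, 10.0, 100.0])               # one scale per column
x_left = np.array([2.0, 3.0])                        # one scale per row

# Right multiplication: input.mm(x.diag()) scales column j by x[j].
right_dense = inp @ np.diag(x_right)
right_bcast = inp * x_right[None, :]

# Left multiplication: x.diag().mm(input) scales row i by x[i].
left_dense = np.diag(x_left) @ inp
left_bcast = x_left[:, None] * inp

print(np.allclose(right_dense, right_bcast), np.allclose(left_dense, left_bcast))
```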

cdgmm

Complex multiplication with a diagonal matrix.

Does input.mm(x.diag()) where input and x are complex.

Args:
    input: 3D tensor with last dimension of size 2
    x: 2D tensor with last dimension of size 2
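As a CPU reference for the complex case, here is a NumPy sketch that assumes the last dimension stores (real, imaginary) pairs, which is the usual convention for such layouts but is not stated explicitly above. The name `cdgmm_ref` is hypothetical, and only right multiplication is shown.

```python
import numpy as np

def cdgmm_ref(inp, x):
    """inp: (M, N, 2), x: (N, 2); last axis assumed to hold (real, imag)."""
    a, b = inp[..., 0], inp[..., 1]
    c, d = x[:, 0], x[:, 1]
    # (a + ib)(c + id) = (ac - bd) + i(ad + bc), with x broadcast over rows.
    return np.stack([a * c - b * d, a * d + b * c], axis=-1)

inp = np.random.randn(3, 4, 2)
x = np.random.randn(4, 2)
out = cdgmm_ref(inp, x)

# Cross-check against NumPy's native complex type and a dense diagonal matrix.
inp_c = inp[..., 0] + 1j * inp[..., 1]
x_c = x[:, 0] + 1j * x[:, 1]
expected = inp_c @ np.diag(x_c)
print(np.allclose(out[..., 0], expected.real), np.allclose(out[..., 1], expected.imag))
```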

NCReLU

Applies NCReLU (negative concatenated ReLU) nonlinearity.

Does torch.cat([x.clamp(min=0), x.clamp(max=0)], dim=1) in a single fused op.

Used in https://arxiv.org/abs/1706.00388 DiracNets: Training Very Deep Neural Networks Without Skip-Connections

Args: input: 4D tensor
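A small NumPy sketch of the same computation, useful for checking shapes: the channel dimension doubles, with the positive parts first and the negative parts second. `ncrelu_ref` is a name made up for this example.

```python
import numpy as np

def ncrelu_ref(x):
    """x: (N, C, H, W) -> (N, 2C, H, W): positive part, then negative part."""
    pos = np.maximum(x, 0)  # same as x.clamp(min=0)
    neg = np.minimum(x, 0)  # same as x.clamp(max=0)
    return np.concatenate([pos, neg], axis=1)

x = np.array([-1.0, 2.0, -3.0, 4.0]).reshape(1, 1, 2, 2)
y = ncrelu_ref(x)
print(y.shape)  # (1, 2, 2, 2)
```

Because the two halves sum back to the input, no information is lost, unlike a plain ReLU.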

im2col and col2im

Rearrange image blocks into columns.

The representation is used to perform GEMM-based convolution.

The output is a 5D tensor (6D for a minibatch).

The minibatch implementation is inefficient; it could be done in a single CUDA kernel.
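To show how the column representation turns convolution into a single matrix product, here is a NumPy sketch for one image with stride 1 and no padding. The 5D layout (C, kH, kW, outH, outW) is an assumption that is consistent with the 5D output described above, not a documented guarantee, and both function names are hypothetical.

```python
import numpy as np

def im2col_ref(img, kh, kw):
    """img: (C, H, W) -> cols: (C, kh, kw, oh, ow); stride 1, no padding."""
    c, h, w = img.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c, kh, kw, oh, ow), dtype=img.dtype)
    for i in range(kh):
        for j in range(kw):
            # cols[c, i, j, p, q] == img[c, p + i, q + j]
            cols[:, i, j] = img[:, i:i + oh, j:j + ow]
    return cols

def conv2d_gemm(img, weight):
    """weight: (O, C, kh, kw). Convolution as one matrix multiplication."""
    o, c, kh, kw = weight.shape
    cols = im2col_ref(img, kh, kw)
    oh, ow = cols.shape[3], cols.shape[4]
    out = weight.reshape(o, c * kh * kw) @ cols.reshape(c * kh * kw, oh * ow)
    return out.reshape(o, oh, ow)

img = np.random.randn(3, 6, 6)
weight = np.random.randn(8, 3, 3, 3)
cols = im2col_ref(img, 3, 3)
out = conv2d_gemm(img, weight)
print(cols.shape, out.shape)  # (3, 3, 3, 4, 4) (8, 4, 4)
```

The price of the GEMM formulation is memory: each input pixel is copied up to kH * kW times into the column buffer.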
