caffemodel2pytorch

Convert Caffe models to PyTorch

This converter can be useful for porting Caffe code and layers to PyTorch. Features:

  • dump caffemodel weights to hdf5, npy, pt and json formats
  • load Caffe models and use them from PyTorch
  • mock PyCaffe API to allow for smooth porting of Caffe-using code (a drop-in script for OICR that switches the train/eval backend to PyTorch is included below):
    • Net, Blob, SGDSolver
  • wrapping Caffe's Python layers (see the OICR example)
  • example of ROI pooling in PyTorch without manual CUDA code compilation (see the OICR example)

The layer support isn't as complete as in https://github.com/marvis/pytorch-caffe. Currently it supports the following Caffe layers (a minimal prototxt example follows the list):

  • convolution (num_output, kernel_size, stride, pad, dilation; constant and gaussian weight/bias fillers)
  • inner_product (num_output; constant and gaussian weight/bias fillers)
  • max / avg pooling (kernel_size, stride, pad)
  • relu
  • dropout (dropout_ratio)
  • eltwise (prod, sum, max)
  • softmax (axis)
  • local response norm (local_size, alpha, beta)
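
For illustration, a minimal prototxt fragment staying within this supported subset could look as follows (layer names, blob names and parameter values are made up for the example):

layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1"
  convolution_param { num_output: 16 kernel_size: 3 stride: 1 pad: 1
    weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "fc1" type: "InnerProduct" bottom: "pool1" top: "fc1" inner_product_param { num_output: 10 } }
layer { name: "prob" type: "Softmax" bottom: "fc1" top: "prob" }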

Dependencies: protobuf with Python bindings and the protoc binary available in PATH.
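
On a Debian-like system these can be satisfied, for example, as follows (a sketch; package managers and package names vary across platforms):

pip install protobuf
sudo apt-get install protobuf-compiler  # provides the protoc binary
protoc --version                        # verify that protoc is in PATH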

PRs to enable other layers or layer params are very welcome (see the definition of the modules dictionary in the code)!
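
New layers are registered through that modules dictionary, the same mechanism the OICR script below uses; a minimal sketch with an invented "Scale2x" layer type:

import caffemodel2pytorch

# a factory receives the layer's parsed param dict and returns a callable mapping
# input blobs to output blobs; the "Scale2x" layer type is made up for illustration
caffemodel2pytorch.modules['Scale2x'] = lambda param: lambda x: 2 * x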

License is MIT.

Dump weights to PT or HDF5

# prototxt and caffemodel from https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md

# dumps to PT by default, producing VGG_ILSVRC_16_layers.caffemodel.pt
python -m caffemodel2pytorch VGG_ILSVRC_16_layers.caffemodel

# dumps to HDF5 converted.h5
python -m caffemodel2pytorch VGG_ILSVRC_16_layers.caffemodel -o converted.h5
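
The npy and json formats listed above are dumped the same way; presumably the output format is inferred from the -o file extension (an assumption based on the HDF5 example):

python -m caffemodel2pytorch VGG_ILSVRC_16_layers.caffemodel -o converted.json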
# load dumped VGG16 in PyTorch
import collections, torch, torchvision, numpy, h5py
model = torchvision.models.vgg16()
model.features = torch.nn.Sequential(collections.OrderedDict(zip(['conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1', 'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2', 'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'pool3', 'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'pool4', 'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'pool5'], model.features)))
model.classifier = torch.nn.Sequential(collections.OrderedDict(zip(['fc6', 'relu6', 'drop6', 'fc7', 'relu7', 'drop7', 'fc8'], model.classifier)))

state_dict = h5py.File('converted.h5', 'r') # torch.load('VGG_ILSVRC_16_layers.caffemodel.pt')
# copy each dumped blob into the model parameter whose (renamed) name contains the blob key, e.g. "conv1_1.weight" -> "features.conv1_1.weight"
model.load_state_dict({l : torch.from_numpy(numpy.array(v)).view_as(p) for k, v in state_dict.items() for l, p in model.named_parameters() if k in l})
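
A quick sanity check after loading, a minimal sketch that only verifies the output shape, not numerical parity with Caffe:

model.eval()
x = torch.zeros(1, 3, 224, 224)
print(model(x).shape) # torch.Size([1, 1000]) for the ILSVRC VGG16 classifier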

Run Caffe models using PyTorch as backend

import torch
import caffemodel2pytorch

model = caffemodel2pytorch.Net(
	prototxt = 'VGG_ILSVRC_16_layers_deploy.prototxt',
	weights = 'VGG_ILSVRC_16_layers.caffemodel',
	caffe_proto = 'https://raw.githubusercontent.com/BVLC/caffe/master/src/caffe/proto/caffe.proto'
)
model.cuda()
model.eval()
torch.set_grad_enabled(False)

# make sure to use the right image normalization and channel reordering (see the preprocessing sketch after this example)
image = torch.Tensor(8, 3, 224, 224).cuda()

# outputs a dict of PyTorch Variables
# in this example the dict contains only the key "prob"
# output_dict = model(data = image)

# you can remove unneeded layers:
del model.prob
del model.fc8

# a single input variable is interpreted as an input blob named "data"
# in this example the dict contains only the key "fc7"
output_dict = model(image)
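
For reference, the original VGG ILSVRC models were trained on BGR images with per-channel mean subtraction applied to 0-255 pixel values, so a preprocessing sketch could look like this (mean values from the original VGG release; adapt them to your model):

def preprocess(rgb):
	# rgb: (N, 3, H, W) float tensor with values in [0, 255] in RGB channel order
	bgr = rgb[:, [2, 1, 0]] # reorder channels RGB -> BGR
	mean = torch.Tensor([103.939, 116.779, 123.68]).view(1, 3, 1, 1)
	return bgr - mean.to(bgr.device)

image = preprocess(torch.zeros(8, 3, 224, 224)).cuda()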

Imitate the PyCaffe interface to help in porting

import numpy as np
import caffemodel2pytorch as caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# === LOADING AND USING THE NET IN EVAL MODE ===

net = caffe.Net('VGG_ILSVRC_16_layers_deploy.prototxt', caffe.TEST, weights = 'VGG_ILSVRC_16_layers.caffemodel')

# outputs a dict of NumPy arrays; the data layer is sidestepped
blobs_out = net.forward(data = np.zeros((8, 3, 224, 224), dtype = np.float32))

# access the last layer
layer = net.layers[-1]

# converts and provides the output as NumPy array
numpy_array = net.blobs['conv1_1'].data

# access the loss weights
loss_weights = net.blob_loss_weights

# === BASIC OPTIMIZER ===

# this example uses paths from https://github.com/ppengtang/oicr

# create an SGD solver and load the net in train mode
# it understands base_lr, weight_decay, momentum, lr_mult, decay_mult, iter_size, the step lr policy, step_size and gamma
# it locates train.prototxt via the train_net or net parameters of solver.prototxt
solver = caffe.SGDSolver('oicr/models/VGG16/solver.prototxt')

# load pretrained weights
solver.net.copy_from('oicr/data/imagenet_models/VGG16.v2.caffemodel')

# runs one iteration of forward, backward, optimization; returns a float loss value
# a data layer must be registered, or inputs must be provided as keyword arguments (see the sketch below)
loss = solver.step(1)
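
If no data layer is registered, the inputs would be passed directly instead; a hypothetical sketch, assuming solver.step accepts input blobs as keyword arguments matching the blob names in the train prototxt:

# hypothetical: blob names and shapes depend on the train prototxt
images = np.zeros((2, 3, 224, 224), dtype = np.float32)
loss = solver.step(1, data = images)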

Drop-in script for OICR enabling PyTorch as the backend for eval and training

Place caffe_pytorch_oicr.py and caffemodel2pytorch.py in the root oicr directory. To use the PyTorch backend in testing and training, put the line import caffe_pytorch_oicr at the very top (before import _init_paths) of tools/test_net.py and tools/train_net.py respectively, as shown below. It requires PyTorch and CuPy (used for on-the-fly CUDA kernel compilation).
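
The top of each script would then begin like this (only the first import line is added; the rest of the script stays unchanged):

# first lines of tools/test_net.py and tools/train_net.py
import caffe_pytorch_oicr  # must come before _init_paths so that the PyTorch backend is set up first
import _init_paths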

# caffe_pytorch_oicr.py

import collections
import torch
import torch.nn.functional as F
import cupy
import caffemodel2pytorch

caffemodel2pytorch.initialize('./caffe-oicr/src/caffe/proto/caffe.proto') # must be called explicitly in these porting scenarios to make the caffe.proto.caffe_pb2 module available
caffemodel2pytorch.set_mode_gpu()
caffemodel2pytorch.modules['GlobalSumPooling'] = lambda param: lambda pred: pred.sum(dim = 0, keepdim = True)
caffemodel2pytorch.modules['MulticlassCrossEntropyLoss'] = lambda param: lambda pred, labels, eps = 1e-6: F.binary_cross_entropy(pred.clamp(eps, 1 - eps), labels)
caffemodel2pytorch.modules['data'] = lambda param: __import__('roi_data_layer.layer').layer.RoIDataLayer() # wrapping a PyCaffe layer
caffemodel2pytorch.modules['OICRLayer'] = lambda param: OICRLayer # wrapping a PyTorch function
caffemodel2pytorch.modules['WeightedSoftmaxWithLoss'] = lambda param: WeightedSoftmaxWithLoss
caffemodel2pytorch.modules['ReLU'] = lambda param: torch.nn.ReLU(inplace = True) # wrapping a PyTorch module
caffemodel2pytorch.modules['ROIPooling'] = lambda param: lambda input, rois: RoiPooling(param['pooled_h'], param['pooled_w'], param['spatial_scale'])(input, rois) # wrapping a PyTorch autograd function

def WeightedSoftmaxWithLoss(prob, labels_ic, cls_loss_weights, eps = 1e-12):
	loss = -cls_loss_weights * F.log_softmax(prob, dim = -1).gather(-1, labels_ic.long().unsqueeze(-1)).squeeze(-1)
	valid_sum = cls_loss_weights.gt(eps).float().sum()
	return loss.sum() / (loss.numel() if valid_sum == 0 else valid_sum)

def OICRLayer(boxes, cls_prob, im_labels, cfg_TRAIN_FG_THRESH = 0.5):
    cls_prob = (cls_prob if cls_prob.size(-1) == im_labels.size(-1) else cls_prob[..., 1:]).clone()
    boxes = boxes[..., 1:]
    gt_boxes, gt_classes, gt_scores = [], [], []
    for i in im_labels.eq(1).nonzero()[:, 1]:
        max_index = int(cls_prob[:, i].max(dim = 0)[1])
        gt_boxes.append(boxes[max_index])
        gt_classes.append(int(i) + 1)
        gt_scores.append(float(cls_prob[max_index, i]))
        cls_prob[max_index] = 0
    max_overlaps, gt_assignment = overlap(boxes, torch.stack(gt_boxes)).max(dim = 1)
    return gt_assignment.new(gt_classes)[gt_assignment] * (max_overlaps > cfg_TRAIN_FG_THRESH).type_as(gt_assignment), max_overlaps.new(gt_scores)[gt_assignment]

# legacy-style torch.autograd.Function (old PyTorch API): the instance is constructed
# with the layer params and carries state (argmax, input_size) between forward and backward
class RoiPooling(torch.autograd.Function):
	CUDA_NUM_THREADS = 1024
	GET_BLOCKS = staticmethod(lambda N: (N + RoiPooling.CUDA_NUM_THREADS - 1) // RoiPooling.CUDA_NUM_THREADS)
	Stream = collections.namedtuple('Stream', ['ptr'])

	kernel_forward = b'''
	#define FLT_MAX 340282346638528859811704183484516925440.0f
	typedef float Dtype;
	#define CUDA_KERNEL_LOOP(i, n) for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x)
	extern "C"
	__global__ void ROIPoolForward(const int nthreads, const Dtype* bottom_data,
		const Dtype spatial_scale, const int channels, const int height,
		const int width, const int pooled_height, const int pooled_width,
		const Dtype* bottom_rois, Dtype* top_data, int* argmax_data) {
	  CUDA_KERNEL_LOOP(index, nthreads) { 
		// (n, c, ph, pw) is an element in the pooled output
		int pw = index % pooled_width;
		int ph = (index / pooled_width) % pooled_height;
		int c = (index / pooled_width / pooled_height) % channels;
		int n = index / pooled_width / pooled_height / channels;

		bottom_rois += n * 5;
		int roi_batch_ind = bottom_rois[0];
		int roi_start_w = round(bottom_rois[1] * spatial_scale);
		int roi_start_h = round(bottom_rois[2] * spatial_scale);
		int roi_end_w = round(bottom_rois[3] * spatial_scale);
		int roi_end_h = round(bottom_rois[4] * spatial_scale);

		// Force malformed ROIs to be 1x1
		int roi_width = max(roi_end_w - roi_start_w + 1, 1);
		int roi_height = max(roi_end_h - roi_start_h + 1, 1);
		Dtype bin_size_h = static_cast<Dtype>(roi_height)
						   / static_cast<Dtype>(pooled_height);
		Dtype bin_size_w = static_cast<Dtype>(roi_width)
						   / static_cast<Dtype>(pooled_width);

		int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)
											* bin_size_h));
		int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)
											* bin_size_w));
		int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)
										 * bin_size_h));
		int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)
										 * bin_size_w));

		// Add roi offsets and clip to input boundaries
		hstart = min(max(hstart + roi_start_h, 0), height);
		hend = min(max(hend + roi_start_h, 0), height);
		wstart = min(max(wstart + roi_start_w, 0), width);
		wend = min(max(wend + roi_start_w, 0), width);
		bool is_empty = (hend <= hstart) || (wend <= wstart);

		// Define an empty pooling region to be zero
		Dtype maxval = is_empty ? 0 : -FLT_MAX;
		// If nothing is pooled, argmax = -1 causes nothing to be backprop'd
		int maxidx = -1;
		bottom_data += (roi_batch_ind * channels + c) * height * width;
		for (int h = hstart; h < hend; ++h) {
		  for (int w = wstart; w < wend; ++w) {
			int bottom_index = h * width + w;
			if (bottom_data[bottom_index] > maxval) {
			  maxval = bottom_data[bottom_index];
			  maxidx = bottom_index;
			}
		  }
		}
		top_data[index] = maxval;
		argmax_data[index] = maxidx;
	  }
	}
	'''

	kernel_backward = b'''
	typedef float Dtype;
	#define CUDA_KERNEL_LOOP(i, n) for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x)
	extern "C"
	__global__ void ROIPoolBackward(const int nthreads, const Dtype* top_diff,
		const int* argmax_data, const int num_rois, const Dtype spatial_scale,
		const int channels, const int height, const int width,
		const int pooled_height, const int pooled_width, Dtype* bottom_diff,
		const Dtype* bottom_rois) {
	  CUDA_KERNEL_LOOP(index, nthreads) {
		// (n, c, h, w) coords in bottom data
		int w = index % width;
		int h = (index / width) % height;
		int c = (index / width / height) % channels;
		int n = index / width / height / channels;

		Dtype gradient = 0;
		// Accumulate gradient over all ROIs that pooled this element
		for (int roi_n = 0; roi_n < num_rois; ++roi_n) {
		  const Dtype* offset_bottom_rois = bottom_rois + roi_n * 5;
		  int roi_batch_ind = offset_bottom_rois[0];
		  // Skip if ROI's batch index doesn't match n
		  if (n != roi_batch_ind) {
			continue;
		  }

		  int roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
		  int roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
		  int roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
		  int roi_end_h = round(offset_bottom_rois[4] * spatial_scale);

		  // Skip if ROI doesn't include (h, w)
		  const bool in_roi = (w >= roi_start_w && w <= roi_end_w &&
							   h >= roi_start_h && h <= roi_end_h);
		  if (!in_roi) {
			continue;
		  }

		  int offset = (roi_n * channels + c) * pooled_height * pooled_width;
		  const Dtype* offset_top_diff = top_diff + offset;
		  const int* offset_argmax_data = argmax_data + offset;

		  // Compute feasible set of pooled units that could have pooled
		  // this bottom unit

		  // Force malformed ROIs to be 1x1
		  int roi_width = max(roi_end_w - roi_start_w + 1, 1);
		  int roi_height = max(roi_end_h - roi_start_h + 1, 1);

		  Dtype bin_size_h = static_cast<Dtype>(roi_height)
							 / static_cast<Dtype>(pooled_height);
		  Dtype bin_size_w = static_cast<Dtype>(roi_width)
							 / static_cast<Dtype>(pooled_width);

		  int phstart = floor(static_cast<Dtype>(h - roi_start_h) / bin_size_h);
		  int phend = ceil(static_cast<Dtype>(h - roi_start_h + 1) / bin_size_h);
		  int pwstart = floor(static_cast<Dtype>(w - roi_start_w) / bin_size_w);
		  int pwend = ceil(static_cast<Dtype>(w - roi_start_w + 1) / bin_size_w);

		  phstart = min(max(phstart, 0), pooled_height);
		  phend = min(max(phend, 0), pooled_height);
		  pwstart = min(max(pwstart, 0), pooled_width);
		  pwend = min(max(pwend, 0), pooled_width);

		  for (int ph = phstart; ph < phend; ++ph) {
			for (int pw = pwstart; pw < pwend; ++pw) {
			  if (offset_argmax_data[ph * pooled_width + pw] == (h * width + w)) {
				gradient += offset_top_diff[ph * pooled_width + pw];
			  }
			}
		  }
		}
		bottom_diff[index] = gradient;
	  }
	}
	'''
	cupy_init = cupy.array([]) # presumably forces CuPy to initialize CUDA before compiling the kernels below
	compiled_forward = cupy.cuda.compiler.compile_with_cache(kernel_forward).get_function('ROIPoolForward')
	compiled_backward = cupy.cuda.compiler.compile_with_cache(kernel_backward).get_function('ROIPoolBackward')

	def __init__(self, pooled_height, pooled_width, spatial_scale):
		self.pooled_height = pooled_height
		self.pooled_width = pooled_width
		self.spatial_scale = spatial_scale

	def forward(self, images, rois):
		output = torch.cuda.FloatTensor(len(rois), images.size(1) * self.pooled_height * self.pooled_width)
		self.argmax = torch.cuda.IntTensor(output.size()).fill_(-1)
		self.input_size = images.size()
		self.save_for_backward(rois)
		RoiPooling.compiled_forward(grid = (RoiPooling.GET_BLOCKS(output.numel()), 1, 1), block = (RoiPooling.CUDA_NUM_THREADS, 1, 1), args=[
			output.numel(), images.data_ptr(), cupy.float32(self.spatial_scale), self.input_size[-3], self.input_size[-2], self.input_size[-1],
			self.pooled_height, self.pooled_width, rois.data_ptr(), output.data_ptr(), self.argmax.data_ptr()
			  ], stream=RoiPooling.Stream(ptr=torch.cuda.current_stream().cuda_stream))
		return output

	def backward(self, grad_output):
		rois, = self.saved_tensors
		grad_input = torch.cuda.FloatTensor(*self.input_size).zero_()
		RoiPooling.compiled_backward(grid = (RoiPooling.GET_BLOCKS(grad_input.numel()), 1, 1), block = (RoiPooling.CUDA_NUM_THREADS, 1, 1), args=[
			grad_input.numel(), grad_output.data_ptr(), self.argmax.data_ptr(), len(rois), cupy.float32(self.spatial_scale), self.input_size[-3],
			self.input_size[-2], self.input_size[-1], self.pooled_height, self.pooled_width, grad_input.data_ptr(), rois.data_ptr()
			  ], stream=RoiPooling.Stream(ptr=torch.cuda.current_stream().cuda_stream))
		return grad_input, None
		
def overlap(box1, box2):
    b1, b2 = box1.t().contiguous(), box2.t().contiguous()
    xx1 = torch.max(b1[0].unsqueeze(1), b2[0].unsqueeze(0))
    yy1 = torch.max(b1[1].unsqueeze(1), b2[1].unsqueeze(0))
    xx2 = torch.min(b1[2].unsqueeze(1), b2[2].unsqueeze(0))
    yy2 = torch.min(b1[3].unsqueeze(1), b2[3].unsqueeze(0))
    inter = area(x1 = xx1, y1 = yy1, x2 = xx2, y2 = yy2)
    return inter / (area(b1.t()).unsqueeze(1) + area(b2.t()).unsqueeze(0) - inter)

def area(boxes = None, x1 = None, y1 = None, x2 = None, y2 = None):
    return (boxes[..., 3] - boxes[..., 1] + 1) * (boxes[..., 2] - boxes[..., 0] + 1) if boxes is not None else (x2 - x1 + 1).clamp(min = 0) * (y2 - y1 + 1).clamp(min = 0)

Note: I also had to replace utils/bbox.pyx with utils/cython_bbox.pyx and utils/nms.pyx with utils/cython_nms.pyx in lib/setup.py to work around some setup.py issues.
