kastnerkyle/deform-conv

Stars
209
Rank 187,219 (Top 4 %)
Language
Python
License
MIT License
Created over 7 years ago
Updated over 7 years ago


Deformable Convolution in TensorFlow / Keras

Understanding Deformable Convolution

Keras / TensorFlow implementation of deformable convolution.

Dai, Jifeng, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. 2017. “Deformable Convolutional Networks.” arXiv [cs.CV]. arXiv. http://arxiv.org/abs/1703.06211

Check out https://medium.com/@phelixlau/notes-on-deformable-convolutional-networks-baaabbc11cf3 for my summary of the paper.

Experiment on MNIST and Scaled Data Augmentation

To demonstrate the effectiveness of deformable convolution on scaled images, we show that simply replacing regular convolution with deformable convolution and fine-tuning just the offsets on a scale-augmented dataset makes the deformable CNN perform significantly better than the regular CNN on the scaled MNIST dataset. This indicates that deformable convolution can use already-learned feature maps more effectively to represent geometric distortion.

First, we train a 4-layer CNN with regular convolution on MNIST without any data augmentation. Then we replace all regular convolution layers with deformable convolution layers and freeze the weights of all layers except the newly added convolution layers responsible for learning the offsets. This model is then fine-tuned on the scale-augmented MNIST dataset.
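
The freezing step described above can be sketched as follows. This is a minimal illustration, not the code in scripts/scaled_mnist.py: the helper name `freeze_all_but_offsets`, the layer names, and the convention that offset layers carry "offset" in their name are all assumptions made for the example. The helper only relies on `name` and `trainable` attributes, which Keras layers also expose, so the stand-in objects here could be swapped for `model.layers`.

```python
from types import SimpleNamespace


def freeze_all_but_offsets(layers, marker="offset"):
    """Leave only layers whose name contains `marker` trainable.

    `layers` is any iterable of objects with `name` and `trainable`
    attributes. The naming convention is an assumption of this sketch.
    Returns the names of the layers that remain trainable.
    """
    still_trainable = []
    for layer in layers:
        layer.trainable = marker in layer.name
        if layer.trainable:
            still_trainable.append(layer.name)
    return still_trainable


# Stand-in layers with hypothetical names (not taken from the repository).
layers = [
    SimpleNamespace(name=name, trainable=True)
    for name in ("conv11", "conv12_offset", "conv12", "conv21_offset", "conv21", "fc")
]
print(freeze_all_but_offsets(layers))
```

With a real Keras model, the model would need to be compiled again after changing `trainable` for the change to take effect during fine-tuning.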

In this setup, the deformable CNN is forced to make better use of the learned feature map by only changing its receptive field.
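
To make "changing the receptive field" concrete, here is a small pure-Python sketch of one output value of a 3x3 deformable convolution on a single-channel image, following the formulation in Dai et al.: each kernel tap reads the input at its regular grid position plus a learned offset, using bilinear interpolation. The function names and the list-of-lists layout are assumptions of this example, and it is not how the repository's TensorFlow code is organized.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2-D list-of-lists image at fractional (y, x).

    Pixels outside the image contribute 0.
    """
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * image[yy][xx]
    return value


def deform_conv_at(image, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets[i][j] is a (dy, dx) pair added to the regular grid position
    of kernel tap (i, j). All-zero offsets give a regular convolution.
    """
    total = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            total += kernel[i][j] * bilinear_sample(
                image, cy + i - 1 + dy, cx + j - 1 + dx
            )
    return total


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
no_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_right = [[(0.0, 0.5)] * 3 for _ in range(3)]

print(deform_conv_at(image, kernel, no_offsets, 1, 1))  # 45.0
print(deform_conv_at(image, kernel, half_right, 1, 1))  # 49.5
```

The kernel weights are identical in both calls. Only the sampling positions move, which mirrors the experiment: the frozen filters are reused while the offsets adapt where they look.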

Note that the deformable CNN did not receive additional supervision other than the labels and is trained with cross-entropy just like the regular CNN.

Test Accuracy	Regular CNN	Deformable CNN
Regular MNIST	98.74%	97.27%
Scaled MNIST	57.01%	92.55%

Please refer to scripts/scaled_mnist.py for reproducing this result.

Notes on Implementation

This implementation is not efficient: a forward pass with deformable convolution takes 260 ms, while regular convolution takes only 10 ms. Average GPU utilization is also only around 10%.
This implementation also does not take advantage of the fact that the offsets and the input have similar shapes (in tf_batch_map_offsets), so STN-style bilinear sampling would help.
The TensorFlow backend for Keras must be used.

PyCon2015

Material for talk "Machine Learning 101" https://speakerdeck.com/kastnerkyle/pycon2015 https://us.pycon.org/2015/schedule/presentation/367/

kaggle-dogs-vs-cats

Code for Kaggle Dogs vs. Cats competition http://www.kaggle.com/c/dogs-vs-cats

tools

Various tools for graphs, audio, images

representation_mixing

Demos, pretrained models, and (WIP) code supporting Representation Mixing

SciPy2015

Talk for SciPy2015 "Deep Learning: Tips From The Road"

kastnerkyle.github.io

kaggle-cifar10

Code for Kaggle Cifar10 competition http://www.kaggle.com/c/cifar-10

kaggle-criteo

Code for Criteo competition http://www.kaggle.com/c/criteo-display-ad-challenge

diphone_synthesizer

A tutorial diphone synthesizer in Python

SciPy2013

Slides and code for SciPy 2013 talk - "A Gentle Introduction To Machine Learning"

raw_voice_cleanup

Examples of cleaning up raw voices

ez-phones

Wrapper to pocketsphinx phoneme labeling tools

harmonic_recomposition_workshop

net

Neural networks in Theano (ABANDONED/DISCONTINUED) - see dagbldr for a continuation of this code with some new tricks

pachet_experiments

research_megarepo

A monster repo for random research, not organized in any particular way

romanian_noise_tests

Samples and visualization of a small test for speech synthesis - see the paper at http://www.josesotelo.com/speechsynthesis/

g2p_pi

Simple rules based grapheme to phoneme in Python

quadcopter-brain

Repository for work on a quadcopter brain using hacked wifi SD cards

speech_density

Speech modeling using code by Kratarth Goel http://dblp.uni-trier.de/pers/hd/g/Goel:Kratarth

PyTexas2013

Code and slides for PyTexas 2013 talk, "Trends in Deep Learning"

udem_masters_thesis

My (English, after the abstract) thesis, "Structured Prediction and Generative Modeling using Neural Networks". Thanks to my professors (Aaron Courville, Roland Memisevic, Yoshua Bengio) and coauthors (Junyoung Chung, Laurent Dinh, Amjad Almahairi, Francesco Visin, Kyunghyun Cho, Kratarth Goel, and others)

deviation

Deviation for the Devo F7

analysis_of_audio_superresolution_using_neural_nets

Analyzing samples from ICLR workshop paper https://openreview.net/forum?id=S1gNakBFx&noteId=S1gNakBFx

arrayprocessing

Array processing algorithms and tools for RF array design, direction of arrival (DOA, also known as DF) estimation, and geolocation.

santa_barbaria

Standalone experiments

PyGotham2015

Slides for PyGotham2015

crikey

Research experiments

vrnn-samples

Audio samples from "A Recurrent Latent Variable Model for Sequential Data"; J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, Y. Bengio http://arxiv.org/abs/1506.02216

rmelnet

Experimental dump of R-MelNet related code and demo files

pyrnn

Python bindings for RNNLIB

speech_feature_frontend

Experimenting with standalone frontend features from https://github.com/r9y9/deepvoice3_pytorch

EuroScipy2014

Slides and small scripts for EuroScipy talk "Neural Networks for Computer Vision". Abstract can be found here: https://www.euroscipy.org/2014/schedule/presentation/2/

Sandbox

Toy implementations of things which may (or may not) grow into something else

deconstructionism

deconstructing a working model from https://github.com/Grzego/handwriting-generation

tfbldr

Tensorflow models and tools for research

todo

My todo list on various things - no code

QuDSP

A DSP Application Framework

ift6266h15

Work for IFT6266, Hiver (Winter) 2015 - Representation Learning. Class project focuses on Kaggle Cats and Dogs competition.

School

minet

Theano based neural networks... again (DISCONTINUED/ABANDONED) - see dagbldr for experiments continued in this style

linear-chain-crf

Working through linear chain CRF from Hugo Larochelle's Homework 2 - http://www.dmi.usherb.ca/~larocheh/cours/ift725_A2014/evaluations.html

pthbldr

PyTorch tools and experiments

pyklatt

Automatically exported from code.google.com/p/pyklatt

modern_wta

hmm_tts_build

a direct repository for building and using a "simple" tts

scrape_lyrics

stexp

fruitspeech_dataset

Github host for Hakon Sandsmark's Fruitspeech Dataset

Configs

Configs for different programs

dagbldr-tensorflow

A port of dagbldr that uses tensorflow instead of Theano

pyrobosim

miller_center_presidential_speech

jaxexps

kklib

coursera-audio-signal-processing

Python coursework for https://class.coursera.org/audio-001

dvaess

TensorFlow implementation of DVAE##: Learning Undirected Posteriors by Backpropagation through MCMC Updates, https://arxiv.org/abs/1901.03440

kaggle-decmeg2014

Code for Kaggle DecMeg 2014 competition http://www.kaggle.com/c/decoding-the-human-brain

tf_and_torch_speechmatch

blizzard_speech_example

ark

generated_music_samples

Generated music samples from various models

Thesis

Work directly tied to work on my master's thesis

ift6268h15

Repository for Dr. Roland Memisevic's course - Learning for Vision

ale_world