  • Stars: 350
  • Rank 120,502 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created over 7 years ago
  • Updated over 7 years ago

Repository Details

Source code and pretrained model for running pix2pix in realtime on a webcam feed.

This is the source code and pretrained model for the webcam pix2pix demo I posted recently on twitter and vimeo. It uses deep learning, or to throw in a few buzzwords: deep convolutional conditional generative adversarial network autoencoder.

video 1

video 2

Overview

The code in this particular repo actually has nothing to do with pix2pix, GANs or even deep learning. It just loads any pre-trained tensorflow model (as long as it complies with a few constraints), feeds it a processed webcam input, and displays the output of the model. It just so happens that the model I trained and used is pix2pix (details below).

The steps can be summarised as:

  1. Collect data: scrape the web for a ton of images, preprocess and prepare training data
  2. Train and export a model
  3. Preprocessing and prediction: load pretrained model, feed it live preprocessed webcam input, display the results.

1. Data

I scraped art collections from around the world from the Google Art Project on wikimedia. A lot of the images are classical portraits of rich white dudes, so I only used about 150 collections, trying to keep the data as geographically and culturally diverse as possible (full list I used is here). But the data is still very euro-centric, as there might be hundreds or thousands of scans from a single European museum, but only 8 scans from an Arab museum.

I downloaded the 300px versions of the images, and ran a batch process to:

  • Rescale them to 256x256 (without preserving aspect ratio)
  • Run a simple edge detection filter (opencv canny)

I also ran a batch process to take multiple crops from the images (instead of a non-uniform resizing) but I haven't trained on that yet. Instead of canny edge detection, I also started looking into the much better 'Holistically-Nested Edge Detection' (aka HED) by Xie and Tu (as used by the original pix2pix paper), but haven't trained on that yet either.

This is done by the preprocess.py script (sorry no command line arguments, edit the script to change paths and settings, should be quite self-explanatory).

A small sample of the training data - including predictions of the trained model - can be seen here. Right-most column is the original image, left-most column is the preprocessed version. These two images are fed into the pix2pix network as a 'pair' to be trained on. The middle column is what the model learns to produce given only the left-most column. (The images show each training iteration - i.e. the number on the left, which goes from 20,000 to 58,000, so it gradually gets better the further down you go on the page).

training_data

I also trained an unconditional GAN (i.e. a normal DCGAN) on this same training data. An example of its output can be seen below. (This is generating 'completely random' images that resemble the training data).

dcgan

2. Training

The training and architecture are straight up 'Image-to-Image Translation with Conditional Adversarial Nets' by Isola et al (aka pix2pix). I trained with the tensorflow port by @affinelayer (Christopher Hesse), which is also what's powering that 'sketch-to-cat' demo that went viral recently. He also wrote a nice tutorial on how pix2pix works. Infinite thanks to the authors (and everyone they built on) for making their code open-source!

I only made one infinitesimally tiny change to the tensorflow-pix2pix training code, and that is to add tf.identity to the generator inputs and outputs with a human-readable name, so that I can feed and fetch the tensors with ease. So if you wanted to use your own models with this application, you'd need to do the same. (Or make a note of the input/output tensor names, and modify the JSON accordingly, more on this below).

You can download my pretrained model from the Releases tab.

pix2pix_diff

3. Preprocessing and prediction

What this particular application does is load the pretrained model, do live preprocessing of a webcam input, and feed it to the model. I do the preprocessing with old fashioned basic computer vision, using opencv. It's really very minimal and basic. You can see the GUI below (the GUI uses pyqtgraph).

ruby

Different scenes require different settings.

E.g. for 'live action' I found canny to provide better (IMHO) results, and it's what I used in the first video at the top. The thresholds (canny_t1, canny_t2) depend on the scene, amount of detail, and the desired look.

If you have a lot of noise in your image you may want to add a tiny bit of pre_blur or pre_median. Or play with them for 'artistic effect'. E.g. in the first video, at around 1:05-1:40, I add a ton of median (values around 30-50).

For drawing scenes (e.g. second video) I found adaptive threshold to give more interesting results than canny (i.e. disable canny and enable adaptive threshold), though you may disagree.

For a completely static input (i.e. if you freeze the capture, disabling the camera update) the output is likely to flicker a very small amount as the model makes different predictions for the same input, though this is usually quite subtle. For a live camera feed, however, the noise in the input is likely to create lots of flickering in the output, especially because canny and adaptive threshold are highly susceptible to noise, so some temporal blurring can help.

accum_w1 and accum_w2 are for temporal blurring of the input, before going into the model: new_image = old_image * w1 + new_image * w2 (so ideally they should add up to one - or close to).

Prediction.pre_time_lerp and post_time_lerp also do temporal smoothing: new_image = old_image * xxx_lerp + new_image * (1 - xxx_lerp). pre_time_lerp is applied before going into the model, and post_time_lerp is applied after coming out of the model.

Zero for any of the temporal blurs disables them. Values for these depend on your taste. For both of the videos above I had all of the pre-model blurs (i.e. accum_w1, accum_w2 and pre_time_lerp) set to zero, and played with different post_time_lerp settings ranging from 0.0 (very flickery and flashing) to 0.9 (very slow and fadey and 'dreamy'). Usually around 0.5-0.8 is my favourite range.
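
The two temporal blur formulas above can be sketched roughly as follows. This is a minimal illustration and not the code from this repo: the function names accumulate, time_lerp and smooth_sequence are hypothetical, and treating a zero lerp as "pass the frame through" is an assumption based on the description above. Only `*` and `+` are used, so the arguments can be numpy image arrays or plain floats.

```python
def accumulate(old_image, new_image, w1, w2):
    """Weighted accumulation: old * w1 + new * w2 (cf. accum_w1 / accum_w2)."""
    return old_image * w1 + new_image * w2


def time_lerp(old_image, new_image, lerp):
    """Linear interpolation: old * lerp + new * (1 - lerp)."""
    return old_image * lerp + new_image * (1.0 - lerp)


def smooth_sequence(frames, lerp):
    """Run time_lerp over a sequence of frames and return the smoothed frames.

    Assumption: a lerp of zero means 'disabled', so frames pass through
    unchanged (the formula already reduces to the new frame in that case).
    """
    smoothed = []
    previous = None
    for frame in frames:
        if previous is None or lerp == 0:
            current = frame
        else:
            current = time_lerp(previous, frame, lerp)
        smoothed.append(current)
        previous = current
    return smoothed
```

With a lerp of 0.9, each output keeps 90% of the previous output, which is consistent with the slow, fading look described above; with 0.0 every frame is shown as-is.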

Using other models

If you'd like to use a different model, you need to set up a JSON file similar to the one below. The motivation here is that I actually have a bunch of JSONs in my app/models folder which I can dynamically scan and reload, and the model data is stored elsewhere on other disks, and the app can load and swap between models at runtime and scale inputs/outputs etc automatically.

{
	"name" : "gart_canny_256", # name of the model (for GUI)
	"ckpt_path" : "./models/gart_canny_256", # path to saved model (meta + checkpoints). Loads latest if points to a folder, otherwise loads specific checkpoint
	"input" : { # info for input tensor
		"shape" : [256, 256, 3],  # expected shape (height, width, channels) EXCLUDING batch (assumes additional axis==0 will contain batch)
		"range" : [-1.0, 1.0], # expected range of values 
		"opname" : "generator/generator_inputs" # name of tensor (':0' is appended in code)
	},
	"output" : { # info for output tensor
		"shape" : [256, 256, 3], # shape that is output (height, width, channels) EXCLUDING batch (assumes additional axis==0 will contain batch)
		"range" : [-1.0, 1.0], # value range that is output
		"opname" : "generator/generator_outputs" # name of tensor (':0' is appended in code)
	}
}
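
As a rough sketch of how such a file might be consumed, the snippet below parses a copy of the config, builds the tensor names by appending ':0', and rescales values between a pixel range and the model's declared range. Python's json module does not accept the `#` annotations shown above, so this copy omits them. The helper names tensor_name and map_range and the [0, 255] pixel range are assumptions for illustration, not the app's actual code.

```python
import json

# Comment-free copy of the example config above.
CONFIG_TEXT = """
{
    "name": "gart_canny_256",
    "ckpt_path": "./models/gart_canny_256",
    "input": {
        "shape": [256, 256, 3],
        "range": [-1.0, 1.0],
        "opname": "generator/generator_inputs"
    },
    "output": {
        "shape": [256, 256, 3],
        "range": [-1.0, 1.0],
        "opname": "generator/generator_outputs"
    }
}
"""


def tensor_name(opname):
    """Return the name of an op's first output tensor (':0' appended)."""
    return opname + ":0"


def map_range(value, src, dst):
    """Linearly map value from range src=(lo, hi) to range dst=(lo, hi).

    Works on floats or numpy arrays, since only arithmetic operators are used.
    """
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    return (value - src_lo) / (src_hi - src_lo) * (dst_hi - dst_lo) + dst_lo


config = json.loads(CONFIG_TEXT)
input_name = tensor_name(config["input"]["opname"])
output_name = tensor_name(config["output"]["opname"])

# Assumed 8-bit pixel range for webcam frames (hypothetical).
PIXEL_RANGE = (0.0, 255.0)
model_in = map_range(255.0, PIXEL_RANGE, config["input"]["range"])
pixel_out = map_range(0.0, config["output"]["range"], PIXEL_RANGE)
```

Keeping the value ranges in the config rather than in code is what lets models with different conventions (e.g. [0, 1] instead of [-1, 1]) be swapped without touching the scaling logic.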

Requirements

  • python 2.7 (likely to work with 3.x as well)
  • tensorflow 1.0+
  • opencv 3+ (probably works with 2.4+ as well)
  • pyqtgraph (only tested with 0.10)

Tested only on Ubuntu 16.04, but should work on other platforms.

I use the Anaconda python distribution which comes with almost everything you need, then it's (hopefully) as simple as:

  1. Download and install anaconda from https://www.continuum.io/downloads

  2. Install tensorflow https://www.tensorflow.org/install/ (which, if you have anaconda, is often quite straightforward since most dependencies are included)

  3. Install opencv and pyqtgraph

    conda install -c menpo opencv3
    conda install pyqtgraph

Acknowledgements

Infinite thanks once again to

  • Isola et al for pix2pix and @affinelayer (Christopher Hesse) for the tensorflow port
  • Radford et al for DCGAN and @carpedm20 (Taehoon Kim) for the tensorflow port
  • The tensorflow team
  • Countless others who have contributed to the above, either directly or indirectly, or opensourced their own research making the above possible
  • My wife for putting up with me working on a bank holiday to clean up my code and upload this repo.
