# Caption This!

This repository contains source code corresponding to our article "Caption this, with TensorFlow!"
## Git Basics

- Go to your home directory by opening your terminal and entering `cd ~`
- Clone the repository by entering `git clone https://github.com/mlberkeley/oreilly-captions.git`
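Putting those steps together, this minimal sketch also prints the absolute repo path, which the Docker commands below will need (cloning into your home directory is an assumption, not a requirement):

```bash
cd ~                                                         # go to your home directory
git clone https://github.com/mlberkeley/oreilly-captions.git # clone the repo
cd oreilly-captions && pwd                                   # print the absolute repo path for later
```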
## Docker (highly recommended)

Install Docker using the platform-specific installation instructions in Docker's documentation. Our IPython notebooks are compatible with TensorFlow 1.0.
### Option A: Use our pre-built Docker image from Docker Hub

- After installing Docker, pull our pre-built image from Docker Hub by entering:

  ```
  docker pull mlatberkeley/showandtell
  ```

  You will need a Docker Hub account in order to pull the image (you can create one at hub.docker.com). If it's your first time pulling a Docker image from Docker Hub, you will need to log in to your Docker Hub account from your terminal with `docker login` and follow the username and password prompts.
- To run the pulled image (after cloning the repository) enter:

  ```
  docker run -it -p 8888:8888 -v <path to repo>:/root mlatberkeley/showandtell
  ```

  where `<path to repo>` is the absolute path to your cloned repository. If you followed our Git Basics section, the path will be `<path to your home directory>/oreilly-captions`. (A filled-in example follows this list.)
- After starting and attaching to the Docker container, run the provided Jupyter notebooks by entering:

  ```
  jupyter notebook --ip 0.0.0.0
  ```

  and navigate to http://0.0.0.0:8888 in your browser.
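As a concrete sketch, assuming the repo was cloned into your home directory as in the Git Basics section (substitute your own path otherwise):

```bash
# Mount the cloned repo at /root inside the container and expose Jupyter's port.
# ~/oreilly-captions is an assumed location; the shell expands ~ to your home directory.
docker run -it -p 8888:8888 -v ~/oreilly-captions:/root mlatberkeley/showandtell
```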
### Option B: Download and build your own Docker image from our GitHub repo

If you want to build a GPU- or CPU-based Docker image of your own, you can use the Dockerfiles provided in the `/dockerfiles/` subdirectory of our GitHub repo.
- After cloning the repo to your machine, enter:

  ```
  docker build -t showandtell_<image_type> -f ./dockerfiles/Dockerfile.<image_type> ./dockerfiles/
  ```

  where `<image_type>` is either `gpu` or `cpu`. (Note that, in order to run these files on your GPU, you'll need a compatible GPU with drivers installed and configured properly, as described in TensorFlow's documentation.)
- Run the Docker image by entering:

  ```
  docker run -it -p 8888:8888 -v <path to repo>:/root showandtell_<image_type>
  ```

  where `<image_type>` is either `gpu` or `cpu`, depending on the image you built in the last step. (A filled-in example follows this list.)
- After building, starting, and attaching to the appropriate Docker container, run the provided Jupyter notebooks by entering:

  ```
  jupyter notebook --ip 0.0.0.0
  ```

  and navigate to http://0.0.0.0:8888 in your browser.
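As a concrete sketch for the CPU image (the image type and the repo path are assumptions; substitute `gpu` and your own clone location as needed):

```bash
# Build the CPU image from the provided Dockerfile, then run it with the cloned
# repo (assumed to be at ~/oreilly-captions) mounted at /root inside the container.
docker build -t showandtell_cpu -f ./dockerfiles/Dockerfile.cpu ./dockerfiles/
docker run -it -p 8888:8888 -v ~/oreilly-captions:/root showandtell_cpu
```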
**Note:** If you are using Docker Toolbox rather than native Docker, you will have to navigate to the daemon IP address (instead of 0.0.0.0) printed right after starting the Docker Quickstart Terminal (for us this was 192.168.99.100) in order to use Jupyter.
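If you missed that printout, `docker-machine` can report the address; this is a minimal sketch assuming Docker Toolbox's default machine name:

```bash
# Print the IP of the Docker Toolbox VM ("default" is the usual machine name).
docker-machine ip default
```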
### Debugging Docker

If you receive an error of the form:

```
WARNING: Error loading config file:/home/rp/.docker/config.json - stat /home/rp/.docker/config.json: permission denied
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.26/images/json: dial unix /var/run/docker.sock: connect: permission denied
```

it's most likely because you installed Docker using sudo permissions with a package manager such as `brew` or `apt-get`. To resolve this `permission denied` error, run Docker with `sudo`, i.e. run `docker` commands as `sudo docker <command and options>` instead of just `docker <command and options>`.
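Alternatively, on Linux you can avoid prefixing every command with `sudo` by adding your user to the `docker` group (a standard Docker fix, not specific to this repo):

```bash
# Add the current user to the docker group; log out and back in
# (or run `newgrp docker`) for the group change to take effect.
sudo usermod -aG docker $USER
```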
## The Notebooks

There are three notebooks:

- `1. O'Reilly Training.ipynb` - Contains code to train a TensorFlow caption generator from VGG16 image embeddings, as described in our article. **Note:** you must run this notebook's `train` method before running any of the other notebooks, in order to generate the mapping between integers and our vocabulary's words that is reused in the other notebooks.
- `2. O'Reilly Generate.ipynb` - Contains the same code as `1. O'Reilly Training.ipynb`, except it adds functionality to generate captions from an image embedding (as opposed to just being able to train on captions). Functions as a sanity check for the quality of the captions we are generating.
- `3. O'Reilly Generate from image.ipynb` - Builds on the previous notebook, except instead of feeding an image embedding to our caption generation model, it first feeds an image to the VGG-16 convolutional neural network to generate an image feature embedding. This gives us an end-to-end pipeline for going from an image to a caption. To run this test notebook, edit the image path in the `.ipynb` (more details in the `.ipynb` itself).
## Additional Downloads

In order to run the first two notebooks, you will need VGG-16 image embeddings for the Flickr-30K dataset. These image embeddings are available from our Google Drive.

Additionally, you will need the corresponding captions for these images (`results_20130124.token`), which can also be downloaded from our Google Drive.

In order to run the `3. O'Reilly Generate from image.ipynb` notebook, you will need to download a pretrained TensorFlow VGG-16 model, generated from the original Caffe model from the VGG-16 paper.

Place all of these downloads in the `./data/` directory.
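A minimal sketch of staging the downloads, assuming they landed in `~/Downloads` (only the captions file is named here; the embedding and VGG-16 model filenames depend on the Google Drive downloads):

```bash
# Run from the repo root: create ./data/ and move the downloads into it.
mkdir -p ./data
mv ~/Downloads/results_20130124.token ./data/
# Move the Flickr-30K VGG-16 embeddings and the pretrained VGG-16 model
# (filenames depend on your downloads) into ./data/ as well.
```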
## Pretrained Weights

We've trained the caption generator (without training VGG-16 end-to-end) for 500 epochs, and we've placed the resulting checkpoint files in `./models/tensorflow`. You should see an average reconstruction loss of roughly 1.75-1.85.