
Image Captioning in Keras

(Note: You can read an in-depth tutorial about the implementation in this blog post.)

This is an implementation of an image captioning model based on Vinyals et al. (Show and Tell), with a few differences:

  • For the CNN we use Inception v3 instead of Inception v1.

  • For the RNN we use a multi-layered LSTM instead of a single-layered one.

  • We don't use a special start-of-sentence word, so we feed the first word at t = 1 instead of t = 2.

  • We use different values for some hyperparameters (a rough architecture sketch using these values follows this table):

    Hyperparameter      Value
    Learning rate       0.00051
    Batch size          32
    Epochs              33
    Dropout rate        0.22
    Embedding size      300
    LSTM output size    300
    LSTM layers         3
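
A rough Keras sketch of the architecture these hyperparameters describe is below. This is not the repository's actual code: it assumes tensorflow.keras (the project itself targets an older Keras on Python 2.7), the vocabulary size and maximum caption length are placeholder values, and the optimizer choice is an assumption.

    # Illustrative sketch only, not the repository's implementation.
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.layers import (Concatenate, Dense, Dropout, Embedding,
                                         Input, LSTM, Reshape, TimeDistributed)
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam

    VOCAB_SIZE = 8000   # placeholder; the real value depends on Flickr8k preprocessing
    MAX_LEN = 20        # placeholder maximum caption length
    EMBED_SIZE = 300    # embedding size from the table above
    LSTM_SIZE = 300     # LSTM output size from the table above
    DROPOUT = 0.22      # dropout rate from the table above

    # Image encoder: Inception v3 global-average-pooled features, projected to
    # the word-embedding size so the image can be fed to the LSTM at t = 0.
    # weights=None keeps the sketch offline; in practice ImageNet weights are used.
    image_input = Input(shape=(299, 299, 3))
    cnn = InceptionV3(weights=None, include_top=False, pooling='avg')
    image_embedding = Reshape((1, EMBED_SIZE))(Dense(EMBED_SIZE)(cnn(image_input)))

    # Caption input: the partial caption as word ids, embedded to the same size.
    caption_input = Input(shape=(MAX_LEN,))
    word_embeddings = Embedding(VOCAB_SIZE, EMBED_SIZE)(caption_input)

    # Prepend the image embedding, then run a 3-layer LSTM over the sequence.
    x = Concatenate(axis=1)([image_embedding, word_embeddings])
    for _ in range(3):
        x = Dropout(DROPOUT)(x)
        x = LSTM(LSTM_SIZE, return_sequences=True)(x)

    # Predict a distribution over the vocabulary at every timestep.
    outputs = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))(x)

    model = Model(inputs=[image_input, caption_input], outputs=outputs)
    model.compile(optimizer=Adam(learning_rate=0.00051),  # learning rate from the table
                  loss='categorical_crossentropy')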

Examples of Captions Generated by the Proposed Model

[Image: result examples without errors]

Evaluation Metrics

Quantitatively, the proposed model's performance is on par with Vinyals' model on the Flickr8k dataset:

Metric    Proposed Model    Vinyals' Model
BLEU-1    61.8              63
BLEU-2    40.8              41
BLEU-3    27.8              27
BLEU-4    19.0              N/A
METEOR    21.5              N/A
CIDEr     41.5              N/A
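
For reference, corpus-level BLEU-1 through BLEU-4 can be approximated with NLTK as sketched below. This is not the project's evaluation pipeline, which goes through pycocoevalcap; the candidate and reference captions here are made-up stand-ins for the Flickr8k test set.

    # Illustrative only; the repository evaluates with pycocoevalcap, not NLTK.
    from nltk.translate.bleu_score import corpus_bleu

    # Each candidate caption is paired with the reference captions for its image
    # (Flickr8k provides several per image); these two examples are fabricated.
    references = [
        [['a', 'dog', 'runs', 'through', 'the', 'grass'],
         ['a', 'brown', 'dog', 'is', 'running', 'outside']],
        [['a', 'child', 'plays', 'on', 'a', 'swing']],
    ]
    candidates = [
        ['a', 'dog', 'is', 'running', 'through', 'the', 'grass'],
        ['a', 'child', 'on', 'a', 'swing'],
    ]

    # BLEU-n averages 1-gram through n-gram precisions with uniform weights.
    # Higher-order scores may be zero on a toy sample this small.
    for n in range(1, 5):
        score = corpus_bleu(references, candidates, weights=tuple([1.0 / n] * n))
        print('BLEU-%d: %.1f' % (n, 100 * score))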

Environment Setup

  1. Download the required dataset.

    ./scripts/download_dataset.sh
  2. Download pretrained word vectors.

    ./scripts/download_pretrained_word_vectors.sh
  3. Download pycocoevalcap data.

    ./scripts/download_pycocoevalcap_data.sh
  4. Install the dependencies.

    Note: The code was only tested on Python 2.7; it may need minor changes to work on Python 3.

    # Optional: Create and activate your virtualenv / Conda environment
    
    pip install -r requirements.txt
  5. Set up PYTHONPATH.

    source ./scripts/setup_pythonpath.sh

Using a Pretrained Model

  1. Download a pretrained model from the releases page.

  2. Copy model-weights.hdf5 to keras-image-captioning/results/flickr8k/final-model.

  3. Now you can run inference from that checkpoint by executing the command below from the keras-image-captioning directory:

    python -m keras_image_captioning.inference \
      --dataset-type test \
      --method beam_search \
      --beam-size 3 \
      --training-dir results/flickr8k/final-model
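
The --method beam_search option keeps the beam-size most likely partial captions at each decoding step instead of greedily committing to the single best next word. The toy sketch below only illustrates the idea; it is not the repository's inference code, and next_word_probs is a hypothetical function returning a word-to-probability mapping for a partial caption.

    import math

    def beam_search(next_word_probs, start_token, end_token, beam_size=3, max_len=20):
        """Toy beam search over a hypothetical next_word_probs(sequence) model."""
        beams = [(0.0, [start_token])]          # (log probability, words so far)
        for _ in range(max_len):
            candidates = []
            for log_prob, seq in beams:
                if seq[-1] == end_token:        # finished captions are kept as-is
                    candidates.append((log_prob, seq))
                    continue
                for word, prob in next_word_probs(seq).items():
                    candidates.append((log_prob + math.log(prob), seq + [word]))
            # Keep only the beam_size highest-scoring captions.
            beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        return beams[0][1]

    # Dummy usage: a fake model that slightly prefers ending the caption.
    print(beam_search(lambda seq: {'<eos>': 0.6, 'dog': 0.4}, '<sos>', '<eos>'))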

Training from Scratch

1. Run Training

To reproduce the model, execute:

python -m keras_image_captioning.training \
  --training-label repro-final-model \
  --from-training-dir results/flickr8k/final-model

There are many other arguments available; see training.py for the full list.

2. Run Inference and Evaluate It

python -m keras_image_captioning.inference \
  --dataset-type test \
  --method beam_search \
  --beam-size 3 \
  --training-dir var/flickr8k/training-results/repro-final-model

Note:

  • dataset_type can be either 'validation' or 'test'.
  • You can see the generated captions at var/flickr8k/training-results/repro-final-model/test-predictions-3-20.yaml and compare them with my result at results/flickr8k/final-model/test-predictions-3-20.yaml (a small comparison sketch follows this note).
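
If you want to compare the two prediction files programmatically, a small sketch with PyYAML is below. The exact schema of the YAML files is an assumption here, so adjust the comparison to whatever structure they actually contain.

    # Hypothetical comparison sketch; the prediction files' schema is assumed.
    import yaml

    with open('var/flickr8k/training-results/repro-final-model/'
              'test-predictions-3-20.yaml') as f:
        repro = yaml.safe_load(f)
    with open('results/flickr8k/final-model/test-predictions-3-20.yaml') as f:
        reference = yaml.safe_load(f)

    # Assuming both files parse into dicts keyed by image, print the differences.
    if isinstance(repro, dict) and isinstance(reference, dict):
        for key in sorted(set(repro) & set(reference)):
            if repro[key] != reference[key]:
                print(key)
                print('  reproduced:', repro[key])
                print('  reference: ', reference[key])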

License

MIT License. See the LICENSE file for details.