  • Stars: 210
  • Rank 186,491 (Top 4 %)
  • Language: Python
  • Created almost 7 years ago
  • Updated over 1 year ago

Repository Details

Code for the AI Challenger contest. (Generating Chinese image captions)

Image Captioning in Chinese (trained on AI Challenger)

This repository provides the code to reproduce my result in the AI Challenger Captioning contest (#3 on test b).

This is based on my ImageCaptioning.pytorch repository and self-critical.pytorch. (They all share a lot of the same git history)

Requirements

  • Python 2.7
  • PyTorch 0.2 (along with torchvision)
  • tensorboard-pytorch
  • jieba
  • hashlib

Pretrained models (not supported)

Train your own network on AI Challenger

Download ai_challenger dataset and preprocessing

First, download the ai_challenger images from link. We need both the training and validation data. Decompress the data into the same folder, say data/ai_challenger; the structure should look like:

├── data
│   ├── ai_challenger
│   │   ├── caption_train_annotations_20170902.json
│   │   ├── caption_train_images_20170902
│   │   │   ├── ...
│   │   ├── caption_validation_annotations_20170910.json
│   │   ├── caption_validation_images_20170910
│   │   │   ├── ...
│   ├── ...
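
Before running the preprocessing scripts, it can help to confirm that the decompressed data is where the commands expect it. The following is a minimal sketch and not part of the repository: the helper name `missing_entries` is hypothetical, and the expected names are copied from the preprocessing commands in this README.

```python
from pathlib import Path

# File and folder names as used by the preprocessing commands in this README.
EXPECTED = [
    "caption_train_annotations_20170902.json",
    "caption_train_images_20170902",
    "caption_validation_annotations_20170910.json",
    "caption_validation_images_20170910",
]


def missing_entries(root="data/ai_challenger"):
    """Return the expected files/folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


missing = missing_entries()
if missing:
    print("Missing under data/ai_challenger:", ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```

An empty list means all four entries were found; anything else names what still has to be downloaded or moved.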

Once we have the images and the annotations, we can invoke the prepro_*.py scripts, which read all of this in and create a dataset (two feature folders, an HDF5 label file and a JSON file).

$ python scripts/prepro_split_tokenize.py --input_json ./data/ai_challenger/caption_train_annotations_20170902.json ./data/ai_challenger/caption_validation_annotations_20170910.json --output_json ./data/data_chinese.json --num_val 10000 --num_test 10000
$ python scripts/prepro_labels.py --input_json data/data_chinese.json --output_json data/chinese_talk.json --output_h5 data/chinese_talk --max_length 20 --word_count_threshold 20
$ python scripts/prepro_reference_json.py --input_json ./data/ai_challenger/caption_train_annotations_20170902.json ./data/ai_challenger/caption_validation_annotations_20170910.json --output_json ./data/eval_reference.json
$ python scripts/prepro_ngrams.py --input_json data/data_chinese.json --dict_json data/chinese_talk.json --output_pkl data/chinese-train --split train

prepro_split_tokenize.py will combine the training and validation data and randomly split the dataset into train, val and test. It will also tokenize the captions using jieba.
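
The combine, shuffle, split and tokenize steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself: the function names are hypothetical, each annotation is assumed to be a dict with an `image_id` and a list of strings under `caption`, and a character-level tokenizer stands in for jieba so that the snippet has no third-party dependency.

```python
import random


def char_tokenize(caption):
    """Stand-in tokenizer (assumption): one token per non-space character."""
    return [ch for ch in caption if not ch.isspace()]


def split_and_tokenize(annotations, num_val, num_test,
                       tokenize=char_tokenize, seed=123):
    """Shuffle the merged annotations, assign splits, tokenize captions."""
    items = list(annotations)            # copy so the input is not reordered
    random.Random(seed).shuffle(items)   # fixed seed keeps the split reproducible
    out = []
    for i, ann in enumerate(items):
        if i < num_val:
            split = "val"
        elif i < num_val + num_test:
            split = "test"
        else:
            split = "train"
        out.append({
            "image_id": ann["image_id"],
            "split": split,
            "captions": [tokenize(c) for c in ann["caption"]],
        })
    return out


# Toy stand-in for the merged train + validation annotation lists.
merged = [{"image_id": "img%d.jpg" % i, "caption": ["一个人在跑步"]}
          for i in range(10)]
data = split_and_tokenize(merged, num_val=2, num_test=3)
counts = {}
for entry in data:
    counts[entry["split"]] = counts.get(entry["split"], 0) + 1
print(counts)
```

With jieba installed, word-level tokenization could be plugged in by passing `tokenize=lambda c: list(jieba.cut(c))`.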

prepro_labels.py will map all words that occur <= 20 times to a special 卍 token, and create a vocabulary for all the remaining words. The image information and vocabulary are dumped into data/chinese_talk.json and discretized caption data are dumped into data/chinese_talk_label.h5.
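
As a rough illustration of the thresholding and encoding described above, here is a minimal sketch. It is not the script's code: the function names, the `"UNK"` placeholder for the special token, the 1-based word indices and the use of 0 for padding are all assumptions made for the example.

```python
from collections import Counter

UNK = "UNK"  # placeholder name (assumption) for the special rare-word token


def build_vocab(tokenized_captions, word_count_threshold):
    """Keep words occurring more than the threshold; rare words share UNK."""
    counts = Counter(w for cap in tokenized_captions for w in cap)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    vocab.append(UNK)
    # Indices start at 1; 0 is reserved for padding (assumption).
    return {w: i + 1 for i, w in enumerate(vocab)}


def encode(caption, word_to_ix, max_length):
    """Map tokens to indices, truncate to max_length, pad with zeros."""
    ids = [word_to_ix.get(w, word_to_ix[UNK]) for w in caption[:max_length]]
    return ids + [0] * (max_length - len(ids))


captions = [["一个", "男人", "在", "跑步"], ["一个", "女人", "在", "唱歌"]]
word_to_ix = build_vocab(captions, word_count_threshold=1)
encoded = encode(captions[0], word_to_ix, max_length=6)
print(word_to_ix)
print(encoded)
```

In this toy corpus only "一个" and "在" occur more than once, so every other word is encoded with the index of the rare-word token.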

prepro_reference_json.py will prepare the json file for caption evaluation.

prepro_ngrams.py will prepare the file for self critical training.
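
The README does not say what this file contains. Self-critical training commonly uses a CIDEr-style reward, which relies on n-gram document frequencies computed over the training references, so the sketch below assumes statistics of that kind. The function names and the choice of n-grams up to length 4 are illustrative assumptions, not taken from the script.

```python
from collections import Counter


def ngrams(tokens, max_n=4):
    """Count all n-grams of length 1..max_n in one tokenized caption."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def document_frequency(refs_per_image, max_n=4):
    """For each n-gram, count the images whose references contain it."""
    df = Counter()
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, max_n))  # adds the n-gram keys
        df.update(seen)                      # each image counts at most once
    return df


# Toy references: two images, tokenized captions per image.
refs_per_image = [
    [["一个", "人", "在", "跑步"], ["一个", "人", "跑步"]],
    [["一个", "人", "在", "唱歌"]],
]
df = document_frequency(refs_per_image)
print(df[("一个", "人")], df[("跑步",)])
```

Rare n-grams get a low document frequency and therefore a higher weight in a TF-IDF style score, which is why the counts are taken over the train split only.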

(Check the prepro scripts for more options, like other resnet models or other attention sizes.)

Prepare the features

We use bottom-up features to get the best results. However, the code should also support using resnet101 features.

  • Using resnet101
$ python scripts/prepro_feats.py --input_json data/data_chinese.json --output_dir data/chinese_talk --images_root data/ai_challenger --att_size 7

This extracts the resnet101 features (both the fc feature and the last conv feature) of each image. The features are saved in data/chinese_talk_fc and data/chinese_talk_att, and the resulting files are about 100 GB.
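
The size quoted above can be sanity-checked with a little arithmetic. The sketch below rests on assumptions that are not stated in this README: roughly 210,000 training and 30,000 validation images, 2048-channel resnet101 features, and uncompressed float32 storage. The function name is hypothetical.

```python
NUM_IMAGES = 210000 + 30000  # assumed train + validation image counts
CHANNELS = 2048              # channels of the resnet101 final conv block
ATT_SIZE = 7                 # matches --att_size 7 in the command above
BYTES_PER_VALUE = 4          # assumption: float32, no compression


def feature_bytes(num_images, att_size=ATT_SIZE, channels=CHANNELS,
                  bytes_per_value=BYTES_PER_VALUE):
    """Estimate the total bytes for fc plus attention features."""
    fc_values = channels                         # one pooled vector per image
    att_values = att_size * att_size * channels  # att_size x att_size grid
    return num_images * (fc_values + att_values) * bytes_per_value


total = feature_bytes(NUM_IMAGES)
print("%.1f GB" % (total / 1e9))
```

Under these assumptions the estimate is on the order of 100 GB, in line with the figure above; compressed storage or a different `--att_size` would change it.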

  • Using bottom-up-features

The pre-extracted features can be downloaded from link.

Code for extracting the features is here

Download the evaluation code

Clone from link and link

Start training

$ mkdir xe
$ bash run_train.sh

Evaluate on test split

$ python eval.py --dump_images 0 --num_images -1 --split test  --model log_dense_box_bn/model-best.pth --language_eval 1 --beam_size 5 --temperature 1.0 --sample_max 1  --infos_path log_dense_box_bn/infos_dense_box_bn-best.pkl

To run an ensemble:

$ python eval_ensemble.py --dump_images 0 --language_eval 1 --batch_size 5 --num_images -1 --split test  --ids dense_box_bn dense_box_bn1 --beam_size 5 --temperature 1.0 --sample_max 1

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

More Repositories

1. pytorch-faster-rcnn (Jupyter Notebook, 1,818 stars): pytorch1.0 updated. Support cpu test and demo. (Use detectron2, it's a masterpiece)
2. ImageCaptioning.pytorch (Python, 1,349 stars): I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)
3. self-critical.pytorch (Python, 942 stars): Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.
4. pytorch-resnet (Python, 227 stars): Convert resnet trained in caffe to pytorch model. (group norm resnet is provided too)
5. Transformer_Captioning (Python, 156 stars): Use transformer for captioning
6. DiscCaptioning (Python, 110 stars): Code for Discriminability objective for training descriptive captions (CVPR 2018)
7. NeuralDialog-CVAE-pytorch (OpenEdge ABL, 95 stars)
8. Faster-RCNN-Densecap-torch (Jupyter Notebook, 86 stars): Faster-RCNN based on Densecap (deprecated)
9. zsl-gcn-pth (Python, 72 stars): zero-shot-gcn in pytorch
10. neuraltalk2-tensorflow (Python, 58 stars): Neuraltalk2 in tensorflow
11. Context-aware-ZSR (Python, 57 stars): Official code for paper Context-aware Zero-shot Recognition (https://arxiv.org/abs/1904.09320 to appear at AAAI2020)
12. baipiao_jianying (Python, 55 stars): Using Jianying's speech recognition for free (for learning and sharing)
13. GoogleConceptualCaptioning (Python, 53 stars)
14. pytorch-mobilenet-from-tf (Jupyter Notebook, 49 stars): Mobilenet model converted from tensorflow
15. bottom-up-attention-ai-challenger (Jupyter Notebook, 38 stars)
16. lmdbdict (Python, 21 stars): A simple wrapper for lmdb. Support dict-like operations.
17. rtutils (Python, 17 stars)
18. lazy_related_work (Python, 14 stars)
19. refexp-comprehension (Lua, 10 stars): Referring expression comprehension on ReferIt (RefClef)
20. canada_us_visa_spotter (Python, 7 stars)
21. ruotianluo.github.io (SCSS, 4 stars)
22. play_with_jax (2 stars): My attempt to learn jax.