KISS

Code for the paper KISS: Keeping it Simple for Scene Text Recognition.

This repository contains the code for training a model as described in our paper. You will also find instructions on how to obtain our pretrained model and how to evaluate a model.

Pretrained Model

You can find the pretrained model here. Download the zip archive and extract it into a directory of your choice. We will refer to this directory as <model_dir>.

Prepare for using the Code

Make sure you have at least Python 3.7 installed on your system.
Create a new virtual environment (or use whatever environment manager you prefer).
Install all requirements with pip install -r requirements.txt (if your machine has no CUDA-capable device, remove the package cupy from requirements.txt first).

Datasets

If you want to train your model on the same datasets as we did, you'll first need to get the training images. You can then get the training annotations we used from here.

Image Data

You can find the image data for each dataset, using the following links:

MJSynth: https://www.robots.ox.ac.uk/~vgg/data/text/
SynthText: https://www.robots.ox.ac.uk/~vgg/data/scenetext/
SynthAdd: Follow instructions from here

Once you've downloaded all the images, you can get the gt-files we've prepared for the MJSynth and SynthAdd datasets here.

For the SynthText dataset, you'll have to create them yourself. You can do so by following these steps:

Get the data and put it into a directory (let's assume we put the data into the directory /data/oxford).
Run the script crop_words_from_oxford.py (you can find it in datasets/text_recognition) with the following command line parameters: python crop_words_from_oxford.py /data/oxford/gt.mat /data/oxford_words.
This will crop all words based on their axis-aligned bounding box from the original oxford gt.
Create the train and validation split with the script create_train_val_splits.py: python create_train_val_splits.py /data/oxford_words/gt.json.
Run the script json_to_npz.py with the following command line: python json_to_npz.py /data/oxford_words/train.json ../../train_utils/char-map-bos.json. This will create a file called train.npz in the same directory as the file gt.json is currently located in.
Repeat the last step with the file validation.json.
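
The steps above can be collected into a small helper that only assembles and prints the commands, so you can review them before running anything. This is a minimal sketch: the function name synthtext_prep_commands is hypothetical and not part of the repository, and the default paths are the example paths used above. The relative char-map path suggests that the commands are meant to be run from datasets/text_recognition, but that is an inference.

```python
from pathlib import PurePosixPath


def synthtext_prep_commands(data_dir="/data/oxford", words_dir="/data/oxford_words",
                            char_map="../../train_utils/char-map-bos.json"):
    """Return the preparation commands from the steps above, in order.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    data_dir = PurePosixPath(data_dir)
    words_dir = PurePosixPath(words_dir)
    commands = [
        # Crop all words from the original ground truth file.
        ["python", "crop_words_from_oxford.py", str(data_dir / "gt.mat"), str(words_dir)],
        # Split the cropped words into a train and a validation set.
        ["python", "create_train_val_splits.py", str(words_dir / "gt.json")],
    ]
    # Convert both splits to npz files.
    for split in ("train", "validation"):
        commands.append(
            ["python", "json_to_npz.py", str(words_dir / (split + ".json")), char_map]
        )
    return commands


for command in synthtext_prep_commands():
    print(" ".join(command))
```

Each entry is an argument list, so once the printed commands look right you could pass the entries to subprocess.run one by one.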

Once you are done with this, you'll need to combine all npz files into one large npz file. You can use the script combine_npz_datasets.py for this. Assuming you saved the MJSynth dataset and its npz file in /data/mjsynth and the SynthAdd dataset and its npz file in /data/SynthAdd, you'll need to run the script in the following way: python combine_npz_datasets.py /data/mjsynth/annotation_train.npz /data/oxford_words/train.npz /data/SynthAdd/gt.npz --destination /data/datasets_combined.npz.

The datasets may contain words that are longer than N characters (we always set N to 23), and all of these words need to be removed. You can use the script filter_word_length.py for this. Use it like so: python filter_word_length.py 23 /data/datasets_combined.npz --npz. Do the same thing with the file validation.npz you obtained from splitting the SynthText dataset.
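
To illustrate the filtering rule itself, here is a minimal sketch on plain Python data. It does not read npz files and is not the code of filter_word_length.py. The function name and the representation of a sample as an (image path, word) pair are assumptions made for this example.

```python
def filter_by_word_length(samples, max_length=23):
    """Keep only samples whose word has at most max_length characters.

    samples is assumed to be an iterable of (image_path, word) pairs.
    """
    return [(path, word) for path, word in samples if len(word) <= max_length]


samples = [
    ("img/0.png", "simple"),
    ("img/1.png", "x" * 23),  # exactly N characters, kept
    ("img/2.png", "x" * 24),  # longer than N characters, dropped
]
print(len(filter_by_word_length(samples)))
```

A word with exactly N characters is kept, since only words longer than N are removed.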

If you want to follow our experiments with the balanced dataset, you can create a balanced dataset with the script balance_dataset.py. For example: python balance_dataset.py /data/datasets_combined_filtered_23.npz datasets_combined_balanced_23.npz -m 200000. If you do not use the -m switch, the script will show you dataset statistics and you can choose your own value.
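
The following sketch shows one way such balancing could work. It assumes that the statistics are counts per word length and that the -m value caps the number of samples per word length. This is our reading of the switch and may differ from what balance_dataset.py actually does. Both function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def word_length_histogram(words):
    """Count how many words exist for each word length."""
    return Counter(len(word) for word in words)


def cap_per_word_length(words, max_per_length, seed=0):
    """Keep at most max_per_length randomly chosen words per word length.

    Assumption: this interpretation of "balancing" is illustrative only.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_length = defaultdict(list)
    for word in words:
        by_length[len(word)].append(word)

    balanced = []
    for length in sorted(by_length):
        group = by_length[length]
        if len(group) > max_per_length:
            group = rng.sample(group, max_per_length)
        balanced.extend(group)
    return balanced


words = ["a"] * 5 + ["bb"] * 2 + ["ccc"]
print(dict(word_length_histogram(words)))
print(dict(word_length_histogram(cap_per_word_length(words, 2))))
```

Looking at the histogram first mirrors the workflow described above: inspect the statistics, then pick a cap.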

Evaluation Data

In this section we explain how you can get the evaluation data and annotations. To get the evaluation data, you only need to do two steps per dataset:

Clone the repository.
Download the npz annotation file and place it in the directory into which you cloned the git repository.

Dataset	Git Repo	NPZ-Link	Note
ICDAR2013	https://github.com/ocr-algorithm-and-data/ICDAR2013	download	Rename the directory `test` to `Challenge2_Test_Task3_Images`
ICDAR2015	https://github.com/ocr-algorithm-and-data/ICDAR2015	download	Rename the dir `TestSet` to `ch4_test_word_images_gt`
CUTE80	https://github.com/ocr-algorithm-and-data/CUTE80	download	-
IIIT5K	https://github.com/ocr-algorithm-and-data/IIIT5K	download	-
SVT	https://github.com/ocr-algorithm-and-data/SVT	download	Remove all subdirs, but the dir `test_crop`. Rename this dir to `img`
SVTP	https://github.com/ocr-algorithm-and-data/SVT-Perspective	download	-
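
The renames from the Note column can be scripted. The sketch below assumes that all repositories were cloned next to each other under one root directory, that each clone keeps the default directory name (ICDAR2013, ICDAR2015, SVT), and that the directories named in the notes sit at the top level of each clone. The function name apply_table_notes is hypothetical. Note that the SVT step deletes directories, so try it on a copy first.

```python
import shutil
from pathlib import Path


def apply_table_notes(root):
    """Apply the directory changes listed in the Note column of the table."""
    root = Path(root)
    renames = [
        (root / "ICDAR2013" / "test", root / "ICDAR2013" / "Challenge2_Test_Task3_Images"),
        (root / "ICDAR2015" / "TestSet", root / "ICDAR2015" / "ch4_test_word_images_gt"),
    ]
    for source, target in renames:
        if source.is_dir() and not target.exists():
            source.rename(target)

    svt = root / "SVT"
    test_crop = svt / "test_crop"
    if test_crop.is_dir() and not (svt / "img").exists():
        # Remove every subdirectory except test_crop. Keeping .git is a
        # choice made in this sketch; the table note does not mention it.
        for entry in list(svt.iterdir()):
            if entry.is_dir() and entry.name not in ("test_crop", ".git"):
                shutil.rmtree(entry)
        test_crop.rename(svt / "img")
```

The existence checks make the function safe to call a second time, because renames that were already applied are skipped.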

Training

Now you should be ready for training 🎉. You can use the script train_text_recognition.py, which is in the root directory of this repo.

Before you can start your training, you'll need to adapt the config in config.cfg. Set the values following this list:

train_file: Set this to the file /data/datasets_combined_filtered_23.npz
val_file: Set this to /data/oxford_words/validation.npz
keys in TEST_DATASETS: Set each of these to the corresponding npz file you downloaded and set up in the Evaluation Data section.
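
Before starting a long training run, it can help to check that every npz path in the config points to an existing file. The sketch below assumes that config.cfg is an INI-style file that Python's configparser can read, which the text above does not confirm. It makes no assumption about section names and simply scans every value that ends in .npz. The function name is hypothetical.

```python
import configparser
from pathlib import Path


def missing_npz_files(config_text):
    """Return (section, key, value) for every .npz value that is not an existing file."""
    # Interpolation is disabled so that "%" characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    missing = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if value.endswith(".npz") and not Path(value).is_file():
                missing.append((section, key, value))
    return missing
```

You could call it as missing_npz_files(Path("config.cfg").read_text()). Relative paths are resolved against the current working directory, and configparser lowercases key names by default.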

You can now run the training with, e.g., python train_text_recognition.py <name for the log> -g 0 -l tests --image-mode RGB --rdr 0.95. This will start the training and create a new directory with log entries in logs/tests. Get some coffee and sleep, because the training will take some time!

You can inspect the training progress with TensorBoard. Just start TensorBoard in the root directory like so: tensorboard --logdir logs.

Evaluation

Once you've trained a model, or if you just downloaded the model we provided, you can run the evaluation script on it.

If you want to know how the model performs on all datasets, you can use the script run_eval_on_all_datasets.py. Let's assume you trained a model and logs/tests/train is the path to the log dir. You can then run the evaluation with this command: python run_eval_on_all_datasets.py config.cfg 0 -b 16 --snapshot-dir logs/tests/train. You can also render the predictions of the model for each evaluation image by making the following changes to the command: python run_eval_on_all_datasets.py config.cfg 0 -b 1 --snapshot-dir logs/tests/train --render. You will then find the results for each image in the directory logs/tests/train/eval_bboxes.

Questions?

Feel free to open an issue! You want to contribute? Just open a PR 😄!

License

This code is licensed under GPLv3, see the file LICENSE for more information.

Citation

If you find this code useful, please cite our paper:

@misc{bartz2019kiss,
    title={KISS: Keeping It Simple for Scene Text Recognition},
    author={Christian Bartz and Joseph Bethge and Haojin Yang and Christoph Meinel},
    year={2019},
    eprint={1911.08400},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
