
KISS

Code for the paper KISS: Keeping it Simple for Scene Text Recognition.

This repository contains the code you can use to train a model based on our paper. You will also find instructions on how to get our pretrained model and how to evaluate it.

Pretrained Model

You can find the pretrained model here. Download the zip and put it into any directory. We will refer to this directory as <model_dir>.
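
For example (both the zip filename and the use of unzip below are placeholders; use the file you actually downloaded and whatever extraction tool you prefer):

    # extract the pretrained model into the directory we will call <model_dir>
    mkdir -p <model_dir>
    unzip kiss-model.zip -d <model_dir>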

Prepare for using the Code

  • make sure you have at least Python 3.7 installed on your system
  • create a new virtual environment (or whatever you like to use)
  • install all requirements with pip install -r requirements.txt (if you do not have a CUDA-capable device in your PC, remove the package cupy from requirements.txt); a minimal setup is sketched after this list
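
A minimal setup, assuming a Unix shell with Python 3.7+ available as python3:

    # create and activate a fresh virtual environment
    python3 -m venv venv
    source venv/bin/activate
    # install the dependencies (drop cupy from requirements.txt first if you
    # have no CUDA-capable device)
    pip install -r requirements.txt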

Datasets

If you want to train your model on the same datasets as we did, you'll need to get the training data first. Second, you can get the training annotations we used from here.

Image Data

You can find the image data for each dataset using the following links:

Once you've downloaded all the images, you can get the gt-files we've prepared for the MJSynth and SynthAdd datasets here.

For the SynthText dataset, you'll have to create them yourself. You can do so by following these steps:

  1. Get the data and put it into a directory (let's assume we put the data into the directory /data/oxford).
  2. Run the script crop_words_from_oxford.py (you can find it in datasets/text_recognition) with the following command line parameters: python crop_words_from_oxford.py /data/oxford/gt.mat /data/oxford_words.
  3. This will crop all words, based on their axis-aligned bounding boxes, from the original Oxford gt.
  4. Create train and validation splits with the script create_train_val_splits.py: python create_train_val_splits.py /data/oxford_words/gt.json.
  5. Run the script json_to_npz.py with the following command line: python json_to_npz.py /data/oxford_words/train.json ../../train_utils/char-map-bos.json. This will create a file called train.npz in the same directory as the file gt.json.
  6. Repeat the last step with the file validation.json. The full pipeline is sketched after this list.
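
Putting the steps together (run from datasets/text_recognition, using the example paths from above):

    # crop all words from the SynthText ground truth
    python crop_words_from_oxford.py /data/oxford/gt.mat /data/oxford_words
    # split the cropped words into train and validation sets
    python create_train_val_splits.py /data/oxford_words/gt.json
    # convert both splits to npz files
    python json_to_npz.py /data/oxford_words/train.json ../../train_utils/char-map-bos.json
    python json_to_npz.py /data/oxford_words/validation.json ../../train_utils/char-map-bos.json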

Once you are done with this, you'll need to combine all npz files into one large npz file. You can use the script combine_npz_datasets.py for this. Assume you saved the MJSynth dataset and its npz file in /data/mjsynth, and the SynthAdd dataset and its npz file in /data/SynthAdd; then you'll need to run the script in the following way: python combine_npz_datasets.py /data/mjsynth/annotation_train.npz /data/oxford_words/train.npz /data/SynthAdd/gt.npz --destination /data/datasets_combined.npz.

Since the datasets may contain words that are longer than N characters (we always set N to 23), we need to discard all words that are longer than N characters. You can use the script filter_word_length.py for this. Use it like so: python filter_word_length.py 23 /data/datasets_combined.npz --npz. Do the same thing with the file validation.npz you obtained from splitting the SynthText dataset; both runs are sketched below.
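
Both filter runs side by side (the validation path is an assumption based on where the split was created above):

    # filter the combined training data; judging by the file name used in the
    # next step, the output is /data/datasets_combined_filtered_23.npz
    python filter_word_length.py 23 /data/datasets_combined.npz --npz
    # filter the SynthText validation split the same way
    python filter_word_length.py 23 /data/oxford_words/validation.npz --npz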

If you want to follow our experiments with the balanced dataset, you can create one with the script balance_dataset.py. For example: python balance_dataset.py /data/datasets_combined_filtered_23.npz datasets_combined_balanced_23.npz -m 200000. If you do not use the -m switch, the script will show you dataset statistics and let you choose your own value.

Evaluation Data

In this section we explain how you can get the evaluation data and annotations. Getting the evaluation data takes just two steps per dataset:

  1. Clone the repository.
  2. Download the npz annotation file and place it in the directory where you cloned the git repository.
  Dataset   | Git Repo                                                  | NPZ-Link | Note
  ICDAR2013 | https://github.com/ocr-algorithm-and-data/ICDAR2013       | download | Rename the directory test to Challenge2_Test_Task3_Images
  ICDAR2015 | https://github.com/ocr-algorithm-and-data/ICDAR2015       | download | Rename the directory TestSet to ch4_test_word_images_gt
  CUTE80    | https://github.com/ocr-algorithm-and-data/CUTE80          | download | -
  IIIT5K    | https://github.com/ocr-algorithm-and-data/IIIT5K          | download | -
  SVT       | https://github.com/ocr-algorithm-and-data/SVT             | download | Remove all subdirectories except test_crop, then rename it to img
  SVTP      | https://github.com/ocr-algorithm-and-data/SVT-Perspective | download | -
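
For example, for ICDAR2013 (the npz filename below is a placeholder; use the name the download link actually gives you):

    # clone the dataset repository and apply the rename from the table
    git clone https://github.com/ocr-algorithm-and-data/ICDAR2013
    mv ICDAR2013/test ICDAR2013/Challenge2_Test_Task3_Images
    # place the downloaded npz annotation file inside the cloned repository
    mv icdar2013.npz ICDAR2013/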

Training

Now you should be ready for training 🎉. You can use the script train_text_recognition.py, which is in the root directory of this repo.

Before you can start your training, you'll need to adapt the config in config.cfg. Set the following values:

  • train_file: set this to the file /data/datasets_combined_filtered_23.npz
  • val_file: set this to /data/oxford_words/validation.npz
  • keys in TEST_DATASETS: set these to the corresponding npz files you downloaded and set up in the last step (a sketch follows after this list).
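
The exact layout of config.cfg may differ, but as a rough sketch of the values above (the TEST_DATASETS entries are assumptions based on where you placed the evaluation npz files):

    train_file = /data/datasets_combined_filtered_23.npz
    val_file = /data/oxford_words/validation.npz
    # keys in TEST_DATASETS, one per evaluation dataset, for example:
    # icdar2013 = /data/ICDAR2013/<npz file>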

You can now run the training with, e.g., python train_text_recognition.py <name for the log> -g 0 -l tests --image-mode RGB --rdr 0.95. This will start the training and create a new directory with log entries in logs/tests. Get some coffee and sleep, because the training will take some time!

You can inspect the training progress with TensorBoard. Just start TensorBoard in the root directory like so: tensorboard --logdir logs.

Evaluation

Once you've trained a model, or if you just downloaded the model we provide, you can run the evaluation script on it.

If you want to know how the model performs on all datasets, you can use the script run_eval_on_all_datasets.py. Let's assume you trained a model and logs/tests/train is the path to the log dir. Now you can run the evaluation with this command: python run_eval_on_all_datasets.py config.cfg 0 -b 16 --snapshot-dir logs/tests/train. You can also render the predictions of the model for each evaluation image by changing the command as follows: python run_eval_on_all_datasets.py config.cfg 0 -b 1 --snapshot-dir logs/tests/train --render. You will then find the results for each image in the directory logs/tests/train/eval_bboxes.

Questions?

Feel free to open an issue! Want to contribute? Just open a PR 😄!

License

This code is licensed under GPLv3, see the file LICENSE for more information.

Citation

If you find this code useful, please cite our paper:

@misc{bartz2019kiss,
    title={KISS: Keeping It Simple for Scene Text Recognition},
    author={Christian Bartz and Joseph Bethge and Haojin Yang and Christoph Meinel},
    year={2019},
    eprint={1911.08400},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
