Transformer-based Scene Text Recognition (Transformer-STR)
- PyTorch implementation of my new method for Scene Text Recognition (STR) based on Transformer.
I adapted the four-stage STR framework devised by deep-text-recognition-benchmark and replaced the prediction (Pred.) stage with a Transformer.
Equipped with the Transformer, this method outperforms the best model of the aforementioned deep-text-recognition-benchmark by 7.6 percentage points on CUTE80 (0.816 vs. 0.74 accuracy).
Download the pretrained weights from here. These weights were trained on the synthetic datasets for about 700K iterations.
Clone this repo, download the weights file, and move it to the checkpoints directory.
Download the lmdb datasets for training and evaluation from here (provided by deep-text-recognition-benchmark). data_lmdb_release.zip contains the following:
training datasets : MJSynth (MJ)[1] and SynthText (ST)[2]
validation datasets : the union of the training sets of IC13[3], IC15[4], IIIT[5], and SVT[6].
evaluation datasets : benchmark evaluation datasets, consisting of IIIT[5], SVT[6], IC03[7], IC13[3], IC15[4], SVTP[8], and CUTE[9].
Training
Configure data_dir in config.py, then run:
python tools/train.py
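Before starting a long training run, it can help to confirm that data_dir points at the extracted dataset. The following is a minimal sketch, not part of this repo: the helper name missing_subdirs and the sub-directory names training, validation, and evaluation are assumptions based on the three dataset groups listed above, so adjust them to match the actual layout of data_lmdb_release.zip.

```python
from pathlib import Path

# Hypothetical sub-directory names, assumed from the three dataset groups
# (training, validation, evaluation); the real archive layout may differ.
EXPECTED_SUBDIRS = ("training", "validation", "evaluation")


def missing_subdirs(data_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-directories that are absent under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_dir()]


# Example: report what is missing under a candidate data_dir.
missing = missing_subdirs("data_lmdb_release")
if missing:
    print("missing sub-directories:", ", ".join(missing))
else:
    print("data_dir looks complete")
```

An empty list means every expected sub-directory was found; anything else names what still has to be downloaded or moved.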
Evaluation on CUTE80
The Transformer-based STR achieves 0.815972 accuracy on CUTE80, outperforming the best model of deep-text-recognition-benchmark, which achieves 0.74.
If you want to reproduce the evaluation result, please run:
python evaluation.py
Make sure your cute80_dir and saved_model paths are correct. You should get the result 0.815972.
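For reference, accuracy in STR benchmarks is usually word-level: a prediction counts as correct only if the whole string matches the label. The sketch below illustrates that metric and is not the code in evaluation.py. The function name word_accuracy is hypothetical, and case-insensitive matching is an assumption.

```python
def word_accuracy(predictions, labels):
    """Fraction of images whose predicted string matches its label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    # Assumption: comparison ignores case; a single wrong character fails the word.
    correct = sum(p.lower() == g.lower() for p, g in zip(predictions, labels))
    return correct / len(labels)


# Toy data: three of four words match ("st0p" differs from "stop").
preds = ["coffee", "exit", "st0p", "Hotel"]
gold = ["coffee", "exit", "stop", "hotel"]
print(word_accuracy(preds, gold))  # 0.75

# Gap between the two reported CUTE80 accuracies, in percentage points.
print(round((0.815972 - 0.74) * 100, 1))  # 7.6
```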
Contact
Feel free to contact me ([email protected]).
License
This project is released under the Apache 2.0 license.