TextBoxes++: A Single-Shot Oriented Scene Text Detector
Introduction
This is an application for scene text detection (TextBoxes++) and recognition (CRNN).
TextBoxes++ is a unified framework for oriented scene text detection with a single network. It extends TextBoxes. CRNN is an open-source text recognizer. The TextBoxes++ code is based on SSD and TextBoxes; the CRNN code is modified from the original CRNN code.
For more details, please refer to our arXiv paper.
Citing the related works
Please cite the related works in your publications if they help your research:
@article{Liao2018Text,
title = {{TextBoxes++}: A Single-Shot Oriented Scene Text Detector},
author = {Minghui Liao and Baoguang Shi and Xiang Bai},
journal = {{IEEE} Transactions on Image Processing},
doi = {10.1109/TIP.2018.2825107},
url = {https://doi.org/10.1109/TIP.2018.2825107},
volume = {27},
number = {8},
pages = {3676--3690},
year = {2018}
}
@inproceedings{LiaoSBWL17,
author = {Minghui Liao and
Baoguang Shi and
Xiang Bai and
Xinggang Wang and
Wenyu Liu},
title = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
booktitle = {AAAI},
year = {2017}
}
@article{ShiBY17,
author = {Baoguang Shi and
Xiang Bai and
Cong Yao},
title = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
and Its Application to Scene Text Recognition},
journal = {{IEEE} TPAMI},
volume = {39},
number = {11},
pages = {2298--2304},
year = {2017}
}
Requirements
NOTE: There is partial support for a Docker image. See docker/README.md. (Thank you for the PR from @mdbenito)
Torch7 for CRNN;
g++-5; CUDA 8.0; cuDNN v5.1 (cuDNN 6 and cuDNN 7 may fail); OpenCV 3.0;
Please refer to Caffe Installation for the remaining dependencies.
Installation
- compile TextBoxes++ (this is a modified version of Caffe, so you do not need to install the official Caffe)
cp Makefile.config.example Makefile.config
# After copying, edit Makefile.config according to your Caffe installation.
make -j8
# Make sure $CAFFE_ROOT/python is on your PYTHONPATH.
make py
- compile CRNN (Please refer to CRNN if you have trouble with the compilation.)
cd crnn/src/
sh build_cpp.sh
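One way to confirm that the PYTHONPATH step above took effect is to ask Python where it would load the `caffe` module from, without actually importing it. The sketch below is only an illustration: the helper name `find_module_path` is hypothetical and not part of this repository, and it assumes Python 3.4 or newer (`importlib.util.find_spec` is not available on Python 2).

```python
import importlib.util


def find_module_path(name):
    """Return the file Python would load top-level module `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    path = find_module_path("caffe")
    if path is None:
        print("caffe not found: add $CAFFE_ROOT/python to PYTHONPATH")
    else:
        print("caffe will be loaded from", path)
```

If the reported path points somewhere other than this repository's python directory, an official Caffe build is probably shadowing the modified one.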
Docker
(Thanks for the PR from @idotobi)
Build Docker Image
docker build -t tbpp_crnn:gpu .
This can take more than an hour, so go get a coffee ;)
Once this is done, you can start a container via nvidia-docker.
nvidia-docker run -it --rm tbpp_crnn:gpu bash
To check whether the GPU is available inside the Docker container, you can run nvidia-smi.
It is recommended to mount the ./models and ./crnn/model/ directories so that the container can see the downloaded models.
nvidia-docker run -it \
--rm \
-v ${PWD}/models:/opt/caffe/models \
-v ${PWD}/crnn/model:/opt/caffe/crnn/model \
tbpp_crnn:gpu bash
For convenience, this command is executed when running ./run.bash.
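When a host directory passed to `-v` does not exist, Docker creates an empty one in its place, so a misspelled path shows up only later as missing models. As a rough sketch, the mount arguments can be assembled and checked in Python before launching. The helper `docker_run_args` is hypothetical, and the container paths are assumptions that mirror the command above, with the CRNN directory taken to be named `crnn` as in ./crnn/model/.

```python
import os


def docker_run_args(repo_root, image="tbpp_crnn:gpu"):
    """Build the nvidia-docker argument list with both model directories mounted."""
    mounts = [
        # (host directory, container directory); container paths are assumed
        (os.path.join(repo_root, "models"), "/opt/caffe/models"),
        (os.path.join(repo_root, "crnn", "model"), "/opt/caffe/crnn/model"),
    ]
    args = ["nvidia-docker", "run", "-it", "--rm"]
    for host_dir, container_dir in mounts:
        # Fail early instead of letting Docker create an empty directory.
        if not os.path.isdir(host_dir):
            raise FileNotFoundError("missing model directory: " + host_dir)
        args += ["-v", os.path.abspath(host_dir) + ":" + container_dir]
    return args + [image, "bash"]
```

The returned list can be passed to `subprocess.call(docker_run_args(os.getcwd()))` from the repository root.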
Models
- pre-trained model on SynthText (used for training): Dropbox; BaiduYun
- model trained on ICDAR 2015 Incidental Text (used for testing): Dropbox; BaiduYun
Please place the above models in "./models/".
If your data differs substantially from ICDAR 2015 Incidental Text, you should train on your own data, starting from the model pre-trained on SynthText.
- Please place the crnn model in "./crnn/model/"
Demo
Download the ICDAR 2015 model and place it in "./models/"
python examples/text/demo.py
The detection results and recognition results are in "./demo_images"
Train
Create lmdb data
- convert ground truth into "xml" form: example.xml
- create train/test lists (train.txt / test.txt) in "./data/text/" with the following form:
path_to_example1.jpg path_to_example1.xml
path_to_example2.jpg path_to_example2.xml
- Run "./data/text/creat_data.sh"
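The train/test lists described above can be generated with a short script rather than written by hand. The following is a minimal sketch under stated assumptions, not part of this repository: it assumes every image `name.jpg` sits next to its annotation `name.xml` in a single directory, and the helper names `pair_lines` and `write_lists`, the 10% test split, and the fixed seed are all hypothetical choices. Whether the paths should be absolute or relative to a data root depends on creat_data.sh, so check that script before using the output.

```python
import os
import random


def pair_lines(data_dir):
    """Return 'image.jpg annotation.xml' lines for every image that has an XML file."""
    lines = []
    for name in sorted(os.listdir(data_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".jpg":
            continue
        xml_path = os.path.join(data_dir, stem + ".xml")
        if os.path.isfile(xml_path):  # skip images without ground truth
            lines.append(os.path.join(data_dir, name) + " " + xml_path)
    return lines


def write_lists(data_dir, out_dir, test_fraction=0.1, seed=0):
    """Shuffle the pairs and write train.txt / test.txt into out_dir."""
    lines = pair_lines(data_dir)
    random.Random(seed).shuffle(lines)  # fixed seed keeps the split reproducible
    n_test = int(len(lines) * test_fraction)
    splits = {"test.txt": lines[:n_test], "train.txt": lines[n_test:]}
    for filename, subset in splits.items():
        with open(os.path.join(out_dir, filename), "w") as f:
            f.write("".join(line + "\n" for line in subset))
    return len(lines) - n_test, n_test
```

For example, `write_lists("path/to/data", "./data/text/")` would write both list files into the directory that the step above expects.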
Start training
1. Modify the lmdb path in modelConfig.py
2. Run "python examples/text/train.py"