MASTER-PyTorch
PyTorch reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This project differs from our original implementation, which was built on FastOCR, the company's private codebase. A TensorFlow reimplementation is available in the MASTER-TF repository, with almost identical performance. (PS. Logo inspired by Master Oogway in Kung Fu Panda.)
News
- 2021/07: MASTER-mmocr, a reimplementation of MASTER based on mmocr. @Jiaquan Ye
- 2021/07: TableMASTER-mmocr, the 2nd-place solution to the ICDAR 2021 Competition on Scientific Literature Parsing Task B, based on MASTER. @Jiaquan Ye
- 2021/07: A talk (in Chinese) can be found here.
- 2021/05: Savior, which aims to provide a simple, lightweight, fast, integrated, and pipelined deployment framework for RPA, now integrates MASTER for captcha recognition. @Tao Luo
- 2021/04: Slides can be found here.
Honors based on MASTER
- 1st place (2021/05) solution to ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask I: Table structure reconstruction)
- 1st place (2021/05) solution to ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask II: Table content reconstruction)
- 2nd place (2021/05) solution to ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table recognition
- 1st place (2020/10) solution to ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard (task2)
- 2nd and 5th places (2020/10) in The 5th China Innovation Challenge on Handwritten Mathematical Expression Recognition
- 4th place (2019/08) of ICDAR 2017 Robust Reading Challenge on COCO-Text (task2)
- More will be released
Introduction
MASTER is a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention that encodes feature-feature and target-target relationships inside the encoder and decoder, (2) learns an intermediate representation that is more powerful and more robust to spatial distortion, and (3) offers better training and evaluation efficiency. The overall architecture is shown below.
Requirements
- python==3.6
- torchvision==0.6.1
- pandas==1.0.5
- torch==1.5.1
- numpy==1.16.4
- tqdm==4.47.0
- Distance==0.1.3
- Pillow==7.2.0
```bash
pip install -r requirements.txt
```
Usage
Prepare Datasets
- Prepare files in the correct format as provided in the `data` folder. Please see `data/README.md` for instructions on how to prepare the data in the format required by MASTER.
- Synthetic image datasets: SynthText (Synth800k), MJSynth (Synth90k), SynthAdd (password:627x)
- Real image datasets: IIIT5K, SVT, IC03, IC13, IC15, COCO-Text, SVTP, CUTE80_Cropped
- An example of cropping SynthText can be found at `data_utils/crop_synthtext.py`.
- Modify the `train_dataset` and `val_dataset` args in the `config.json` file, including `txt_file`, `img_root`, `img_w`, and `img_h` (see the sketch after this list).
- Modify `utils/keys.txt` if needed, according to the vocabulary of your dataset.
- Modify `STRING_MAX_LEN` in `utils/label_util.py` if needed, according to the text length of your dataset.
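For orientation only, a minimal sketch of what the dataset entries in `configs/config.json` might look like; the exact nesting and the values shown here are assumptions, so treat the `config.json` shipped with the repo as the authoritative reference:

```json
{
  "train_dataset": {
    "txt_file": "path/to/train_labels.txt",
    "img_root": "path/to/train_images",
    "img_w": 160,
    "img_h": 48
  },
  "val_dataset": {
    "txt_file": "path/to/val_labels.txt",
    "img_root": "path/to/val_images",
    "img_w": 160,
    "img_h": 48
  }
}
```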
Distributed training with config files
Modify the configurations in `configs/config.json` and `dist_train.sh`, then run:

```bash
bash dist_train.sh
```

The application will be launched via `launch.py` on a 4-GPU node with one process per GPU (recommended).
This is equivalent to

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
  --master_addr=127.0.0.1 --master_port=5555 \
  train.py -c configs/config.json -d 0,1,2,3 --local_world_size 4
```

and is also equivalent to specifying the indices of the available GPUs via `CUDA_VISIBLE_DEVICES` instead of the `-d` arg:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
  --master_addr=127.0.0.1 --master_port=5555 \
  train.py -c configs/config.json --local_world_size 4
```
Similarly, it can be launched with a single process that spans all 4 GPUs (if the node has 4 available GPUs), which is not recommended:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
  --master_addr=127.0.0.1 --master_port=5555 \
  train.py -c configs/config.json --local_world_size 1
```
Using Multiple Nodes
You can enable multi-node multi-GPU training by setting the `nnodes` and `node_rank` args on the command line of every node.
For example, to run on 2 nodes with 4 GPUs each:
On node 1 (IP: 192.168.0.10), run:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=4 \
  --master_addr=192.168.0.10 --master_port=5555 \
  train.py -c configs/config.json --local_world_size 4
```

On node 2 (IP: 192.168.0.15), run:

```bash
CUDA_VISIBLE_DEVICES=2,4,6,7 python -m torch.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=4 \
  --master_addr=192.168.0.10 --master_port=5555 \
  train.py -c configs/config.json --local_world_size 4
```
Debug mode: one GPU/CPU training with config files
This training mode lets you debug the code without the distributed setup. `-dist` must be set to `false` to turn off distributed mode, and `-d` specifies which single GPU to use.

```bash
python train.py -c configs/config.json -d 1 -dist false
```
Resuming from checkpoints
You can resume from a previously saved checkpoint by:
```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
  --master_addr=127.0.0.1 --master_port=5555 \
  train.py -d 0,1,2,3 --local_world_size 4 --resume path/to/checkpoint
```
Finetune from checkpoints
You can finetune from a previously saved checkpoint by:
```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
  --master_addr=127.0.0.1 --master_port=5555 \
  train.py -d 0,1,2,3 --local_world_size 4 --resume path/to/checkpoint --finetune true
```
Testing from checkpoints
You can predict from a previously saved checkpoint by:
```bash
python test.py --checkpoint path/to/checkpoint --img_folder path/to/img_folder \
  --width 160 --height 48 \
  --output_folder path/to/output_folder \
  --gpu 0 --batch_size 64
```
Note: `width` and `height` must be the same as the settings used during training.
Evaluation
Evaluate sequence accuracy and edit distance accuracy:

```bash
python utils/calculate_metrics.py --predict-path predict_result.json --label-path label.txt
```
Note: `label.txt` is a multi-line file; every line contains `{"ImageFile":"ImageFileName.jpg", "Label":"TextLabelStr"}`.
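For illustration, a two-line `label.txt` with placeholder image names and labels might look like:

```
{"ImageFile":"word_001.jpg", "Label":"MASTER"}
{"ImageFile":"word_002.jpg", "Label":"pytorch"}
```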
Customization
Checkpoints
You can specify the name of the training session in the `config.json` file:

```json
"name": "MASTER_Default",
"run_id": "example"
```

The checkpoints will be saved in `save_dir/name/run_id_timestamp/checkpoint_epoch_n`, with the timestamp in mmdd_HHMMSS format. A copy of the `config.json` file will be saved in the same folder.
Note: checkpoints contain:

```python
{
  'arch': arch,
  'epoch': epoch,
  'model_state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.monitor_best,
  'config': self.config
}
```
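As a rough sketch of how such a checkpoint could be inspected (the path is a placeholder, and rebuilding the model itself depends on this repo's classes and config, which are not shown here):

```python
import torch

# Load the checkpoint dict saved by the trainer (keys shown above).
checkpoint = torch.load("path/to/checkpoint", map_location="cpu")

print(checkpoint["arch"], checkpoint["epoch"], checkpoint["monitor_best"])

# The weights live under 'model_state_dict'; once a model object has been
# built from the saved 'config', restore them with
# model.load_state_dict(checkpoint["model_state_dict"]).
state_dict = checkpoint["model_state_dict"]
print(f"{len(state_dict)} parameter tensors in the checkpoint")
```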
Tensorboard Visualization
This project supports TensorBoard visualization using either `torch.utils.tensorboard` or TensorboardX.
- Install: If you are using PyTorch 1.1 or higher, install TensorBoard with `pip install tensorboard>=1.14.0`. Otherwise, install TensorboardX by following the installation guide in the TensorboardX repository.
- Run training: Make sure that the `tensorboard` option in the config file is turned on: `"tensorboard": true`.
- Open the TensorBoard server: Run `tensorboard --logdir saved/log/` at the project root; the server will open at `http://localhost:6006`.
By default, loss values are logged. If you need more visualizations, use `add_scalar('tag', data)`, `add_image('tag', image)`, etc. in the `trainer._train_epoch` method (see the sketch below). The `add_something()` methods in this project are basically wrappers around those of the `tensorboardX.SummaryWriter` and `torch.utils.tensorboard.SummaryWriter` modules.
Note: You don't have to specify the current step, since the `WriterTensorboard` class defined in `logger/visualization.py` tracks the current step for you.
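As a self-contained illustration (not the project's own trainer code), the snippet below shows the underlying `torch.utils.tensorboard` calls that these wrappers mirror; inside the trainer you would call the same methods on the wrapper instead, and it supplies the step for you. The log directory, tags, and dummy tensors are assumptions for the example.

```python
import torch
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid

# Standalone demo of the SummaryWriter calls wrapped by WriterTensorboard.
writer = SummaryWriter(log_dir="saved/log/demo")

fake_loss = torch.rand(1).item()
fake_images = torch.rand(8, 3, 48, 160)  # batch of 8 dummy 48x160 RGB crops

writer.add_scalar("train/loss", fake_loss, global_step=0)
writer.add_image("train/inputs", make_grid(fake_images, nrow=4), global_step=0)
writer.close()
```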
TODO
- Memory-Cache based Inference
Citations
If you find MASTER useful, please cite our paper:
```bibtex
@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}
```
License
This project is licensed under the MIT License. See LICENSE for more details.