End-to-End Scene Text Detection and Recognition System Resources

Author: Canjie Luo, Chongyu Liu


1. Datasets

1.1 Introduction

  • SVT [16]:

    • Introduction: There are 100 training images and 250 testing images downloaded from Google Street View of road-side scenes. The labelled text can be very challenging, with a wide variety of fonts, orientations, and lighting conditions. A lexicon containing 50 words (SVT-50) is also provided for each image.
    • Link: SVT-download
  • ICDAR 2003 (IC03) [17]:

    • Introduction: The dataset contains a varied array of photos of the world that contain scene text. There are 251 testing images with 50-word lexicons (IC03-50) and a lexicon of all test ground-truth words (IC03-Full).
    • Link: IC03-download
  • ICDAR 2011 (IC11) [18]:

    • Introduction: The dataset is an extension of the dataset used for the text locating competitions of ICDAR 2003. It includes 485 natural images in total.
    • Link: IC11-download
  • ICDAR 2013 (IC13) [19]:

    • Introduction: The dataset consists of 229 training images and 233 testing images. Most of the text is horizontal. Three specific lexicons are provided, named “Strong (S)”, “Weak (W)” and “Generic (G)”. The “Strong (S)” lexicon provides 100 words per image, including all words that appear in the image. The “Weak (W)” lexicon includes all words that appear in the entire test set. The “Generic (G)” lexicon is a 90k-word vocabulary.
    • Link: IC13-download
  • ICDAR 2015 (IC15) [20]:

    • Introduction: The dataset includes 1000 training images and 500 testing images captured by Google Glass. The text in the scene appears in arbitrary orientations. Similar to ICDAR 2013, it also provides “Strong (S)”, “Weak (W)” and “Generic (G)” lexicons.
    • Link: IC15-download
  • Total-Text [21]:

    • Introduction: In addition to horizontal and oriented text, Total-Text contains a large amount of curved text. It has 1255 training images and 300 test images. All images are annotated with polygons and transcriptions at the word level. A “Full” lexicon containing all words in the test set is provided.
    • Link: Total-Text-download
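
The lexicons described above are typically used to constrain a recognizer's raw output to a known word list. The following is a minimal sketch of one common way to do that (nearest lexicon word by edit distance), not the evaluation protocol of any particular benchmark. The function names and the sample word list are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def correct_with_lexicon(prediction: str, lexicon: Sequence[str]) -> str:
    """Return the lexicon word closest to the raw prediction (case-insensitive)."""
    if not lexicon:
        return prediction  # lexicon-free ("None") setting: keep the raw output
    return min(lexicon, key=lambda w: edit_distance(prediction.lower(), w.lower()))


# Hypothetical per-image word list, in the spirit of a "Strong" lexicon.
strong_lexicon = ["STARBUCKS", "COFFEE", "OPEN"]
print(correct_with_lexicon("C0FFEE", strong_lexicon))  # prints "COFFEE"
```

A smaller lexicon (for example, 50 or 100 words per image) makes this correction step much easier than a 90k-word vocabulary, which is why results are reported separately per lexicon.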

1.2 Comparison of Datasets

| Dataset | Language | Images (Total) | Images (Train) | Images (Test) | Text instances (Total) | Text instances (Train) | Text instances (Test) | Shape: Horizontal | Shape: Arbitrary-Quadrilateral | Shape: Multi-oriented | Annotation: Char | Annotation: Word | Annotation: Text-Line | Lexicon: 50 | Lexicon: 1k | Lexicon: Full | Lexicon: None |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IC03 | English | 509 | 258 | 251 | 2266 | 1110 | 1156 | ✓ | ✕ | ✕ | ✕ | ✓ | ✕ | ✓ | ✓ | ✓ | ✕ |
| IC11 | English | 484 | 229 | 255 | 1564 | ~ | ~ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ | ✕ | ✕ | ✕ | ✓ |
| IC13 | English | 462 | 229 | 233 | 1944 | 849 | 1095 | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ | ✕ | ✕ | ✕ | ✓ |
| SVT | English | 350 | 100 | 250 | 725 | 211 | 514 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ | ✓ | ✕ | ✕ | ✕ |
| SVT-P | English | 238 | ~ | ~ | 639 | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ | ✓ | ✕ | ✓ | ✕ |
| IC15 | English | 1500 | 1000 | 500 | 17548 | 12318 | 5230 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ | ✕ | ✕ | ✕ | ✓ |
| Total-Text | English | 1555 | 1255 | 300 | 9330 | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |

2. Summary of End-to-end Scene Text Detection and Recognition Methods

2.1 Comparison of methods

| Method | Model | Code | Detection | Recognition | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|
| Wang et al. [1] |  | ✕ | Sliding windows and Random Ferns | Pictorial Structures | ICCV | 2011 | Word Re-scoring for NMS |
| Wang et al. [2] |  | ✕ | CNN-based | Sliding windows for classification | ICPR | 2012 | CNN architecture |
| Jaderberg et al. [3] |  | ✕ | CNN-based and saliency maps | CNN classifier | ECCV | 2014 | Data mining and annotation |
| Alsharif et al. [4] |  | ✕ | CNN and hybrid HMM maxout models | Segmentation-based | ICLR | 2014 | Hybrid HMM maxout models |
| Yao et al. [5] |  | ✕ | Random Forest | Component Linking and Word Partition | TIP | 2014 | (1) Detection and recognition features sharing. (2) Oriented text. (3) A new dictionary search method |
| Neumann et al. [6] |  | ✕ | Extremal Regions | Clustering algorithm to group characters | TPAMI | 2015 | Real-time performance (1.6s/image) |
| Jaderberg et al. [7] |  | ✕ | Region proposal mechanism | Word-level classification | IJCV | 2016 | Trained only on data produced by a synthetic text generation engine, requiring no human-labelled data |
| Liao et al. [8] | TextBoxes | ✓ | SSD-based framework | CRNN | AAAI | 2017 | An end-to-end trainable fast scene text detector |
| Bušta et al. [9] | Deep TextSpotter | ✕ | YOLOv2 | CTC | ICCV | 2017 | YOLOv2 + RPN, RNN + CTC. It is the first end-to-end trainable detection and recognition system with high speed. |
| Li et al. [10] |  | ✕ | Text Proposal Network | Attention | ICCV | 2017 | TPN + RNN encoder + attention-based RNN |
| Sun et al. [22] | TextNet | ✕ | Scale-aware attention backbone and Perspective RoI Transform | Attention | ACCV | 2018 | Perspective RoI Transform for irregular text recognition |
| Lyu et al. [11] | Mask TextSpotter | ✓ | Fast R-CNN with mask branch | Character segmentation | ECCV | 2018 | Precise text detection and recognition are acquired via semantic segmentation |
| He et al. [12] |  | ✓ | Text-Alignment Layer | Attention | CVPR | 2018 | Character attention mechanism: uses character spatial information as explicit supervision |
| Liu et al. [13] | FOTS | ✓ | EAST with RoIRotate | CTC | CVPR | 2018 | Little computation overhead compared to the baseline text detection network (22.6fps) |
| Liao et al. [14] | TextBoxes++ | ✓ | SSD-based framework | CRNN | TIP | 2018 | Journal version of TextBoxes (multi-oriented scene text support) |
| Liao et al. [15] | Mask TextSpotter | ✓ | Mask R-CNN | Character segmentation + Spatial Attention Module | TPAMI | 2019 | Journal version of Mask TextSpotter (proposes Spatial Attention Module) |
| Xing et al. [23] | CharNet | ✓ | A character branch and a detection branch | Character level | ICCV | 2019 | Uses a character as the basic element to overcome the main difficulty of joint optimization of text detection and RNN-based recognition |
| Feng et al. [24] | TextDragon | ✕ | Local box regression, center line segmentation and RoI Sliding | CTC | ICCV | 2019 | A new differentiable operator named RoISlide connects arbitrary-shaped text detection and recognition |
| Qin et al. [25] |  | ✕ | Mask R-CNN with RoI masking | Attention | ICCV | 2019 | A simple yet effective RoI masking step to extract useful irregularly shaped text instance features |
| Qiao et al. [26] | Text Perceptron | ✕ | Mask R-CNN with Order-aware Semantic Segmentation and Boundary Regressions | Attention | AAAI | 2020 | A novel Shape Transform Module to transform the feature regions into regular morphologies |
| Wang et al. [27] |  | ✕ | Oriented Rectangular Box Detector and Boundary Point Detector | Attention | AAAI | 2020 | A set of points on the boundary of each text instance represents arbitrary shapes |
| Liu et al. [28] | ABCNet | ✓ | Bezier Curve Detection and BezierAlign | CTC | CVPR | 2020 | 10 times faster than recent state-of-the-art methods with competitive scene text spotting accuracy |
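
Several of the methods listed above use CTC for recognition. As a rough illustration of how a CTC output is turned into a string, here is a minimal sketch of greedy (best-path) decoding. It is not taken from any of the listed implementations. It assumes the blank label is index 0 and that label `k` maps to `alphabet[k - 1]`; the frame sequence is made up.

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(frame_labels: Sequence[int], alphabet: str, blank: int = 0) -> str:
    """Collapse consecutive repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Assumption: label k (k >= 1) corresponds to alphabet[k - 1]; 0 is the blank.
    return "".join(alphabet[label - 1] for label in collapsed if label != blank)


alphabet = "abcdefghijklmnopqrstuvwxyz"
# Hypothetical per-frame argmax labels produced by a recognizer.
frames = [0, 20, 20, 0, 5, 0, 24, 24, 20, 0]
print(ctc_greedy_decode(frames, alphabet))  # prints "text"
```

A blank between two identical labels is what allows doubled letters to survive the collapse step, for example the two "l" characters in "hello".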

2.2 End-to-end scene text detection and recognition results

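| Method | Model | Source | Time | SVT | SVT-50 | IC03 (50) | IC03 (Full) | IC03 (None) | IC11 | IC13 End-to-end (S) | IC13 End-to-end (W) | IC13 End-to-end (G) | IC13 Spotting (S) | IC13 Spotting (W) | IC13 Spotting (G) | IC15 End-to-end (S) | IC15 End-to-end (W) | IC15 End-to-end (G) | IC15 Spotting (S) | IC15 Spotting (W) | IC15 Spotting (G) | Total-text (None) | Total-text (Full) | Total-text (None) | Total-text (Full) |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wang et al. [1] |  | ICCV | 2011 | ~ | ~ | 51 |  | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [2] |  | ICPR | 2012 | 46 | ~ | 72 | 67 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Jaderberg et al. [3] |  | ECCV | 2014 | ~ | 56 | 80 | 75 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Alsharif et al. [4] |  | ICLR | 2014 | ~ | 48 | 77 | 70 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Yao et al. [5] |  | TIP | 2014 | ~ | ~ | ~ | ~ |  | 48.6 |  | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Neumann et al. [6] |  | TPAMI | 2015 |  | 68.1 | ~ | ~ | ~ | ~ | 45.2 | ~ | ~ | ~ | ~ | ~ | 35 | 19.9 | 15.6 | 35 | 19.9 | 15.6 | ~ | ~ | ~ | ~ | ~ |
| Jaderberg et al. [7] |  | IJCV | 2016 | 53 | 76 | 90 | 86 | 78 | 76 | 76 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [8] | TextBoxes | AAAI | 2017 | 64 | 84 | ~ | ~ | ~ | 87 | 91 | 89 | 84 | 94 | 92 | 87 | ~ | ~ | ~ | ~ | ~ | ~ | 36.3 | 48.9 | ~ | ~ | ~ |
| Bušta et al. [9] | Deep TextSpotter | ICCV | 2017 | ~ | ~ | ~ | ~ | ~ | ~ | 89 | 86 | 77 | 92 | 89 | 81 | 54 | 51 | 47 | 58 | 53 | 51 | ~ | ~ | ~ | ~ | 21.85 |
| Li et al. [10] |  | ICCV | 2017 | 66.18 | 84.91 | ~ | ~ | ~ | 87.7 | ~ | ~ | ~ | ~ | ~ | ~ | 91.08 | 89.8 | 84.6 | 94.2 | 92.4 | 88.2 | ~ | ~ | ~ | ~ | ~ |
| Sun et al. [22] | TextNet | ACCV | 2018 | ~ | ~ | ~ | ~ | ~ | ~ | 89.77 | 88.80 | 82.96 | 94.59 | 93.48 | 86.99 | 78.66 | 74.9 | 60.45 | 82.38 | 78.43 | 62.36 | 54.02 | ~ | ~ | ~ | ~ |
| Lyu et al. [11] | Mask TextSpotter | ECCV | 2018 | ~ | ~ | ~ | ~ | ~ | ~ | 92.2 | 91.1 | 86.5 | 92.5 | 92 | 88.2 | 79.3 | 73 | 62.4 | 79.3 | 74.5 | 64.2 | 52.9 | 71.8 | ~ | ~ | ~ |
| He et al. [12] |  | CVPR | 2018 | ~ | ~ | ~ | ~ | ~ | ~ | 91 | 89 | 86 | 93 | 92 | 87 | 82 | 77 | 63 | 85 | 80 | 65 | ~ | ~ | ~ | ~ | ~ |
| Liu et al. [13] | FOTS | CVPR | 2018 | ~ | ~ | ~ | ~ | ~ | ~ | 91.99 | 90.11 | 84.77 | 95.94 | 93.9 | 87.76 | 83.55 | 79.11 | 65.33 | 87.01 | 82.39 | 67.97 | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [14] | TextBoxes++ | TIP | 2018 | 64 | 84 | ~ | ~ | ~ | ~ | 93 | 92 | 85 | 96 | 95 | 87 | 73.3 | 65.9 | 51.9 | 76.5 | 69 | 54.4 | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [15] | Mask TextSpotter | TPAMI | 2019 | ~ | ~ | ~ | ~ | ~ | ~ | 93.3 | 91.3 | 88.2 | 92.7 | 91.7 | 87.7 | 83 | 77.7 | 73.5 | 82.4 | 78.1 | 73.6 | 65.3 | 77.4 | ~ | ~ | ~ |
| Xing et al. [23] | CharNet | ICCV | 2019 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 85.05 | 81.25 | 71.08 | ~ | ~ | ~ | 69.2 | ~ | ~ | ~ | ~ |
| Feng et al. [24] | TextDragon | ICCV | 2019 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 82.54 | 78.34 | 65.15 | 86.22 | 81.62 | 68.03 | 48.8 | 74.8 | 39.7 | 72.4 | ~ |
| Qin et al. [25] |  | ICCV | 2019 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 85.51 | 81.91 | 69.94 | ~ | ~ | ~ | 70.7 | ~ | ~ | ~ | ~ |
| Qiao et al. [26] | Text Perceptron | AAAI | 2020 | ~ | ~ | ~ | ~ | ~ | ~ | 91.4 | 90.7 | 85.8 | 94.9 | 94 | 88.5 | 80.5 | 76.6 | 65.1 | 84.1 | 79.4 | 67.9 | 69.7 | 78.3 | 57 | ~ | ~ |
| Wang et al. [27] |  | AAAI | 2020 | ~ | ~ | ~ | ~ | ~ | ~ | 88.2 | 87.7 | 84.1 | ~ | ~ | ~ | 79.7 | 75.2 | 64.1 | ~ | ~ | ~ | 65 | 76.1 | ~ | ~ | 41.3 |
| Liu et al. [28] | ABCNet | CVPR | 2020 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 69.5 | 78.4 | 45.2 | 74.1 | ~ |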

3. Survey

[A] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[B] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[C] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper

4. OCR Service

| OCR | API | Free |
|---|---|---|
| Tesseract OCR Engine | × | √ |
| Azure | √ | √ |
| ABBYY | √ | √ |
| OCR Space | √ | √ |
| SODA PDF OCR | √ | √ |
| Free Online OCR | √ | √ |
| Online OCR | √ | √ |
| Super Tools | √ | √ |
| Online Chinese Recognition | √ | √ |
| Calamari OCR | × | √ |
| Tencent OCR | √ | × |

5. References and codes

  • [1] Wang K, Babenko B, Belongie S. End-to-end scene text recognition[C]. 2011 International Conference on Computer Vision. IEEE, 2011: 1457-1464. paper

  • [2] Wang T, Wu D J, Coates A, et al. End-to-end text recognition with convolutional neural networks[C]. Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). IEEE, 2012: 3304-3308. paper

  • [3] Jaderberg M, Vedaldi A, Zisserman A. Deep features for text spotting[C]. European conference on computer vision. Springer, Cham, 2014: 512-528. paper

  • [4] Alsharif O, Pineau J. End-to-End Text Recognition with Hybrid HMM Maxout Models[C]. In ICLR 2014. paper

  • [5] Yao C, Bai X, Liu W. A unified framework for multioriented text detection and recognition[J]. IEEE Transactions on Image Processing, 2014, 23(11): 4737-4749. paper

  • [6] Neumann L, Matas J. Real-time lexicon-free scene text localization and recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 38(9): 1872-1885. paper

  • [7] Jaderberg M, Simonyan K, Vedaldi A, et al. Reading text in the wild with convolutional neural networks[J]. International Journal of Computer Vision, 2016, 116(1): 1-20. paper

  • [8] Liao M, Shi B, Bai X, et al. Textboxes: A fast text detector with a single deep neural network[C]. In AAAI 2017. paper code

  • [9] Busta M, Neumann L, Matas J. Deep textspotter: An end-to-end trainable scene text localization and recognition framework[C]. Proceedings of the IEEE International Conference on Computer Vision. 2017: 2204-2212. paper

  • [10] Li H, Wang P, Shen C. Towards end-to-end text spotting with convolutional recurrent neural networks[C]. Proceedings of the IEEE International Conference on Computer Vision. 2017: 5238-5246. paper

  • [11] Lyu P, Liao M, Yao C, et al. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes[C]. Proceedings of the European Conference on Computer Vision (ECCV). 2018: 67-83. paper code

  • [12] He T, Tian Z, Huang W, et al. An end-to-end textspotter with explicit alignment and attention[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 5020-5029. paper code

  • [13] Liu X, Liang D, Yan S, et al. FOTS: Fast oriented text spotting with a unified network[C]. Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 5676-5685. paper code

  • [14] Liao M, Shi B, Bai X. Textboxes++: A single-shot oriented scene text detector[J]. IEEE transactions on image processing, 2018, 27(8): 3676-3690. paper code

  • [15] Minghui Liao, Pengyuan Lyu, Minghang He. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes[J]. IEEE transactions on pattern analysis and machine intelligence, 2019. paper code

  • [16] Wang, Kai, and S. Belongie. Word Spotting in the Wild. European Conference on Computer Vision (ECCV), 2010: 591-604. paper

  • [17] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, H. Miyao, J. Zhu, W. Ou, C. Wolf, J. Jolion, L. Todoran, M. Worring, and X. Lin. ICDAR 2003 robust reading competitions: entries, results, and future directions. IJDAR, 7(2-3):105–122, 2005. paper

  • [18] Shahab A, Shafait F, Dengel A. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In ICDAR, 2011. paper

  • [19] D. Karatzas, F. Shafait, S. Uchida, et al. ICDAR 2013 robust reading competition. In ICDAR, 2013. paper

  • [20] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper

  • [21] Chee C K, Chan C S. Total-text: A comprehensive dataset for scene text detection and recognition. Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942. paper

  • [22] Y. Sun, C. Zhang, Z. Huang, J. Liu, J. Han, and E. Ding. TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network. Asian Conference on Computer Vision (ACCV), Cham, 2018, vol. 11363, no. 1, pp. 83–99. paper

  • [23] Xing L, Tian Z, Huang W, et al. Convolutional character networks. In ICCV, 2019. paper code

  • [24] Feng W, He W, Yin F, et al. TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting. In ICCV, 2019. paper

  • [25] Qin S, Bissacco A, Raptis M, et al. Towards unconstrained end-to-end text spotting. In ICCV, 2019. paper

  • [26] Qiao L, Tang S, Cheng Z, et al. Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting. In AAAI, 2020. paper

  • [27] Wang H, Lu P, Zhang H, et al. All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting. In AAAI, 2020. paper

  • [28] Liu Y, Chen H, Shen C, et al. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network. In CVPR, 2020. paper code

If you find any problems in our resources, or any good papers/codes we have missed, please inform us at [email protected]. Thank you for your contribution.

Copyright

Copyright © 2019 SCUT-DLVC. All Rights Reserved.
