  • Language: Python
  • License: MIT License


[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective

Revisiting Scene Text Recognition: A Data Perspective

Union14M is a large scene text recognition (STR) dataset collected from 17 publicly available datasets. It contains 4M labeled images (Union14M-L) and 10M unlabeled images (Union14M-U), and is intended to support a more thorough analysis of STR by the community.


What's New

1. Introduction

  • Scene Text Recognition (STR) is a fundamental task in computer vision that aims to recognize text in natural images. STR has developed rapidly in recent years, and recent state-of-the-art methods show a trend of accuracy saturation on six commonly used benchmarks (IC13, IC15, SVT, IIIT5K, SVTP, CUTE80). This is a promising result, but it also raises a question: are we done with STR? Or does the lack of challenges in current benchmarks simply hide the drawbacks of existing methods in real-world scenarios?
  • To explore the challenges that STR models still face, we consolidate a large-scale STR dataset for analysis and identify seven open challenges. Furthermore, we propose a challenge-driven benchmark to facilitate the future development of STR. Additionally, we show that using massive unlabeled data through self-supervised pre-training can remarkably improve the performance of STR models in real-world scenarios, suggesting a practical solution for STR from a data perspective. We hope this work can spark future research beyond the realm of existing data paradigms.

2. Contents

3. Union14M Dataset

3.1. Union14M-L

  • Union14M-L contains 4M images collected from 14 publicly available datasets. See Source Datasets for the details of the 14 datasets. We adopt several strategies to refine the naive concatenation of the 14 datasets, including:
    • Cropping: We crop each text instance using its minimal axis-aligned bounding box.
    • De-duplication: Some datasets contain duplicate images, which we remove.
  • We also categorize the images in Union14M-L into five difficulty levels using an error voting method.
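
The two refinement steps above can be sketched as follows. This is a minimal illustration, not the code used to build Union14M-L: the function names are hypothetical, the polygon is assumed to be a list of (x, y) points, and duplicates are assumed to be byte-identical files (the actual de-duplication criterion is not stated here).

```python
import hashlib
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def min_axis_aligned_box(polygon: Sequence[Point]) -> Box:
    """Return the smallest axis-aligned box that encloses every polygon point."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Round outwards so that no part of the text region is cut off.
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


def deduplicate(images: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Keep the first file name for each distinct image content.

    Assumption: two images are duplicates only if their bytes are identical.
    """
    seen = set()
    kept = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept
```

The resulting box can be passed to any image library's crop routine; near-duplicate detection (for example perceptual hashing) would need a different comparison than the exact hash used here.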

3.2. Union14M-U

  • The most direct way to improve the performance of STR in real-world scenarios is to use more data for training. However, labeling text images is both costly and time-intensive, since it involves annotating sequences and requires specialized language expertise. It is therefore desirable to investigate the potential of unlabeled data for STR via self-supervised learning. To this end, we collect 10M unlabeled images from 3 large datasets with an IoU Voting method.
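
The IoU Voting method is not specified in this README, so the following is only one plausible reading of the idea: run several text detectors on the same image and keep a box only when every detector proposes an overlapping box. The function names, the use of axis-aligned boxes, and the 0.7 threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_vote(per_detector_boxes: Sequence[Sequence[Box]],
             threshold: float = 0.7) -> List[Box]:
    """Keep boxes from the first detector that all other detectors agree on.

    A box is kept only if each remaining detector has at least one box whose
    IoU with it reaches the (hypothetical) threshold.
    """
    if not per_detector_boxes:
        return []
    reference, *others = per_detector_boxes
    return [
        box for box in reference
        if all(any(iou(box, other) >= threshold for other in boxes)
               for boxes in others)
    ]
```

Requiring agreement between detectors trades recall for precision, which suits unlabeled data collection where falsely cropped non-text regions would pollute pre-training.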

3.3. Union14M-Benchmark

  • We raise seven open challenges for STR in real-world scenarios and propose a challenge-driven benchmark to facilitate future development of the field.

3.4. Download

| Datasets | One Drive | Baidu Yun |
| --- | --- | --- |
| Union14M-L & Union14M-Benchmark (12GB) | One Drive | Baidu Yun |
| Union14M-U (36.63GB) | One Drive | Baidu Yun |
| 6 Common Benchmarks (17.6MB) | One Drive | Baidu Yun |

  • Union14M is organized as follows:

    Structure of Union14M-L & Union14M-Benchmark
    |--Union14M-L
      |--full_images
        |--art_curve # Images collected from the 14 datasets
        |--art_scene
        |--COCOTextV2
        |--...
      |--train_annos
        |--mmocr-0.x # annotation in mmocr0.x format
          |--train_challenging.jsonl # challenging subset
          |--train_easy.jsonl # easy subset
          |--train_hard.jsonl # hard subset
          |--train_medium.jsonl # medium subset
          |--train_normal.jsonl # normal subset
          |--val_annos.jsonl # validation subset
        |--mmocr1.0.x # annotation in mmocr1.0 format
          |--...
      |--Union14M-Benchmarks
        |--artistic
          |--imgs
          |--annotation.json # annotation in mmocr1.0 format
          |--annotation.jsonl # annotation in mmocr0.x format
        |--...
    
    Structure of Union14M-U

    We store images in LMDB format, and Union14M-U is organized as follows.

    |--Union14M-U
      |--book32_lmdb
      |--cc_lmdb
      |--openvino_lmdb
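
    The difficulty-split annotation files listed under train_annos above can be read with a few lines of Python. This is a sketch under an assumption: each line of a .jsonl file is taken to be one JSON object (MMOCR 0.x line-JSON files typically carry "filename" and "text" keys, but check the downloaded files). The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, Sequence, Union

PathLike = Union[str, Path]


def read_jsonl(path: PathLike) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_splits(anno_dir: PathLike,
                splits: Sequence[str]) -> Dict[str, List[dict]]:
    """Load train_<split>.jsonl files from an annotation directory.

    Split names follow the tree above: challenging, easy, hard, medium, normal.
    """
    return {
        split: list(read_jsonl(Path(anno_dir) / f"train_{split}.jsonl"))
        for split in splits
    }
```

    For example, `load_splits("Union14M-L/train_annos/mmocr-0.x", ["easy", "hard"])` would return a dictionary with one list of annotation records per split.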
    

4. STR Models trained on Union14M-L

  • We train several STR models on Union14M-L using MMOCR-1.0.

4.1. Checkpoints

  • Models are evaluated on both the common benchmarks and Union14M-Benchmark. Accuracies (WAICS) in $\color{grey}{grey}$ are from the original implementations (trained on synthetic datasets), and accuracies in $\color{green}{green}$ are from models trained on Union14M-L. All re-trained models are trained to predict upper- and lower-case text, symbols and spaces.

    | Models | Checkpoint | IIIT5K | SVT | IC13-1015 | IC15-2077 | SVTP | CUTE80 | Avg. |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | ASTER | Google Drive / BaiduYun / OneDrive | $\color{grey}{93.57}$ / $\color{green}{94.37}$ | $\color{grey}{89.49}$ / $\color{green}{89.03}$ | $\color{grey}{92.81}$ / $\color{green}{93.60}$ | $\color{grey}{76.65}$ / $\color{green}{78.57}$ | $\color{grey}{80.62}$ / $\color{green}{80.93}$ | $\color{grey}{85.07}$ / $\color{green}{90.97}$ | $\color{grey}{86.37}$ / $\color{green}{88.07}$ |
    | ABINet | Google Drive / BaiduYun / OneDrive | $\color{grey}{95.23}$ / $\color{green}{97.30}$ | $\color{grey}{90.57}$ / $\color{green}{96.45}$ | $\color{grey}{93.69}$ / $\color{green}{95.52}$ | $\color{grey}{78.86}$ / $\color{green}{85.36}$ | $\color{grey}{84.03}$ / $\color{green}{89.77}$ | $\color{grey}{84.37}$ / $\color{green}{94.79}$ | $\color{grey}{87.79}$ / $\color{green}{93.20}$ |
    | NRTR | Google Drive / BaiduYun / OneDrive | $\color{grey}{91.50}$ / $\color{green}{96.73}$ | $\color{grey}{88.25}$ / $\color{green}{93.20}$ | $\color{grey}{93.69}$ / $\color{green}{95.57}$ | $\color{grey}{72.32}$ / $\color{green}{80.74}$ | $\color{grey}{77.83}$ / $\color{green}{83.57}$ | $\color{grey}{75.00}$ / $\color{green}{92.01}$ | $\color{grey}{83.09}$ / $\color{green}{90.30}$ |
    | SATRN | Google Drive / BaiduYun / OneDrive | $\color{grey}{96.00}$ / $\color{green}{97.27}$ | $\color{grey}{91.96}$ / $\color{green}{95.36}$ | $\color{grey}{96.06}$ / $\color{green}{96.85}$ | $\color{grey}{80.31}$ / $\color{green}{87.14}$ | $\color{grey}{88.37}$ / $\color{green}{90.39}$ | $\color{grey}{89.93}$ / $\color{green}{96.18}$ | $\color{grey}{90.43}$ / $\color{green}{93.89}$ |
    | SAR | Google Drive / BaiduYun / OneDrive | $\color{grey}{95.33}$ / $\color{green}{97.07}$ | $\color{grey}{88.41}$ / $\color{green}{93.66}$ | $\color{grey}{93.69}$ / $\color{green}{95.76}$ | $\color{grey}{76.02}$ / $\color{green}{82.19}$ | $\color{grey}{83.26}$ / $\color{green}{86.98}$ | $\color{grey}{90.28}$ / $\color{green}{92.01}$ | $\color{grey}{87.83}$ / $\color{green}{91.27}$ |
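
The WAICS metric reported above is word accuracy ignoring case and symbols. A minimal sketch of such a metric is shown below; it is not MMOCR's implementation, the function names are hypothetical, and it assumes that "symbols" means every character other than ASCII letters and digits.

```python
import re
from typing import Sequence

# Assumption: anything that is not a lowercase ASCII letter or digit
# (after lowercasing) counts as a symbol and is dropped before comparison.
_NON_ALNUM = re.compile(r"[^a-z0-9]")


def normalize(text: str) -> str:
    """Lowercase the text and strip all non-alphanumeric characters."""
    return _NON_ALNUM.sub("", text.lower())


def word_accuracy_ignore_case_symbols(preds: Sequence[str],
                                      targets: Sequence[str]) -> float:
    """Percentage of predictions that match their target after normalization."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(normalize(p) == normalize(t) for p, t in zip(preds, targets))
    return 100.0 * correct / len(targets)
```

Under this definition a prediction of "Hello!" counts as correct for the label "hello", while a missing or extra letter still counts as an error for the whole word.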

5. MAERec

  • MAERec is a scene text recognition model composed of a ViT backbone and an auto-regressive Transformer decoder. It performs strongly on scene text recognition, especially when pre-trained on Union14M-U with MAE.

  • Results of MAERec on six common benchmarks and Union14M-Benchmarks

  • Predictions of MAERec on some challenging examples

5.1. Pre-training

5.2. Fine-tuning

5.3. Evaluation

  • If you want to evaluate MAERec on benchmarks, check evaluation

5.4. Inferencing

  • If you want to run MAERec inference on your own raw pictures, check inferencing

5.5. Demo

  • We also provide a Gradio app for MAERec, which can be used to run inference on your own pictures. You can run it locally or try it on 🤗HuggingFace Spaces.
  • To run it locally, follow these steps:
      1. Install gradio and download the pretrained weights
      pip install gradio
      wget https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015_20221101_124139-4ecb39ac.pth -O dbnetpp.pth
      wget https://github.com/Mountchicken/Union14M/releases/download/Checkpoint/maerec_b_union14m.pth -O maerec_b.pth
      
      2. Run the gradio app
      python tools/gradio_app.py \
        --rec_config mmocr-dev-1.x/configs/textrecog/maerec/maerec_b_union14m.py \
        --rec_weight ${PATH_TO_MAEREC_B} \
        --det_config mmocr-dev-1.x/configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py \
        --det_weight ${PATH_TO_DBNETPP}
      

6. License

7. Acknowledgment

  • We sincerely thank all the constructors of the 17 datasets used in Union14M, and also the developers of MMOCR.

8. Citation

@inproceedings{jiang2023revisiting,
      title={Revisiting Scene Text Recognition: A Data Perspective}, 
      author={Qing Jiang and Jiapeng Wang and Dezhi Peng and Chongyu Liu and Lianwen Jin},
      booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
      year={2023},
}
