Dictionary-guided Scene Text Recognition (CVPR-2021)
Table of Contents
  1. Introduction
  2. Dataset
  3. Getting Started
  4. Training and Evaluation
  5. Acknowledgement

Dictionary-guided Scene Text Recognition

  • We propose a novel dictionary-guided scene text recognition approach that can be used to improve many state-of-the-art models.
  • We also introduce a new benchmark dataset (namely, VinText) for Vietnamese scene text recognition.
architecture.png
Comparison between the traditional approach and our proposed approach.
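
This README does not spell out how the dictionary guides recognition; the paper has the actual method. As a loose illustration only, the sketch below shows one common building block of dictionary-based correction: ranking dictionary words by edit distance to a raw prediction. The plain unit-cost Levenshtein distance, the function names, and the toy dictionary are all assumptions made for this example, not the authors' implementation.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit cost for insert, delete, substitute."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def closest_words(prediction: str, dictionary: List[str], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k dictionary words nearest to the prediction, with distances."""
    scored = [(word, edit_distance(prediction, word)) for word in dictionary]
    # Sort by distance first, then alphabetically so ties are deterministic.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))[:k]


# Hypothetical toy dictionary; a real one would hold many thousands of words.
toy_dictionary = ["phở", "phố", "phí", "nhà", "hàng"]
print(closest_words("pho", toy_dictionary, k=2))
```

A recognizer that drops a diacritic would see several equally close candidates here, which hints at why candidates need to be re-scored by a model rather than chosen by distance alone.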

Details of the dataset construction, model architecture, and experimental results can be found in the following paper:

@inproceedings{m_Nguyen-etal-CVPR21,
      author = {Nguyen Nguyen and Thu Nguyen and Vinh Tran and Triet Tran and Thanh Ngo and Thien Nguyen and Minh Hoai},
      title = {Dictionary-guided Scene Text Recognition},
      year = {2021},
      booktitle = {Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)},
    }

Please CITE our paper whenever our dataset or model implementation is used to help produce published results or is incorporated into other software.


Dataset

We introduce ✨ a new VinText dataset.

By downloading this dataset, the USER agrees:

  • to use this dataset for research or educational purposes only,
  • to not distribute this dataset, or any part of it, in any original or modified form,
  • and to cite our paper whenever this dataset is employed to help produce published results.

| Name | #imgs | #text instances | Examples |
|---|---|---|---|
| VinText | 2000 | About 56000 | example.png |

Details about the ✨ VinText dataset can be found in our paper. Download the converted dataset to try it with our model.

| Dataset variant | Input format | Download link |
|---|---|---|
| Original | x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT | Download here |
| Converted dataset | COCO format | Download here |
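
For readers working with the Original variant, the following is a minimal parsing sketch for a line in the `x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT` format. It assumes one text instance per line, integer pixel coordinates, and that the transcript may itself contain commas; none of these details are confirmed here, and the function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def parse_annotation_line(line: str) -> Tuple[List[Point], str]:
    """Split one annotation line into four corner points and the transcript."""
    # Split at most 8 times so commas inside the transcript are preserved.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a transcript, got: {line!r}")
    # Assumption: coordinates are integers; switch to float() if they are not.
    coords = [int(value) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    return points, parts[8]


points, text = parse_annotation_line("10,20,110,20,110,60,10,60,HELLO, WORLD")
print(points, text)
```

Limiting the split to eight separators is the key design choice: a naive `split(",")` would truncate any transcript that contains a comma.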

VinText

Extract the data and copy the extracted folder into datasets/:

datasets
├───vintext
│   ├───test.json
│   ├───train.json
│   ├───train_images
│   └───test_images
└───evaluation
    └───gt_vintext.zip

Getting Started

Requirements
  • python=3.7
  • torch==1.4.0
  • detectron2==0.2
Installation
conda create -n dict-guided -y python=3.7
conda activate dict-guided
conda install -y pytorch torchvision cudatoolkit=10.0 -c pytorch
python -m pip install ninja yacs cython matplotlib tqdm opencv-python shapely scipy tensorboardX pyclipper Polygon3 weighted-levenshtein editdistance

# Install Detectron2
python -m pip install detectron2==0.2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html

Check out the code and install:

git clone https://github.com/nguyennm1024/dict-guided.git
cd dict-guided
python setup.py build develop
Download the VinText pre-trained model.
Usage

Prepare folders

mkdir sample_input
mkdir sample_output

Copy your images to sample_input/. Output images will be written to sample_output/.

python demo/demo.py --config-file configs/BAText/VinText/attn_R_50.yaml --input sample_input/ --output sample_output/ --opts MODEL.WEIGHTS path-to-trained_model-checkpoint
qualitative results.png
Qualitative Results on VinText.
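
The demo steps can also be scripted. The sketch below only assembles the same `demo/demo.py` command line shown above and runs it with `subprocess`; the function names and default arguments are assumptions for illustration, and the script is expected to run from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

CONFIG = "configs/BAText/VinText/attn_R_50.yaml"


def build_demo_command(weights: str,
                       input_dir: str = "sample_input/",
                       output_dir: str = "sample_output/") -> List[str]:
    """Assemble the demo command as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "demo/demo.py",
        "--config-file", CONFIG,
        "--input", input_dir,
        "--output", output_dir,
        "--opts", "MODEL.WEIGHTS", weights,
    ]


def run_demo(weights: str) -> None:
    """Create the sample folders if needed, then run the demo script."""
    Path("sample_input").mkdir(exist_ok=True)
    Path("sample_output").mkdir(exist_ok=True)
    # check=True raises CalledProcessError if the demo exits with an error.
    subprocess.run(build_demo_command(weights), check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the checkpoint path contains spaces.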

Training and Evaluation

Training

For training, we employed the pre-trained model tt_attn_R_50 from the ABCNet repository for initialization.

python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_tt_attn_R_50_checkpoint

Example:

python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./tt_attn_R_50.pth

The trained model is saved to the folder output/batext/vintext/, which is then used for evaluation.

Evaluation

python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_trained_model_checkpoint

Example:

python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./output/batext/vintext/trained_model.pth
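
When several checkpoints accumulate in the output folder, picking the one to evaluate can be automated. This sketch assumes checkpoints are `.pth` files written directly into `output/batext/vintext/` and that the most recently modified one is the one you want; both are assumptions, and the helper names are hypothetical.

```python
import sys
from pathlib import Path
from typing import List, Optional

CONFIG = "configs/BAText/VinText/attn_R_50.yaml"


def newest_checkpoint(output_dir: str = "output/batext/vintext") -> Optional[Path]:
    """Return the most recently modified .pth file, or None if there is none."""
    candidates = Path(output_dir).glob("*.pth")
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)


def build_eval_command(weights: str) -> List[str]:
    """Assemble the evaluation command shown above as an argument list."""
    return [
        sys.executable, "tools/train_net.py", "--eval-only",
        "--config-file", CONFIG,
        "MODEL.WEIGHTS", str(weights),
    ]


if __name__ == "__main__":
    checkpoint = newest_checkpoint()
    if checkpoint is None:
        print("No .pth checkpoint found; train a model first.")
    else:
        print(" ".join(build_eval_command(str(checkpoint))))
```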

Acknowledgement

This repository is built on ABCNet.
