  • Stars: 1,486
  • Rank: 31,621 (Top 0.7 %)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated about 3 years ago


Repository Details

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

CascadeTabNet


CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, Kavita Sultanpure
CVPR Link of Paper
arXiv Link of Paper
Supplementary file
The paper was presented (oral) at the CVPR 2020 Workshop on Text and Documents in the Deep Learning Era.

Virtual Oral Presentation YOUTUBE VIDEO
Cascadetabnet Demo by Bhavesh Bhatt YOUTUBE VIDEO

1. Introduction

CascadeTabNet is an automatic table recognition method for interpreting tabular data in document images. We present an improved deep learning-based end-to-end approach that solves both table detection and structure recognition using a single Convolutional Neural Network (CNN) model. CascadeTabNet is a Cascade mask Region-based CNN High-Resolution Network (Cascade mask R-CNN HRNet) based model that detects table regions and, at the same time, recognizes the structural body cells of the detected tables. We evaluate our results on the ICDAR 2013, ICDAR 2019 and TableBank public datasets. We achieved 3rd rank in the ICDAR 2019 post-competition results for table detection while attaining the best accuracy results on the ICDAR 2013 and TableBank datasets. We also attain the highest accuracy results on the ICDAR 2019 table structure recognition dataset.

2. Setup

Models are developed in the PyTorch-based MMDetection framework (version 1.2).

# Install the MMDetection prerequisites
pip install -q mmcv terminaltables
# Fetch MMDetection at the v1.2.0 tag and build it
git clone --branch v1.2.0 'https://github.com/open-mmlab/mmdetection.git'
cd "mmdetection"
pip install -r "requirements/optional.txt"
python setup.py install
python setup.py develop
pip install -r "requirements.txt"
# Pin the Pillow and mmcv versions used with this MMDetection release
pip install pillow==6.2.1
pip install mmcv==0.4.3

The code was developed with the following library versions:

PyTorch = 1.4.0
Torchvision = 0.5.0
CUDA = 10.0

pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
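
To confirm that an environment matches the versions listed in this section, the installed packages can be compared against the pins. The following is a minimal sketch and is not part of the repository: PINNED, matches_pin and check_environment are hypothetical names, and it assumes that a local build tag such as "+cu100" should be ignored when comparing versions.

```python
from importlib import metadata

# Versions taken from this README; the dictionary name is illustrative only.
PINNED = {
    "torch": "1.4.0",
    "torchvision": "0.5.0",
    "mmcv": "0.4.3",
    "pillow": "6.2.1",
}


def matches_pin(installed, pinned):
    """Compare versions while ignoring a local build tag such as '+cu100'."""
    return installed.split("+", 1)[0] == pinned


def check_environment(pins=PINNED):
    """Return {package: (installed_or_None, pinned, ok)} for each pinned package."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        ok = installed is not None and matches_pin(installed, pinned)
        report[name] = (installed, pinned, ok)
    return report
```

Calling check_environment() returns one entry per package, so a mismatch (for example a newer Pillow) can be spotted before running the models.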

If you are using Google Colaboratory (Colab), you need to add

from google.colab.patches import cv2_imshow

and replace every cv2.imshow call with cv2_imshow.
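
Instead of editing every call by hand, the choice of display function can be wrapped in one helper. This is a sketch under assumptions: running_in_colab and show_image are hypothetical names, and checking sys.modules for "google.colab" is a common heuristic rather than an official detection API.

```python
import sys


def running_in_colab():
    """Heuristic: Colab kernels have the google.colab package already loaded."""
    return "google.colab" in sys.modules


def show_image(image, window_name="result"):
    """Display an OpenCV image with whichever backend the environment supports."""
    if running_in_colab():
        # Colab cannot open GUI windows, so use its patched replacement.
        from google.colab.patches import cv2_imshow
        cv2_imshow(image)
    else:
        # Imported lazily so the helper can be defined without OpenCV installed.
        import cv2
        cv2.imshow(window_name, image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
```

With this helper, scripts call show_image(img) everywhere and run unchanged both locally and in Colab.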

3. Model Architecture

Model Computation Graph

4. Image Augmentation


Code: Code for dilation transform, Code for smudge transform
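
The linked transform code is not reproduced here. As a rough illustration of what a dilation-style augmentation does to a document image, the sketch below thickens dark strokes by giving each pixel the darkest value in its neighbourhood. It is an assumption-laden example, not the authors' implementation: thicken_dark_pixels is a hypothetical name, the image is a plain list of rows with 0 as black and 255 as white, and the square neighbourhood and radius are illustrative choices.

```python
def thicken_dark_pixels(image, radius=1):
    """Return a new grayscale image in which dark strokes are thickened.

    image: list of rows, each a list of ints (0 = black, 255 = white).
    Each output pixel is the minimum over a (2*radius+1) x (2*radius+1)
    window, clipped at the image border. The input is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            darkest = image[y][x]
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    darkest = min(darkest, image[ny][nx])
            row.append(darkest)
        result.append(row)
    return result
```

On real pages an image library would be far faster than nested Python loops; the loops are used here only to make the operation explicit.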

5. Benchmarking

5.1. Table Detection

1. ICDAR 13

2. ICDAR 19 (Track A Modern)

3. TableBank

TableBank Benchmarking : Official Leaderboard

TableBank Dataset Divisions : TableBank

5.2. Table Structure Recognition

1. ICDAR 19 (Track B2)

6. Model Zoo

Check out our demo notebook for loading checkpoints and performing inference:
Open In Colab
Config file for the Models
Note: Config paths only need to be changed for training.
Checkpoints of the models we have trained:

Model Name | Checkpoint File
General Model table detection | Checkpoint
ICDAR 13 table detection | Checkpoint
ICDAR 19 (Track A Modern) table detection | Checkpoint
Table Bank Word table detection | Checkpoint
Table Bank Latex table detection | Checkpoint
Table Bank Both table detection | Checkpoint
ICDAR 19 (Track B2 Modern) table structure recognition | Checkpoint
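
After inference with one of these checkpoints, MMDetection 1.x returns bounding boxes as one array per class, with each row holding [x1, y1, x2, y2, score]. The sketch below shows one way to keep only confident detections. It is not taken from the demo notebook: filter_detections is a hypothetical helper, and the class names and the 0.85 threshold are placeholders that must be matched to the config actually used.

```python
def filter_detections(bbox_results, class_names, score_threshold=0.85):
    """Group boxes by class name, keeping those at or above the threshold.

    bbox_results: one sequence per class; each row is [x1, y1, x2, y2, score].
    class_names: names in the same order as the classes in bbox_results
                 (an assumption supplied by the caller).
    Returns {class_name: [[x1, y1, x2, y2], ...]} with integer coordinates.
    """
    kept = {}
    for name, boxes in zip(class_names, bbox_results):
        kept[name] = [
            [int(x1), int(y1), int(x2), int(y2)]
            for x1, y1, x2, y2, score in boxes
            if score >= score_threshold
        ]
    return kept
```

For mask models the inference result is a tuple of box results and mask results, so the box results would be passed in as the first element of that tuple.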

7. Datasets

  1. End to End Table Recognition Dataset
    We manually annotated some of the ICDAR 19 table competition (cTDaR) dataset images for cell detection in the borderless tables. More details about the dataset are mentioned in the paper.
    dataset link

  2. General Table Detection Dataset (ICDAR 19 + Marmot + GitHub)
    We manually corrected the annotations of Marmot and GitHub and combined them with the ICDAR 19 dataset to create a general and robust dataset.
    dataset link

8. Training

You may refer to this tutorial for training MMDetection models on your custom datasets in Colab.

You may refer to this script to convert your Pascal VOC XML annotation files to a single COCO JSON file.
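
For readers who want to see the shape of such a conversion, here is a minimal sketch using only the standard library. It is not the linked script: voc_to_coco and write_coco are hypothetical names, the class list is supplied by the caller, and emitting each box as a rectangular segmentation polygon is an assumption made so that mask-based models have something to train on.

```python
import json
import xml.etree.ElementTree as ET


def voc_to_coco(xml_documents, class_names):
    """Merge Pascal VOC annotation strings into one COCO-style dictionary."""
    categories = [{"id": i + 1, "name": n} for i, n in enumerate(class_names)]
    category_ids = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for image_id, xml_text in enumerate(xml_documents, start=1):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        images.append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (
                int(float(box.findtext(tag)))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            w, h = xmax - xmin, ymax - ymin
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_ids[obj.findtext("name")],
                "bbox": [xmin, ymin, w, h],  # COCO uses [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
                # Assumption: use the box itself as a rectangular polygon.
                "segmentation": [[xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]],
            })
    return {"images": images, "annotations": annotations, "categories": categories}


def write_coco(coco, path):
    """Write the merged annotations to a single JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(coco, handle)
```

Each XML file would be read into a string and passed in the list; object names that are not in class_names raise a KeyError, which makes labelling mistakes visible early.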

9. Docker

The Docker image of this project can be found on Docker Hub.

It currently contains three models from the model zoo. For details, see the readme file on Docker Hub.

Contact

Devashish Prasad : devashishkprasad [at] gmail [dot] com
Ayan Gadpal : ayangadpal2 [at] gmail [dot] com
Kshitij Kapadni : kshitij.kapadni [at] gmail [dot] com
Manish Visave : manishvisave149 [at] gmail [dot] com

Acknowledgements

We thank the following contributors, who made this paper possible:

  1. The MMDetection project team, for creating the amazing framework that pushes state-of-the-art computer vision research and that enabled us to experiment with and build state-of-the-art models very easily.

  2. Our college, "Pune Institute of Computer Technology", for funding our research and giving us the opportunity to work and publish our research at an international conference.

  3. Kai Chen, for endorsing our paper on arXiv so that we could publish a pre-print, and also for maintaining the MMDetection repository along with the team.

  4. The Google Colaboratory team, for providing free high-end GPU resources for research and development. The entire code base was developed using Google Colab and would not have been possible without it.

  5. AP Analytica, for making us aware of a similar problem statement and giving us an opportunity to work on it.

  6. Overleaf.com, for open-sourcing the wonderful project that enabled us to write the research paper easily in LaTeX.

License

The code of CascadeTabNet is open source under the MIT License. There are no restrictions on academic or commercial use.

Cite as

If you find this work useful for your research, please cite our paper:

@misc{ cascadetabnet2020,
    title={CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents},
    author={Devashish Prasad and Ayan Gadpal and Kshitij Kapadni and Manish Visave and Kavita Sultanpure},
    year={2020},
    eprint={2004.12629},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
