Document AI with Hugging Face Transformers
Document AI is a term that has become popular over the last three years. It covers machine learning models, tasks, and techniques for classifying, parsing, and extracting information from documents in digital and print form, such as invoices, receipts, licenses, contracts, and business reports.
This repository contains examples and tutorials on how to get started with Document AI and Transformers. Below you will also find a compendium of available models, tasks, datasets, and other resources.
Training
Inference
Data-processing
Demos/Spaces
Community:
- fedihch/InvoiceReceiptClassifierDemo
- nielsr/LayoutLMv2-FUNSD
- katanaml/LayoutLMv2-CORD
- nielsr/TrOCR-handwritten
- keras-io/ocr-for-captcha
- nielsr/dit-document-layout-analysis
- PatrickTyBrown/LoanDocumentClassifier
- Theivaprakasham/layoutlmv2_invoice
- TMsp/invoice_processing_layoutlmv3_custom
- Epoching/DocumentQA
- impira/docquery
Popular models are layoutlm.... and Donut, which we will use today to get a first impression of how you can build your own Document AI system using Hugging Face Transformers.
Machine Learning Models (Transformers)
Below you can find a table of the currently available Transformers models that achieve state-of-the-art performance on Document AI tasks.
Tasks
Document AI includes the following use cases and tasks:
- document classification (image-classification)
- document parsing (form understanding & information extraction)
- visual question answering
- table detection/layout analysis
- optical character recognition (OCR)
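As a rough orientation, the tasks above can be lined up with pipeline task identifiers from the Transformers library. The sketch below is an assumption rather than part of this repository: the mapping, `pipeline_task_for`, and `build_pipeline` are illustrative names, and the identifiers are those used by recent Transformers 4.x releases.

```python
from typing import Optional

# Assumed mapping from the task names listed above to Transformers pipeline
# task identifiers. Note: LayoutLM-style document parsing is token
# classification over OCR'd words, but the plain text pipeline does not pass
# layout boxes, so a processor plus model is usually needed in practice.
TASK_TO_PIPELINE = {
    "document classification": "image-classification",
    "document parsing": "token-classification",
    "visual question answering": "document-question-answering",
    "table detection/layout analysis": "object-detection",
    "optical character recognition (ocr)": "image-to-text",
}


def pipeline_task_for(task: str) -> str:
    """Return the pipeline identifier for a Document AI task name."""
    key = task.strip().lower()
    if key not in TASK_TO_PIPELINE:
        raise KeyError(f"unknown Document AI task: {task!r}")
    return TASK_TO_PIPELINE[key]


def build_pipeline(task: str, model: Optional[str] = None):
    """Create a pipeline for the task; this downloads model weights."""
    # Imported lazily so the lookup above works without transformers installed.
    from transformers import pipeline

    return pipeline(pipeline_task_for(task), model=model)
```

Calling `build_pipeline("visual question answering")` would fetch whatever default checkpoint the installed Transformers version assigns to that pipeline, so passing an explicit `model` is the safer choice for reproducible results.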
Datasets
Dataset | Task | Hugging Face Datasets |
---|---|---|
SROIE | document parsing | darentang/sroie |
RVL-CDIP | document classification | rvl_cdip |
XFUND | document parsing | ranpox/xfund |
FUNSD | document parsing | nielsr/funsd |
CORD | information extraction/parsing | naver-clova-ix/cord-v2 |
DocVQA | visual question answering | load manually |
WildReceipt | document parsing | Theivaprakasham/wildreceipt |
TableBank | table detection/layout analysis | load manually |
DocBank | table detection/layout analysis | load manually |
ReadingBank | reading order detection | load manually |
EATEN | document parsing | load manually |
PubLayNet | table detection/layout analysis | jordanparker6/publaynet |
ICDAR2019_cTDaR | table detection/layout analysis | load manually |
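For the rows that list a Hub identifier, loading typically goes through `datasets.load_dataset`. The following is a minimal sketch covering a few rows of the table. `DATASET_IDS`, `hub_id`, and `load_document_dataset` are hypothetical helper names, the `"train"` split is an assumption, and the sketch does not verify that every repository still loads with a current `datasets` release.

```python
from typing import Dict, Optional

# A few rows from the table above; None stands for "load manually".
DATASET_IDS: Dict[str, Optional[str]] = {
    "FUNSD": "nielsr/funsd",
    "SROIE": "darentang/sroie",
    "WildReceipt": "Theivaprakasham/wildreceipt",
    "PubLayNet": "jordanparker6/publaynet",
    "DocVQA": None,
    "TableBank": None,
}


def hub_id(name: str) -> str:
    """Return the Hub identifier for a dataset, or fail with a clear reason."""
    if name not in DATASET_IDS:
        raise KeyError(f"dataset not in table: {name!r}")
    repo = DATASET_IDS[name]
    if repo is None:
        raise ValueError(f"{name} has no Hub identifier and must be loaded manually")
    return repo


def load_document_dataset(name: str, split: str = "train"):
    """Download one split from the Hub (assumes the split exists)."""
    # Imported lazily so hub_id() works without the datasets package.
    from datasets import load_dataset

    return load_dataset(hub_id(name), split=split)
```

For example, `load_document_dataset("FUNSD")` would request the `nielsr/funsd` repository, while `load_document_dataset("DocVQA")` raises a `ValueError` before any network access.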