  • Stars
    280
  • Rank 147,492 (Top 3 %)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created over 2 years ago
  • Updated 11 months ago

Repository Details

Document AI with Hugging Face Transformers

Document AI is a term that has become popular over the last 3 years. It refers to machine learning models, tasks, and techniques used to classify, parse, and extract information from documents in digital and print forms, such as invoices, receipts, licenses, contracts, and business reports.

This repository contains examples and tutorials on how to get started with Document AI and Transformers. Below you can also find a compendium of available models, tasks, datasets, and other resources.

Training

Inference

Data-processing

Demos/Spaces

Community:

Popular models are layoutlm.... and Donut, which we will use today to get a first impression of how you can build your own Document AI system using Hugging Face Transformers.

Machine Learning Models (Transformers)

Below is a table of the currently available Transformers models that achieve state-of-the-art performance on Document AI tasks.

Model        Paper   License           Checkpoints
Donut        arxiv   MIT               huggingface
LiLT         arxiv   MIT               huggingface
LayoutLM     arxiv   MIT               huggingface
LayoutXLM    arxiv   CC BY-NC-SA 4.0   huggingface
LayoutLMv2   arxiv   CC BY-NC-SA 4.0   huggingface
LayoutLMv3   arxiv   CC BY-NC-SA 4.0   huggingface
DiT          arxiv   CC BY-NC-SA 4.0   huggingface
TrOCR        arxiv   MIT               huggingface
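
The license column matters in practice, because the "NC" in CC BY-NC-SA 4.0 stands for "non-commercial". The sketch below encodes the table as data and filters out the non-commercial entries. The `ModelEntry` class and the helper names are hypothetical, introduced only for illustration, and the licenses are copied from the table above. An individual checkpoint may carry a different license, so check its model card before relying on this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of the model table (hypothetical helper type)."""
    name: str
    license: str


# Licenses as listed in the table above, not verified per checkpoint.
MODELS = [
    ModelEntry("Donut", "MIT"),
    ModelEntry("LiLT", "MIT"),
    ModelEntry("LayoutLM", "MIT"),
    ModelEntry("LayoutXLM", "CC BY-NC-SA 4.0"),
    ModelEntry("LayoutLMv2", "CC BY-NC-SA 4.0"),
    ModelEntry("LayoutLMv3", "CC BY-NC-SA 4.0"),
    ModelEntry("DiT", "CC BY-NC-SA 4.0"),
    ModelEntry("TrOCR", "MIT"),
]


def is_non_commercial(license_name: str) -> bool:
    """Return True if a Creative Commons license string has an NC clause."""
    return "-NC" in license_name.upper()


def models_without_nc_clause(models=MODELS):
    """Names of models whose listed license has no non-commercial clause."""
    return [m.name for m in models if not is_non_commercial(m.license)]


print(models_without_nc_clause())
```

A filter like this is only a first pass for shortlisting models. It is not a substitute for reading the license text.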

Tasks

Document AI includes the following use cases and tasks:

  • document classification (image-classification)
  • document parsing (form understanding & information extraction)
  • visual question answering
  • table detection/layout analysis
  • optical character recognition (OCR)
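
Some of these tasks map directly onto task names of the Transformers `pipeline` API. The sketch below shows one possible mapping. The `image-classification` name comes from the list above. Pairing visual question answering with the `document-question-answering` pipeline task is an assumption of this sketch, not something the list states. The helpers `pipeline_task_for` and `build_pipeline` are hypothetical names, and `transformers` is imported lazily so the mapping can be inspected without installing it.

```python
# Mapping from the task names above to Transformers pipeline task names.
# Only tasks with a well-known pipeline equivalent are included here.
TASK_TO_PIPELINE = {
    "document classification": "image-classification",
    # Assumption: document VQA is served by this pipeline task.
    "visual question answering": "document-question-answering",
}


def pipeline_task_for(task: str) -> str:
    """Look up the pipeline task name for a Document AI task."""
    key = task.strip().lower()
    if key not in TASK_TO_PIPELINE:
        raise KeyError(f"No pipeline mapping defined in this sketch for: {task!r}")
    return TASK_TO_PIPELINE[key]


def build_pipeline(task: str, model=None):
    """Create a Transformers pipeline for the given task.

    `model` is a Hub checkpoint ID chosen by the caller; with None,
    the library's default checkpoint for that task is used.
    """
    from transformers import pipeline  # lazy import, needs `transformers`

    return pipeline(pipeline_task_for(task), model=model)
```

For example, `build_pipeline("document classification", model=...)` would return an image-classification pipeline for a checkpoint of your choice. Tasks such as document parsing or table detection usually need a model-specific processor and are left out on purpose.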

Datasets

Dataset           Task                             Hugging Face Datasets
SROIE             document parsing                 darentang/sroie
RVL-CDIP          document classification          rvl_cdip
XFUND             document parsing                 ranpox/xfund
FUNSD             document parsing                 nielsr/funsd
CORD              information extraction/parsing   naver-clova-ix/cord-v2
DocVQA            visual question answering        load manually
WildReceipt       document parsing                 Theivaprakasham/wildreceipt
TableBank         table detection/layout analysis  load manually
DocBank           table detection/layout analysis  load manually
ReadingBank       table detection/layout analysis  load manually
EATEN             document parsing                 load manually
PubLayNet         table detection/layout analysis  jordanparker6/publaynet
ICDAR2019_cTDaR   table detection/layout analysis  load manually
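
The datasets that have a Hub identifier can be loaded with `load_dataset` from the `datasets` library. The sketch below is a thin wrapper that also rejects the "load manually" entries with a clear error. The identifiers are copied from the table (CORD uses the `naver-clova-ix` organization name), and `load_document_dataset` is a hypothetical helper name. It is an assumption that every repository still loads with current `datasets` releases: identifiers may have moved, and some repositories may rely on loading scripts that newer versions no longer support.

```python
# Hub identifiers as listed in the table above.
HUB_DATASETS = {
    "SROIE": "darentang/sroie",
    "RVL-CDIP": "rvl_cdip",
    "XFUND": "ranpox/xfund",
    "FUNSD": "nielsr/funsd",
    "CORD": "naver-clova-ix/cord-v2",
    "WildReceipt": "Theivaprakasham/wildreceipt",
    "PubLayNet": "jordanparker6/publaynet",
}

# Datasets the table marks as "load manually".
MANUAL_DATASETS = {
    "DocVQA", "TableBank", "DocBank", "ReadingBank", "EATEN", "ICDAR2019_cTDaR",
}


def load_document_dataset(name: str, **kwargs):
    """Load a dataset from the table by its short name.

    Extra keyword arguments (for example `split="train"`) are passed
    through to `datasets.load_dataset` unchanged.
    """
    if name in MANUAL_DATASETS:
        raise ValueError(f"{name} has no Hub identifier listed; load it manually.")
    if name not in HUB_DATASETS:
        raise KeyError(f"Unknown dataset: {name!r}")

    from datasets import load_dataset  # lazy import, needs `datasets`

    return load_dataset(HUB_DATASETS[name], **kwargs)
```

A call such as `load_document_dataset("FUNSD", split="train")` downloads the data on first use, so it needs network access.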

APIs and Existing Solutions

Other Tools

Resources

OCR-Free Document Understanding with Donut
