huridocs/pdf-tokens-type-labeler

Stars: 3
Rank: 3,963,521 (Top 79%)
Language: Python
Created over 1 year ago
Updated 5 months ago

huridocs

uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections.

pdf-document-layout-analysis

A Docker-powered service for PDF document layout analysis. It segments and classifies the different parts of PDF pages, identifying elements such as text, titles, pictures, and tables.

pdf_paragraphs_extraction

OpenEvSys

OpenEvSys is free, open-source software designed for organisations that need a tool to manage information on human rights violations.

pdf-reading-order

preserve

Preserve is a tool for capturing and saving online digital content. Integrated with Uwazi, Preserve captures content from websites, social media, and communication platforms, and archives it with key metadata to ensure evidentiary value by establishing and demonstrating authenticity and chain of custody.

topic-classification

uwazi-design

pdf_metadata_extraction

pdf_information_extraction

pdf-text-extraction

This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of text extraction from PDF files.

pdf-labeled-data

semantic-search

uwazi-fixtures

pdf-table-of-contents-extractor

This project aims to extract Table of Contents (TOC) information from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of identifying and structuring the document's TOC.

uwazi-documentation

python_uwazi_API

Python API to interact with Uwazi

twitter_crawler

A Twitter crawler.

pdf_ocr_service

An HTTP service to OCR PDFs, based on a Redis queue.

pdf-document-layout-analysis-async