• Stars: 3
• Rank: 3,963,521 (Top 79 %)
• Language: Python
• Created: over 1 year ago
• Updated: 5 months ago

More Repositories

1. uwazi (TypeScript, 242 stars)
   Uwazi is a web-based, open-source solution for building and sharing document collections.

2. pdf-document-layout-analysis (Python, 51 stars)
   A Docker-powered service for PDF document layout analysis. It segments and classifies the parts of PDF pages, identifying elements such as text, titles, pictures, and tables.

3. pdf_paragraphs_extraction (Python, 48 stars)

4. OpenEvSys (PHP, 30 stars)
   OpenEvSys is free, open-source software designed for organisations that need a tool to manage information on human rights violations.

5. pdf-reading-order (Python, 11 stars)

6. preserve (TypeScript, 6 stars)
   Preserve is a tool for capturing and saving online digital content. Integrated with Uwazi, Preserve captures content from websites, social media, and communication platforms, and archives it with accompanying key metadata to ensure evidentiary value by establishing and demonstrating authenticity and chain of custody.

7. topic-classification (Python, 5 stars)

8. uwazi-design (4 stars)

9. pdf_metadata_extraction (Python, 4 stars)
   pdf_information_extraction

10. pdf-text-extraction (Python, 4 stars)
    This project extracts text from PDF files using the outputs of the pdf-document-layout-analysis service, relying on that service's segmentation and classification to automate text extraction.

11. pdf-labeled-data (TypeScript, 3 stars)

12. semantic-search (Python, 3 stars)

13. uwazi-fixtures (Shell, 3 stars)

14. pdf-table-of-contents-extractor (Python, 2 stars)
    This project extracts Table of Contents (TOC) information from PDF files using the outputs of the pdf-document-layout-analysis service, relying on that service's segmentation and classification to identify and structure the document's TOC.

15. uwazi-documentation (HTML, 2 stars)

16. python_uwazi_API (Python, 2 stars)
    Python API to interact with Uwazi.

17. twitter_crawler (Python, 1 star)
    Twitter crawler.

18. pdf_ocr_service (Python, 1 star)
    An HTTP service to OCR PDFs, based on a Redis queue.

19. pdf-document-layout-analysis-async (Python, 1 star)
    pdf-document-layout-analysis-async