  • Stars
    242
  • Rank 167,048 (Top 4 %)
  • Language
    TypeScript
  • License
    MIT License
  • Created about 9 years ago
  • Updated 22 days ago

Repository Details

Uwazi is a web-based, open-source solution for building and sharing document collections

Uwazi is a flexible database application to capture and organise collections of information with a particular focus on document management. HURIDOCS started Uwazi and is supporting dozens of human rights organisations globally to use the tool.

Uwazi | HURIDOCS

Read the user guide

Installation guide

Dependencies

Before anything else you will need to install the application dependencies:

Production

Install/upgrade procedure

Development

If you want to use the latest development code:

$ git clone https://github.com/huridocs/uwazi.git
$ cd uwazi
$ yarn install
$ yarn blank-state

To clone the Uwazi repository together with its git submodules, such as uwazi-fixtures (used for end-to-end testing):

$ git clone --recurse-submodules https://github.com/huridocs/uwazi.git
$ cd uwazi
$ yarn install

If the main Uwazi repository has already been cloned and you now want to load its submodules, run:

$ git submodule update --init
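
To confirm that the submodules were actually loaded, the output of `git submodule status` can be inspected: git prefixes each submodule that has not been initialised with `-`. The helper below is a minimal sketch with a hypothetical name (`uninitialised_submodules`); it is not part of the Uwazi repository.

```shell
# Hypothetical helper, not part of the Uwazi repository.
# Reads `git submodule status` output on stdin and prints the paths of
# submodules that are not initialised (lines that start with "-").
uninitialised_submodules() {
  awk '/^-/ { print $2 }'
}

# Example wiring (commented out so the snippet has no side effects):
# git submodule status | uninitialised_submodules
```

An empty result suggests that every submodule has been initialised.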

pngquant may fail to run correctly. If you encounter this issue, you are probably missing the libpng-dev library. Run:

$ sudo rm -rf node_modules
$ sudo apt-get install libpng-dev
$ yarn install

Docker

Infrastructure dependencies (ElasticSearch, ICU Analysis Plugin, MongoDB, Redis and Minio (S3 storage)) can be installed and run via Docker Compose. The ElasticSearch container will claim 2 GB of memory, so make sure your Docker Engine is allotted at least 3 GB of memory (for Mac and Windows users).

$ ./run start
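
The 3 GB recommendation can be checked before starting the containers. The sketch below assumes that `docker info --format '{{.MemTotal}}'` prints the engine's total memory in bytes; the function name and the threshold variable are illustrative and not part of Uwazi.

```shell
# Hypothetical helper, not part of the Uwazi repository.
# Threshold: 3 GiB expressed in bytes.
MIN_DOCKER_BYTES=$((3 * 1024 * 1024 * 1024))

# Succeeds when the given byte count is at least the threshold.
docker_memory_ok() {
  [ "$1" -ge "$MIN_DOCKER_BYTES" ]
}

# Example wiring (commented out so the snippet has no side effects):
# docker_memory_ok "$(docker info --format '{{.MemTotal}}')" \
#   || echo "Docker Engine has less than 3 GiB of memory" >&2
```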

Development Run

$ yarn hot

This will launch a webpack server and nodemon app server for hot reloading any changes you make.

Webpack server

$ yarn webpack-server

This will launch a webpack server. You can also pass --analyze to get detailed information about the webpack build.

Testing

Unit and Integration tests

We test using the Jest framework (built on top of Jasmine). To run the unit and integration tests, execute:

$ yarn test

This will run the entire test suite, both on server and client apps.

Some suites need MongoDB configured in Replica Set mode to run properly. The provided Docker Compose file runs MongoDB in Replica Set mode and initializes the cluster automatically. If you are using your own MongoDB installation, refer to MongoDB's documentation for more information.

End to End (e2e)

For End-to-End testing, we have a full set of fixtures that test the overall functionality. Be advised that, for the time being, these tests are run ON THE SAME DATABASE as the default database (uwazi_development), so running these tests will DELETE any existing data and replace it with the testing fixtures. DO NOT RUN ON PRODUCTION ENVIRONMENTS!

Running end-to-end tests requires a running Uwazi app.
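
Because the end-to-end run wipes the target database, it can help to put a guard in front of it. The sketch below is one possible guard, not something Uwazi ships: the function name, the use of `NODE_ENV` and the `*prod*` name check are all assumptions made for illustration.

```shell
# Hypothetical guard, not part of the Uwazi repository.
# Usage: e2e_is_safe NODE_ENV_VALUE DATABASE_NAME_VALUE
# Fails when the environment looks like production.
e2e_is_safe() {
  node_env="${1:-}"
  database_name="${2:-}"
  if [ "$node_env" = "production" ]; then
    return 1
  fi
  case "$database_name" in
    *prod*) return 1 ;;
  esac
  return 0
}

# Example wiring (commented out so the snippet has no side effects):
# e2e_is_safe "${NODE_ENV:-}" "${DATABASE_NAME:-}" && yarn e2e
```

A name-based check like this only catches obvious mistakes; it does not replace keeping test and production data on separate hosts.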

Running tests with Nightmare

$ yarn hot

On a different console tab, run

$ yarn e2e

Running tests with Puppeteer

$ DATABASE_NAME=uwazi_e2e INDEX_NAME=uwazi_e2e yarn hot

On a different console tab, run

$ yarn e2e-puppeteer

Note that if you already have an instance running, this will likely throw an error about ports already being in use. Only one instance of Uwazi may run on the same port at a time.

Default login

The application's default login is admin / change this password now

Note the subtle nudge ;)

System Requirements

  • For big files with a small database footprint (such as video, audio and images) you'll need more disk space than CPU or RAM
  • For text documents, plan for ample RAM, as ElasticSearch is memory-hungry for full-text search

The bare minimum you need to be able to run Uwazi on-prem without bottlenecks is:

  • 4 GB of RAM (reserve 2 for Elastic and 2 for everything else)
  • 2 CPU cores
  • 20 GB of disk space

For development:

  • 8 GB of RAM (depending on whether the services are running)
  • 4 CPU cores
  • 20 GB of disk space
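
The two sets of minimums above can be encoded as a small check. This is a sketch with a hypothetical function name; the caller supplies the measured values as whole numbers.

```shell
# Hypothetical helper, not part of the Uwazi repository.
# Usage: meets_requirements PROFILE RAM_GB CPU_CORES DISK_GB
# PROFILE is "production" (4 GB RAM, 2 cores, 20 GB disk) or
# "development" (8 GB RAM, 4 cores, 20 GB disk), as listed above.
meets_requirements() {
  profile="$1"; ram_gb="$2"; cores="$3"; disk_gb="$4"
  case "$profile" in
    production)  min_ram=4; min_cores=2; min_disk=20 ;;
    development) min_ram=8; min_cores=4; min_disk=20 ;;
    *) echo "unknown profile: $profile" >&2; return 2 ;;
  esac
  [ "$ram_gb" -ge "$min_ram" ] &&
    [ "$cores" -ge "$min_cores" ] &&
    [ "$disk_gb" -ge "$min_disk" ]
}

# Example: a machine with 4 GB RAM, 2 cores and 50 GB of disk.
# meets_requirements production 4 2 50 && echo "OK for production"
```

On Linux, `nproc` reports the core count; RAM and disk figures can be taken from `free` and `df`, whose rounding varies between systems.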

More Repositories

  1. pdf-document-layout-analysis (Python, 51 stars)
     A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The service allows for the segmentation and classification of different parts of PDF pages, identifying the elements such as texts, titles, pictures, tables and so on.

  2. pdf_paragraphs_extraction (Python, 48 stars)

  3. OpenEvSys (PHP, 30 stars)
     OpenEvSys is free open source software designed for use by organisations who need a software tool to manage information on human rights violations

  4. pdf-reading-order (Python, 11 stars)

  5. preserve (TypeScript, 6 stars)
     Preserve is a tool for capturing and saving online digital content. Integrated with Uwazi, Preserve captures content from websites, social media and communication platforms, and archives them with accompanying key metadata to ensure evidentiary value by establishing and demonstrating authenticity and chain of custody.

  6. topic-classification (Python, 5 stars)

  7. uwazi-design (4 stars)

  8. pdf_metadata_extraction (Python, 4 stars)
     pdf_information_extraction

  9. pdf-text-extraction (Python, 4 stars)
     This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of text extraction from PDF files.

  10. pdf-labeled-data (TypeScript, 3 stars)

  11. pdf-tokens-type-labeler (Python, 3 stars)

  12. semantic-search (Python, 3 stars)

  13. uwazi-fixtures (Shell, 3 stars)

  14. pdf-table-of-contents-extractor (Python, 2 stars)
      This project aims to extract Table of Contents (TOC) information from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of identifying and structuring the document's TOC.

  15. uwazi-documentation (HTML, 2 stars)

  16. python_uwazi_API (Python, 2 stars)
      Python API to interact with Uwazi

  17. twitter_crawler (Python, 1 star)
      twitter crawler

  18. pdf_ocr_service (Python, 1 star)
      An http service to OCR PDFs based on a redis queue.

  19. pdf-document-layout-analysis-async (Python, 1 star)
      pdf-document-layout-analysis-async