• Stars
    258
  • Rank 158,189 (Top 4 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated about 2 years ago

Repository Details

Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelines, plus an ingestor for a Solr or Elasticsearch index and a linked data graph database

More Repositories

1

open-semantic-search

Open source research tool for searching, browsing, analyzing and exploring large document collections: a semantic search engine and open source text mining and text analytics platform (integrates ETL for document processing, OCR for images and PDFs, named entity recognition for persons, organizations and locations, metadata management with thesauri and ontologies, and a search user interface and search apps for full-text search, faceted search and knowledge graph)
Shell
970
star
2

open-semantic-entity-search-api

Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of entities like persons, organizations and places for (semi)automatic semantic tagging & analysis of documents by linked data knowledge graph like SKOS thesaurus, RDF ontology, database(s) or list(s) of names
Python
177
star
3

open-semantic-search-apps

Python/Django-based web apps and web user interfaces for search, structure (metadata management such as thesauri, ontologies, annotations and named entities) and data import (ETL such as text extraction, OCR and crawling of filesystems or websites)
CSS
94
star
4

open-semantic-visual-graph-explorer

Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visualization of direct and indirect connections between named entities like persons, organizations, locations & concepts from thesauri or ontologies within your documents and knowledge graph
HTML
78
star
5

solr-ontology-tagger

Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri
Python
46
star
6

solr-php-ui

Solr client and user interface for search
HTML
21
star
7

solr-relevance-ranking-analysis

Solr Relevance Ranking Analysis and Visualization Tool
Python
17
star
8

open-semantic-search-appliance

Open Semantic Search Appliance (VM)
Shell
12
star
9

lexemes

Import lexemes (a dictionary including the different grammatical/lexical forms of each lexical entry) from Wikidata into the Apache Solr synonyms config
Python
7
star
10

solr-synonames

Import synonames (multilingual variants of first names from Wikidata) to Solr managed synonyms graph
Python
6
star
11

spacy-services.deb

Debian & Ubuntu package for REST microservices for spaCy natural language processing and machine learning framework for named entity recognition
Shell
5
star
12

tika-server.deb

Apache Tika Server as Debian GNU/Linux and Ubuntu Linux package
Dockerfile
5
star
13

open-semantic-etl-filemonitoring-remote

Filesystem monitoring via inotify that indexes new or changed files immediately through a remote API on a remote search server
Python
5
star
14

tesseract-ocr-cache

Tesseract OCR wrapper for Apache Tika and/or Open Semantic ETL that caches the OCR results, so Tika-Server or Open Semantic ETL does not have to rerun slow and expensive OCR on the same images
Python
5
star
15

tika-python.deb

tika-python as Debian GNU/Linux and Ubuntu Linux package
3
star
16

neo4j.deb

Debian package of Neo4j graph database preconfigured for Open Semantic ETL and Open Semantic Search
Shell
2
star
17

solr.deb

Apache Solr as Debian package with preconfigured schema for Open Semantic ETL and Open Semantic Search
Shell
2
star