• Stars
    star
    21
  • Rank 1,078,189 (Top 22 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated about 2 years ago


Repository Details

Generic Environment for Context-Aware Correction of Orthography

More Repositories

1

pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Python
478
star
2

vocage

A minimalistic spaced-repetition vocabulary trainer (flashcards) for the terminal
Rust
142
star
3

clam

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.
Python
129
star
4

colibri-core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
C++
123
star
5

flat

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.
JavaScript
103
star
6

LaMachine

LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script
Shell
68
star
7

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
Python
60
star
8

python-frog

Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)
Cython
47
star
9

analiticcl

An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
Rust
30
star
10

python-ucto

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
Cython
29
star
11

codemetapy

A Python package for generating and working with codemeta
Python
24
star
12

homeassistant-config

My elaborate home automation configuration + scripts
Python
21
star
13

dotfiles

My dotfiles
Shell
20
star
14

deepfrog

An NLP-suite powered by deep learning
Rust
19
star
15

hanzigrid

Hanzi grids for studying Mandarin Chinese (tool & output data)
HTML
18
star
16

foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Python
18
star
17

procmapgen

A small toy project written in Rust: procedural generation of various kinds of grid-based maps.
Rust
16
star
18

python-timbl

python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
Python
16
star
19

spacy2folia

Use spaCy for NLP and output to the FoLiA XML format.
Python
12
star
20

foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
Python
10
star
21

pbmbmt

Phrase-based Memory-based Machine Translation
Python
10
star
22

unilangforum

UniLang Language Community - Forum
PHP
8
star
23

colibri

THIS PROJECT IS BEING RENDERED OBSOLETE BY NEWER VERSIONS colibri-core and colibri-mt !!
C++
7
star
24

valkuil-gecco

Nederlandse Spellingscontrole / Dutch spelling correction system - powered by Gecco
Python
7
star
25

nederlab-pipeline

Linguistic enrichment pipeline for historical Dutch, as used in the Nederlab project
Groovy
7
star
26

anavec

Proof-of-concept spelling correction/normalisation system based on anagram vectors
Python
6
star
27

codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
Shell
6
star
28

semeval2014task5

This is the official repository for SemEval 2014 Task 5: L2 Translation Assistant. It contains the gold standard learner corpus, evaluation results and the Python program library needed for the task. It does not contain a full translation assistance system.
HTML
5
star
29

foliadocserve

FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.
Python
5
star
30

piereling

Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines.
Python
5
star
31

lingua-cli

Very small, simple command-line interface for language detection using lingua-rs
Rust
5
star
32

colibri-mt

A Machine Translation framework that wraps around the Moses Decoder and enables k-NN classifier techniques to be used for modelling source-side context
C++
5
star
33

babelente

BabelEnte: Entity Extractor and Translator using BabelFy and Babelnet.org
Python
4
star
34

labirinto

A web front-end portal for a virtual laboratory of NLP tools
Vue
4
star
35

clamservices

A collection of CLAM webservices for several of our Natural Language Processing tools
Python
4
star
36

folia-rust

FoLiA library for Rust (alpha)
Rust
4
star
37

codemeta-server

Server for codemeta: an in-memory triple store, SPARQL endpoint and simple web-based visualisation for end users
Python
4
star
38

sesdiff

Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).
Rust
4
star
39

alpino_clam_webservice

A CLAM-powered webservice for Alpino, a dependency parser for Dutch
Python
3
star
40

vocadata

Data for vocabulary learning
3
star
41

parseme-support

FoLiA & FLAT support for PARSEME
Python
3
star
42

spreek2schrijf

Scripts for Spreek2Schrijf, a project with the Tweede Kamer
Python
3
star
43

svkbd

My fork of suckless' simple virtual keyboard: https://tools.suckless.org/x/svkbd/
C
3
star
44

sxmo-docs

My fork of https://git.sr.ht/~mil/sxmo-docs
Shell
2
star
45

aNtiLoPe

A collection of NLP pipelines powered by Nextflow
Groovy
2
star
46

sxmo-utils

My fork of https://git.sr.ht/~mil/sxmo-utils/
Shell
2
star
47

wrexp

Experiment Wrapper - A framework for launching and keeping track of experiments. Wrexp takes care of storing all stdout/stderr logs and mails you when experiments are completed.
JavaScript
2
star
48

wikiente

A named entity recogniser and linker based on DBpedia Spotlight, with support for the FoLiA format
Python
2
star
49

colibri-apps

Contains NLP applications using Colibri Core, suited for end-users. The applications are generally web-based.
OpenEdge ABL
2
star
50

wsd2

Python
2
star
51

colloquery

Web application for searching for phrases/collocations/synonyms in phrase translation tables
Python
2
star
52

lexmatch

Simple lexicon matcher against a text
Rust
2
star
53

colibri-utils

NLP utilities that rely on Colibri Core: currently only language identification
TeX
2
star
54

nlpsandbox

Natural Language Processing Sandbox - An experimental playground for all kinds of NLP tasks
Python
2
star
55

ssam

Split sampler: split your data into multiple sets (e.g. train/test/development)
Rust
2
star
56

LaMachine-docker-test

Meta repository for docker testing of LaMachine on Travis-CI
1
star
57

dwm

My patched fork of dwm
C
1
star
58

unilang_ulr

Collection of open language resources from UniLang; containing mostly phrasebooks and stories
1
star
59

oersetter-models

Models for Oersetter, a Frisian<->Dutch Machine Translation system
1
star
60

chira

Chinese Reading Assistant, pop-up translations for Linux
Python
1
star
61

valkuil

Valkuil.net is an automatic spelling corrector for Dutch that detects ordinary typos as well as grammatical errors and confusions between existing words.
Lex
1
star
62

sxmo-svkbd

My fork of https://git.sr.ht/~mil/sxmo-svkbd
C
1
star
63

aur-packages

Arch User Repository packages I maintain
Shell
1
star
64

cwrap

Small C wrapper to turn a C function into a very simple webservice
C
1
star
65

campyon

Campyon is both a command-line tool and a Python library for viewing and manipulating columned data files. It supports various filters, statistics, visualisations, and plotting.
Python
1
star
66

vocavue

A vocabulary trainer with a view
JavaScript
1
star
67

lst-chat

JavaScript
1
star
68

homepage

My website
TeX
1
star
69

hyphertool

Command-line tool for syllabification and hyphenation for multiple languages
Rust
1
star
70

lamastats

Generates statistical reports on the usage of our software and webservices
Python
1
star
71

charfreq

Very simple command-line tool that counts (Unicode) character frequency from standard input
Rust
1
star
72

colibrita

Colibrita is a proof-of-concept translation assistance system, translating L1 fragments in an L2 context, using machine learning and statistical machine translation techniques
Python
1
star