Language Machines (@LanguageMachines)
  • Stars: 354
  • Global Org. Rank: 29,390 (Top 10 %)
  • Registered almost 12 years ago
  • Most used languages:
    C++ 56.0 %
    Python 28.0 %
    Shell 4.0 %
    HTML 4.0 %
    C 4.0 %
    Lex 4.0 %
  • Location: 🇳🇱 Netherlands
  • Country Total Rank: 1,450
  • Country Ranking by language:
    Lex 2
    C++ 101
    HTML 547
    Python 681
    Shell 699
    C 1,766

Top repositories

1. frog
   Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
   C++ · 73 stars

2. ucto
   Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can easily be extended to other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
   C++ · 65 stars

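The ucto description above lists the basic steps of tokenisation: separating words from punctuation, splitting sentences, and optionally changing case. The following is a minimal Python sketch of those steps, assuming a single hypothetical regular-expression rule and a fixed set of sentence-final marks. It is not ucto's API or its rule format; ucto's per-language rule files are far richer.

```python
import re

# Hypothetical, simplified rule: a token is either a word (optionally joined
# by hyphens or apostrophes, e.g. "z'n") or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")
# Hypothetical set of marks treated as sentence boundaries.
SENT_END = {".", "!", "?"}


def tokenize(text, lowercase=False):
    """Split text into sentences, each returned as a list of tokens."""
    if lowercase:  # optional preprocessing step: change case
        text = text.lower()
    sentences, current = [], []
    for tok in TOKEN_RE.findall(text):
        current.append(tok)
        if tok in SENT_END:  # close the sentence at a final mark
            sentences.append(current)
            current = []
    if current:  # keep a trailing sentence without final punctuation
        sentences.append(current)
    return sentences


print(tokenize("Hallo, wereld! Dit is een test."))
```

This naive rule splits abbreviations such as "dr." and ellipses into separate sentences. Handling such cases per language is what configurable rule sets like ucto's are for.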
3. PICCL
   A set of workflows for corpus building through OCR, post-correction and normalisation.
   Python · 48 stars

4. timbl
   TiMBL implements several memory-based learning algorithms.
   C++ · 46 stars

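The timbl entry names memory-based learning without showing it. The core idea is that training stores every example in memory, and classification looks up the most similar stored examples. Below is a minimal Python sketch of that idea: a k-nearest-neighbour classifier using the overlap distance, which counts mismatching symbolic feature values. The class and function names are hypothetical and are not TiMBL's API. The sketch also omits TiMBL features such as feature weighting and alternative distance metrics.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of feature positions whose symbolic values differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Hypothetical sketch: store all instances, classify by nearest neighbours."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label) pairs

    def train(self, instances):
        # "Training" is simply storing the examples in memory.
        self.memory.extend(instances)

    def classify(self, features):
        # Rank stored instances by distance to the query; sorted() is stable,
        # so instances at equal distance keep their training order.
        ranked = sorted(self.memory, key=lambda inst: overlap_distance(inst[0], features))
        votes = Counter(label for _, label in ranked[: self.k])
        # most_common breaks ties by the label encountered first.
        return votes.most_common(1)[0][0]


toy_data = [(("a", "x", "p"), "A"), (("a", "y", "q"), "A"), (("b", "y", "q"), "B")]
clf = MemoryBasedClassifier(k=1)
clf.train(toy_data)
print(clf.classify(("b", "y", "p")))
```

With k=1 the toy query takes the label of its single closest instance. With k=3 it takes a majority vote over all three stored instances, so the answer can change. The toy data here is invented purely for illustration.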
5. LuigiNLP
   A workflow system for Natural Language Processing.
   Python · 21 stars

6. libfolia
   FoLiA library for C++.
   C++ · 15 stars

7. ticcltools
   Tools for TICCL.
   C++ · 14 stars

8. CLIN28_ST_spelling_correction
   Scripts that were used for preparing and converting the Wikipedia documents that are part of the CLIN28 shared task on spelling correction.
   Python · 10 stars

9. LamaEvents
   Lama Events is a calendar application listing events in the near future. The events are detected and selected by a fully automatic procedure in the Dutch Twitter stream.
   HTML · 10 stars

10. uctodata
   Datafiles for the tokenizer ucto.
   Shell · 9 stars

11. mbt
   MBT: Memory-based tagger generation and tagging. MBT is a memory-based tagger-generator and tagger in one.
   C++ · 9 stars

12. ticcutils
   Ticcutils, a generic utility library shared by our software.
   C++ · 7 stars

13. wopr
   Memory-based word predictor/language model. http://ilk.uvt.nl/wopr/
   C++ · 5 stars

14. foliautils
   Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University).
   C++ · 4 stars

15. quoll
   Python · 3 stars

16. timblserver
   TiMBL implements several memory-based learning algorithms. This is the server part.
   C++ · 3 stars

17. ICDAR2017-PostOCR-Ticcl
   Wrapper scripts for processing ICDAR2017 PostOCR data given a TICCL ranked input list.
   Python · 2 stars

18. dimbl
   Distributed Tilburg Memory Based Learner.
   C++ · 2 stars

19. mbtserver
   C++ · 1 star

20. dialect2keywords
   Web interface designed to convert words in Dutch dialects ("dialectopgaven") into standard Dutch keywords ("vernederlandste trefwoorden").
   Python · 1 star

21. releasereport
   Python · 1 star

22. paramsearch
   Automated parameter optimisation for Timbl.
   C · 1 star

23. frogdata
   Data for Frog (mandatory).
   Lex · 1 star

24. toad
   Toad: Trainer Of All Data, the Frog training collection.
   C++ · 1 star

25. bp-som
   BP-SOM: A hybrid of back-propagation learning in multi-layered perceptrons and self-organizing maps.
   C++ · 1 star