Centre for Language Technology, University of Copenhagen (@kuhumcst)

Top repositories

1

cstlemma

Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules (an affix can be a prefix, infix, suffix or circumfix), which are obtained by supervised learning from a list of full form/lemma pairs.
C++
32
star
2

stucco

An experimental adaptive UI toolkit.
Clojure
31
star
3

xml-hiccup

Convert XML into Hiccup in Clojure and ClojureScript.
Clojure
19
star
4

DanNet

The Danish WordNet as an RDF graph.
Clojure
18
star
5

taggerXML

Modernized version of Eric Brill's part-of-speech tagger.
C++
17
star
6

tf-idf

A reasonably performant TF-IDF implementation.
Clojure
12
star
7

Danish-Similarity-Dataset

Gold standard resource for evaluation of Danish word embedding models.
8
star
8

rescope

Turn documents into UI components.
Clojure
7
star
9

pedestal-sp

Turn a Pedestal web service into a SAML Service Provider.
Clojure
7
star
10

rtfreader

Text segmenter and tokeniser for Danish, English and other languages. Reads an RTF or flat text file and outputs the text one sentence per line, optionally tokenised.
C++
6
star
11

texton

Text Tonsorium: a toolbox that automatically arranges NLP tools into workflows and runs them on the user's input.
PHP
5
star
12

Anvil-Facetracker

OpenCV-based plugin for the Anvil annotation software that tracks faces and creates annotations when velocity or acceleration thresholds are exceeded.
Java
5
star
13

cuphic

Transform or scrape Hiccup with a declarative DSL.
Clojure
4
star
14

glossematics

The life of Louis Hjelmslev.
Clojure
4
star
15

affixtrain

Using supervised learning, create a set of affix rules for use by the CSTlemma lemmatiser.
C++
4
star
16

letterfunc

Functions for upper/lower casing, for testing whether a character is a letter, and for converting between the Unicode encodings UTF-8 and UTF-16.
C
2
star
17

texton-Java

Web-based workflow management system that computes candidate tool workflows from the input file(s) and the user's requirements for the output, then runs the workflow the user selects from the list of candidates. Implemented in Bracmat (~75%) and Java (~25%).
Java
2
star
18

danish-semantic-reasoning-benchmark

A Danish semantic reasoning benchmark compiled from lexical semantic resources.
1
star
19

qname

A QName record and conversions between QNames, Keywords, and IRI strings.
Clojure
1
star
20

texton-linguistic-resources

Linguistic resources for several of the tools included in the Text Tonsorium.
Roff
1
star
21

head_movement_detection

Jupyter notebooks and training data containing manual head movement annotations, speech data, and velocity, acceleration and jerk data.
Jupyter Notebook
1
star