  • Stars 1
  • Language Roff
  • Created over 5 years ago
  • Updated 12 months ago

Repository Details

Linguistic resources for several of the tools included in the Text Tonsorium

More Repositories

1

cstlemma

Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules (prefix, infix, suffix, circumfix), which are obtained by supervised learning from a list of full form/lemma pairs.
C++
32
star
2

stucco

An experimental adaptive UI toolkit.
Clojure
31
star
3

xml-hiccup

Convert XML into Hiccup in Clojure and ClojureScript.
Clojure
19
star
4

DanNet

The Danish WordNet as an RDF graph.
Clojure
18
star
5

taggerXML

Modernized version of Eric Brill's Part Of Speech tagger.
C++
17
star
6

tf-idf

A reasonably performant TF-IDF implementation.
Clojure
12
star
7

Danish-Similarity-Dataset

Gold standard resource for evaluation of Danish word embedding models.
8
star
8

rescope

Turn documents into UI components.
Clojure
7
star
9

pedestal-sp

Turn a Pedestal web service into a SAML Service Provider.
Clojure
7
star
10

rtfreader

Text segmenter and tokeniser for Danish, English and other languages. Reads an RTF or flat text file and outputs the text one sentence per line, optionally tokenised.
C++
6
star
11

texton

Text Tonsorium: a toolbox that automatically arranges NLP tools into workflows and enacts them on the user's input.
PHP
5
star
12

Anvil-Facetracker

OpenCV-based plugin for the Anvil annotation software that tracks faces and creates annotations when velocity or acceleration thresholds are exceeded.
Java
5
star
13

danish-semantic-reasoning-benchmark

A Danish semantic reasoning benchmark compiled from lexical semantic resources
4
star
14

cuphic

Transform or scrape Hiccup with a declarative DSL.
Clojure
4
star
15

glossematics

The life of Louis Hjelmslev.
Clojure
4
star
16

affixtrain

Using supervised learning, create a set of affix rules for use by the CSTlemma lemmatiser.
C++
4
star
17

letterfunc

Functions for upper/lower casing, for testing whether a character is a letter and for conversion between Unicode encodings UTF-8 and UTF-16
C
2
star
18

texton-Java

Web-based workflow management system that computes candidate tool workflows given input file(s) and the user's requirements regarding the output. Afterwards, runs a workflow selected by the user from the list of candidates. Implemented in Bracmat (~75%) and Java (~25%).
Java
2
star
19

qname

A QName record and conversions between QNames, Keywords, and IRI strings.
Clojure
1
star
20

head_movement_detection

Jupyter notebooks and training data containing manual head movement annotations, speech data and velocity, acceleration and jerk data.
Jupyter Notebook
1
star