Computational Linguistics & Text Mining Lab (@cltl)

Top repositories

1

python-for-text-analysis

If you want to use Python for text analysis, this course is for you!
Jupyter Notebook
418
star
2

OpenDutchWordnet

This repo provides a Python module to work with Open Dutch WordNet. It was created using Python 3.4.
HTML
63
star
3

pepper

VU-CLTL Pepper/Nao Application Repository (Python 2)
Python
29
star
4

ba-text-mining

Hands-on material for the BA text mining course, taught at VU Amsterdam
Jupyter Notebook
28
star
5

wsd-dynamic-sense-vector

HTML
24
star
6

SpaCy-to-NAF

spaCy-to-naf converter
Python
20
star
7

EventCoreference

Compares descriptions of events within and across documents to decide if they refer to the same events.
Java
19
star
8

ThesisTips

A collection of tips for writing a PhD thesis
19
star
9

KafNafParserPy

Parser for KAF NAF files written in Python
Python
15
star
10

svm_wsd

Word Sense Disambiguation system developed in the DutchSemCor project using Support Vector Machines. The input is plain text, and the output is XML
Python
13
star
11

ma-hlt-labs

Human Language Technology notebooks for lab sessions for Master's students
Jupyter Notebook
11
star
12

opinion_miner_deluxe

Opinion miner based on machine learning that can be trained on a corpus of KAF/NAF files
Python
10
star
13

ma-ml4nlp-labs

Jupyter Notebook
9
star
14

BabelfyReimplementation

Reimplementation of Babelfy (http://babelfy.org)
Python
9
star
15

lexical_pattern_extractor

Lexical pattern extractor to generate patterns and target words from a seed list
Python
8
star
16

entity-identification-from-scratch

Entity recognition and linking for historical documents in Dutch, developed within the Clariah+ project at VU Amsterdam
Python
8
star
17

OntoTagger

OntoTagger inserts (semantic) labels into a KAF representation on the basis of lemma or WordNet synset representations of the text
Java
8
star
18

vu-rm-pip3

Dutch NewsReader pipeline
Shell
7
star
19

ecbPlus

ECB+ and derived corpora
7
star
20

WordnetTools

Set of functions to use a wordnet in Wordnet-LMF format
Java
7
star
21

ma-language-as-data-labs

This GitHub repository provides the Jupyter notebooks for the lab sessions of the VU Language-As-Data course.
Jupyter Notebook
7
star
22

event-resource-interoperability

6
star
23

semantic_space_navigation

Jupyter Notebook
6
star
24

morphosyntactic_parser_nl

Morphosyntactic parser for Dutch based on the Alpino parser
Python
5
star
25

a-proof-zonmw

Detecting the functioning level of a patient from a free-text clinical note in Dutch.
Jupyter Notebook
5
star
26

multilingual-finegrained-entity-typing

Python
5
star
27

multilingual-wiki-event-pipeline

This project aims to extract information about incidents of a particular type. This information consists of structured data on the incidents from Wikidata, as well as unstructured descriptions and supporting sources from Wikipedia. We obtain information from Wikipedia in multiple languages.
Python
5
star
28

EL-long-tail-phenomena

Systematic study of long tail phenomena in the task of entity linking
Jupyter Notebook
4
star
29

FormatConversions

Several conversions between formats that are commonly used by our tools
Python
4
star
30

BiographyNet

NLP tools and data used in BiographyNet
Python
4
star
31

StoryTeller

Toolkit to query the NewsReader KnowledgeStore with SPARQL and create a JSON story
HTML
4
star
32

cltl-ma-thesis

(LaTeX) MA thesis template
TeX
4
star
33

Target-Spans-Detection

Target_Spans_HateXplain
Python
4
star
34

HumanLikeEL

Human-Like Entity Linking using Contextual knowledge
Jupyter Notebook
4
star
35

WordNetSimilarity

Programs and scripts that test performance of WordNet similarity measurements using different settings
Perl
4
star
36

FrameNet-annotation-tool

Python-based command-line tool for FrameNet annotation
XSLT
4
star
37

MultiWordTagger

Reads a KAF or NAF file to detect multiword sequences of terms according to WordNet
Java
4
star
38

aproof-icf-classifier

Classifier that can read medical reports and assign a functional level classification following the WHO ICF classification scheme.
Python
4
star
39

PostmaVossenGWC2014

This repository provides the code to replicate the results from PostmaVossenGWC2014
C
3
star
40

SoNar2Naf

Converter from FoLiA to NAF
HTML
3
star
41

vua-wsd-sem2015

System for the CLTL participation in SemEval2015 task 13: multilingual all-words sense disambiguation and entity linking
Python
3
star
42

frame-annotation-tool

Annotation tool in JavaScript and Node.js for annotation of frames in Dutch documents.
JavaScript
3
star
43

machine-learning-for-nlp-course

Releases of notebooks for students participating in Machine Learning for NLP
Jupyter Notebook
3
star
44

MoreIsNotAlwaysBetter

Java
3
star
45

lexical-negation-dictionary

Python
3
star
46

BiographicalDataModels

3
star
47

ma-communicative-robots

Communication robots
Python
3
star
48

multilingual_factuality

Python
3
star
49

NAF-HeidelTime

NAF (KAF) Wrapper around HeidelTime
Python
3
star
50

NewsAcquisition

Analysis and acquisition of news data from the Signal Media corpus and other news collections
Jupyter Notebook
3
star
51

tokeniser-opennlp

Tokenizer and sentence splitter based on OpenNLP
Python
3
star
52

reference-framing-perspective

Workshop website
3
star
53

WordNetMapper

This repo maps lexical keys, offsets, and ilidefs from one WordNet version to another ["16","17","171","20","21","30"]. It makes use of the index.sense files from WordNet (http://wordnet.princeton.edu/) and the automatically generated mappings between WordNet offsets (http://nlp.lsi.upc.edu/tools/download-map.php).
HTML
3
star
54

a-proof

Tools for the text classification of clinical notes in electronic patient records
Jupyter Notebook
2
star
55

ELBaselines

This repo aims to create baseline results for entity linking by running a text against state-of-the-art entity linking systems in their most standard configuration.
Python
2
star
56

LSTM-WSD

Python
2
star
57

nlpp

Script to install an NLP pipeline from its components.
CWeb
2
star
58

DFNDataReleases

2
star
59

LongTailAnnotation

Annotation tool for data2text approaches
JavaScript
2
star
60

micro-portraits

Python
2
star
61

KafAnnotator

Standalone program to annotate KAF files
Java
2
star
62

Image-Specificity

Reimplementation of Jas & Parikh's (2015) image specificity metric, using word embeddings.
Python
2
star
63

voc-missives

NER and format conversion scripts for the Generale Missiven
HCL
2
star
64

TextToCoNLL

Python
2
star
65

dutch-nlp-tools

Overview of data sets and resources for Dutch
2
star
66

NAF-4-Development

Python
2
star
67

FrameNetNLTK

Python
2
star
68

SemanticOverfitting

Python
2
star
69

mergeAnnotationCAT

Script to merge files annotated by different annotators (on the same task) to better explore (dis-)agreement
Python
2
star
70

Mining-Ministers

Python
2
star
71

hpsp

Experiments with hyperspace models for selectional preference
Jupyter Notebook
2
star
72

CuriousMachine

Investigations on how to build a curious machine based on NLP technologies
Python
2
star
73

NAFFoLiAPy

Library for converting between FoLiA and NAF
Python
2
star
74

GunViolenceCorpus

2
star
75

News2RDF

Python
2
star
76

MFS_classifier

This repo contains scripts that attempt to remove the most-frequent-sense (MFS) bias from a WSD system.
PostScript
2
star
77

coreference-evaluation

Evaluation package for event coreference using the reference-scorer
Java
2
star
78

SimpleTagger

Python
2
star
79

GRaSP

2
star
80

ceopathfinder

Finds a path of circumstantial relations between events on the basis of the CircumstantialEventOntology
Rich Text Format
2
star
81

run_open-sesame

Python
2
star
82

pepper_tensorflow

This is the repository for Pepper modules and external services. Use Python 3.
Python
2
star
83

entity-link-postprocess

Python
2
star
84

DutchDescriptions

Dutch descriptions for the Flickr30K validation and test data, plus a cross-lingual comparison tool.
Roff
2
star
85

cltl.github.io

CLTL organization site
HTML
1
star
86

TeamRobot

Python
1
star
87

LongTailIdentity

Generating profiles of long tail identities from text
Jupyter Notebook
1
star
88

vua_factuality

Python
1
star
89

a-proof-project

JavaScript
1
star
90

PythonVirtuosoInterface

Simple interface to SPARQL for Python 2 and 3 scripts
Python
1
star
91

rfp_corpus_collection

Collect a referentially grounded corpus for the 1st workshop on Reference, Framing, and Perspective (LREC-COLING 2024)
Python
1
star
92

Wikipedia_langlinks

Python
1
star
93

pwgc

Tool to load the Princeton WordNet Gloss Corpus
Python
1
star
94

relink

RElinking with CONtext - Entity linking module
1
star
95

KafKybot

Extracts tuples from a KAF file using profiles
Java
1
star
96

ma-applied-tm-course

GitHub repository supporting the Applied TM course, part of the VU Text Mining Master's
Python
1
star
97

SPT_crowd_data_analysis

Code to analyze crowd annotations of property-concept pairs in terms of their relations.
Python
1
star
98

inner-outer-coreference

A repository for investigating the role of common ground in datasets of social dialogue in coreference resolution tasks
Python
1
star
99

ma-course-subjectivity-mining

Repository for the Subjectivity Mining course
Python
1
star
100

BERT-WSD

Python
1
star