TurkuNLP Group - IT Department - University of Turku (@TurkuNLP)

Top repositories

1

Turku-neural-parser-pipeline

A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more than 50 languages. Top ranker in the CoNLL-18 Shared Task.
Python
103
star
2

FinBERT

BERT model trained from scratch on Finnish
Shell
86
star
3

Finnish-dep-parser

The Finnish dependency parsing pipeline being developed by the Turku NLP group. Documentation:
Python
49
star
4

wikibert

BERT models for many languages created from Wikipedia texts
31
star
5

Text_Mining_Course

Stuff for the Text Mining course
Jupyter Notebook
25
star
6

ocr-correction

Post-processing OCR errors with seq2seq models
Python
25
star
7

finngen-tools

Tools for training causal language models for Finnish
Python
25
star
8

Deep_Learning_in_LangTech_course

Materials for the University of Turku course TKO_8965 Deep Learning in Human Language Technology (previously named TKO_2101 Natural Language Processing)
Jupyter Notebook
14
star
9

bert-eval

Python
9
star
10

turku-ner-corpus

Open broad-coverage corpus for Finnish named entity recognition.
Python
9
star
11

turku-one

Turku OntoNotes Entities Corpus (TurkuONE)
8
star
12

pubmed_parses

Syntactic parses and named entity recognition for PubMed abstracts and PubMed Central full documents
8
star
13

finnish-generative-model-eval

Evaluation of Finnish generative models
Python
6
star
14

class-explainer

Python
5
star
15

IR_Course

Stuff for the upcoming IR course 2017
Jupyter Notebook
5
star
16

Finnish_PropBank

Finnish Proposition Bank
CSS
4
star
17

intro-to-nlp

Introduction to Natural Language Processing
Jupyter Notebook
4
star
18

register-labeling

Python
4
star
19

Turku-paraphrase-corpus

Python
3
star
20

biBERT

Finnish-English bilingual BERT models
3
star
21

BINF_Programming

Stuff for the BINF programming course (@fginter)
Jupyter Notebook
3
star
22

ATP_kurssi

Jupyter Notebook
3
star
23

CAFA3

University of Turku CAFA3 project
Python
3
star
24

conll17-system

Instructions for TurkuNLP system in CoNLL 2017 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies.
Shell
2
star
25

WAC-XII

Data presented in the paper "From Web Crawl to Clean Register-Annotated Corpora"
2
star
26

textual-data-analysis-course

Jupyter Notebook
2
star
27

DIKI1002-Working-with-Text-in-Python

Jupyter Notebook
2
star
28

multilingual-register-labeling

Multilingual, multilabel modeling of registers
Python
2
star
29

FinCORE

Finnish Corpus of Online REgisters
Python
2
star
30

BioCreativeVI_BioID_assignment

Python
2
star
31

BioCreativeVI_CHEMPROT_RE

Deep learning-based systems for biomedical relation extraction: recognizing the statements of relations between chemical compounds/drugs and genes/proteins from biomedical literature. The code is developed for our participation in the BioCreative VI Task 5 (CHEMPROT) challenge. Contact: [email protected]
Python
2
star
32

Corpus-linguistics

Code and data for the examples and use cases described in the article "Määrällinen korpuslingvistiikka" to be published in the book "Kielentutkimuksen metodologian käsikirja" in Finnish.
Python
2
star
33

korona-tweets

Stuff for our korona-tweets
Python
1
star
34

ocr_errors_simulator

Functions and code used to determine the probabilities of OCR errors and to simulate them
Python
1
star
35

Digi_menetelmat

Workshop "Digital humanities in linguistics: text mining" for the course Introduction to Digital Humanities
Python
1
star
36

registerlabeling

Python
1
star
37

Cell-line-recognition

Cell line name recognition and normalization
CSS
1
star
38

Multilingual-register-corpora

French Corpus of Online REgisters (FreCORE) and Swedish Corpus of Online REgisters (SweCORE)
1
star
39

BHE

End-to-end System for Bacteria Habitat Extraction: Named-entity recognition (NER), named-entity normalization, relation extraction. email: [email protected]
Python
1
star
40

dolly-fi

Finnish version of databricks-dolly-15k instruction dataset
Python
1
star
41

sentiment-target-corpus

Targeted sentiment corpus
1
star
42

dep_search

JavaScript
1
star
43

deepfin-tools

DeepFin tools
Python
1
star
44

SRNNMT

Sentence representation for translation finding
Python
1
star
45

CORE-corpus

1
star
46

oasst-fi

Open Assistant dataset translated to Finnish
Python
1
star
47

DigiHum16

Random course notes for the DigiHum16 course
Jupyter Notebook
1
star
48

wikipedia-toxicity-data-fi

Python
1
star
49

toxicity-classifier

Repository for all things related to classifying whether a text is toxic or not using data from https://github.com/TurkuNLP/wikipedia-toxicity-data-fi
Python
1
star
50

PB_solr

Work towards indexing the Finnish Parsebank in SOLR
Python
1
star
51

TDT_editor

The tree editor used to annotate the Turku Dependency Treebank. Vintage code, but putting it online in case someone finds it in any way useful.
Python
1
star
52

pytorch-registerlabeling

Jupyter Notebook
1
star