Language Technology at the University of Helsinki (@Helsinki-NLP)

Top repositories

1

Tatoeba-Challenge

Makefile
762
star
2

Opus-MT

Open neural machine translation models and web services
Python
531
star
3

OPUS-MT-train

Training open neural machine translation models
Makefile
304
star
4

prosody

Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Python
224
star
5

OpusFilter

OpusFilter - Parallel corpus processing toolkit
Python
93
star
6

HBMP

Sentence Embeddings in NLI with Iterative Refinement Encoders
Python
77
star
7

OpusTools

Python
61
star
8

OPUS-CAT

OPUS-CAT is a collection of software that makes it possible to use OPUS-MT neural machine translation models in professional translation. OPUS-CAT includes a local offline MT engine and a collection of CAT tool plugins.
C#
60
star
9

XED

XED multilingual emotion datasets
Jupyter Notebook
53
star
10

OPUS

The Open Parallel Corpus
JavaScript
42
star
11

UkrainianLT

A collection of links to Ukrainian language tools
26
star
12

mammoth

MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki
Python
18
star
13

OPUS-translator

Translation demonstrator
Smalltalk
18
star
14

MuCoW

Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation
Python
15
star
15

subalign

Perl
14
star
16

sentimentator

Tool for sentiment analysis annotation
HTML
11
star
17

OPUS-MT-testsets

Benchmarks for evaluating MT models
Smalltalk
9
star
18

OpusTools-perl

Perl
5
star
19

neural-search-tutorials

Additional Notebooks for the Building NLP Applications course
Jupyter Notebook
5
star
20

nli-data-sanity-check

Data and scripts for a diagnostics test suite that makes it possible to assess whether an NLU dataset constitutes a good testbed for evaluating models' meaning understanding capabilities.
Jupyter Notebook
5
star
21

OPUS-interface

OPUS repository interface
Python
5
star
22

OPUS-ingest

Makefile
4
star
23

LanguageCodes

Perl
4
star
24

shroom

SCSS
4
star
25

OPUS-repository

Perl
3
star
26

doclevel-MT-benchmark

Document-level Machine Translation Benchmark
Python
3
star
27

Uplug

HTML
3
star
28

americasnlp2021-st

AmericasNLP 2021 shared task
JavaScript
3
star
29

Geometry

Python
2
star
30

shared-info

2
star
31

LSDC

Low-Saxon Dialect Classification
2
star
32

pdf2xml

Perl
2
star
33

Syntactic_Debiasing

Python
2
star
34

OpusFilter-hub

A hub of OpusFilter configurations
Python
1
star
35

OPUS-index

Index of resources in OPUS
1
star
36

NLU-Course-2020

Python
1
star
37

SELF-FEIL

Emotion Lexicons for Finnish
1
star
38

OPUS-MT-dashboard

PHP
1
star
39

External-MT-leaderboard

Leaderboards for external MT models
1
star
40

nlu-dataset-diagnostics

This repository contains data and scripts to reproduce the results from our paper: How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets.
Python
1
star
41

en-fi-testsuite

WMT18 Testsuite for Finnish morphology
Python
1
star
42

finlandsvensk-AI

1
star
43

ndc-aligned

Word-aligned version of the Norwegian Dialect Corpus
Python
1
star
44

OPUS-website

OPUS website files
1
star
45

OPUS-MT-leaderboard-recipes

Makefile recipes shared between all leaderboard repos
Makefile
1
star
46

OPUS-MT-leaderboard

1
star
47

murreviikko

Dialectologically annotated and normalized dataset of dialectal Finnish tweets
Python
1
star
48

Sami-MT

Machine translation for Sámi languages
1
star