  • Stars
    4
  • Rank 3,226,905 (Top 65 %)
  • Language
    Python
  • Created over 5 years ago
  • Updated almost 2 years ago

Repository Details

Benchmarking NLP tools on Slovene, Croatian and Serbian

More Repositories

1. reldi-tagger
   A tagger and lemmatiser for Croatian, Serbian and Slovene.
   Python, 32 stars

2. csmtiser
   A tool for text normalisation via character-level machine translation
   Python, 13 stars

3. tweetcat
   TweetCaT - a tool for building Twitter corpora of smaller languages or specific geographical regions
   Python, 12 stars

4. geobert
   Python, 12 stars

5. reldi-lib
   Python, 9 stars

6. vejice
   Python, 6 stars

7. megahr-crossling
   Predictions on concreteness and imageability of words in 77 languages
   C, 6 stars

8. Slovene_ASR_e2e
   Automatic Speech Recognition tool
   Python, 6 stars

9. reldi-tokeniser
   A two-mode (standard, nonstandard) tokeniser for South Slavic languages
   Python, 5 stars

10. mte-msd
    MULTEXT-East morphosyntactic specifications
    HTML, 5 stars

11. janes-ner
    NER system for South Slavic languages
    Python, 4 stars

12. redi
    Diacritic restoration tool for Croatian, Serbian and Slovene
    Python, 4 stars

13. tweetgeo
    A Tool for Collecting, Visualising and Inferring from Geo-encoded Linguistic Data
    Python, 3 stars

14. TEI-schema
    Recommended TEI schema for CLARIN.SI resources, cf. also https://clarinsi.github.io/TEI-schema/
    XSLT, 2 stars

15. reldi-api
    Python, 2 stars

16. parlaspeech
    Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
    Jupyter Notebook, 2 stars

17. Slovene_NMT
    Neural Machine Translation tool
    Python, 2 stars

18. slovene_syllable_splitter
    A rule-based syllable splitter for Slovene that takes an input word and returns a list of syllables in the word, e.g. predsedovati -> ['pred', 'se', 'do', 'va', 'ti']; decembrskega -> ['de', 'cem', 'brs', 'ke', 'ga'].
    Python, 2 stars

19. reldi-depparse
    HTML, 1 star

20. classla-spoken
    Shell, 1 star

21. jos2ud
    Perl, 1 star

22. cordex
    Python, 1 star

23. Obeliks4J
    Java, 1 star

24. wikitalk-extractor
    A corpus extractor from the Wikipedia page and user talk pages
    Python, 1 star

25. benchich
    BENCHić - the benchmark for Bosnian, Croatian, Montenegrin, Serbian (and friends)
    Python, 1 star

26. sb-abbr
    NLP dataset of the Slovenian Biography
    XSLT, 1 star

27. drevesnik
    Web portal for searching and displaying syntactically annotated corpora
    JavaScript, 1 star