CLARIN.SI (@clarinsi)

Top repositories

1

reldi-tagger

A tagger and lemmatiser for Croatian, Serbian and Slovene.
Python
32
star
2

csmtiser

A tool for text normalisation via character-level machine translation
Python
13
star
3

tweetcat

TweetCaT - a tool for building Twitter corpora of smaller languages or specific geographical regions
Python
12
star
4

geobert

Python
12
star
5

reldi-lib

Python
9
star
6

vejice

Python
6
star
7

megahr-crossling

Predictions on concreteness and imageability of words in 77 languages
C
6
star
8

Slovene_ASR_e2e

Automatic Speech Recognition tool
Python
6
star
9

reldi-tokeniser

A two-mode (standard, nonstandard) tokeniser for South Slavic languages
Python
5
star
10

mte-msd

MULTEXT-East morphosyntactic specifications
HTML
5
star
11

janes-ner

NER system for South Slavic languages
Python
4
star
12

redi

Diacritic restoration tool for Croatian, Serbian and Slovene
Python
4
star
13

babushka-bench

Benchmarking NLP tools on Slovene, Croatian and Serbian
Python
4
star
14

tweetgeo

A Tool for Collecting, Visualising and Inferring from Geo-encoded Linguistic Data
Python
3
star
15

TEI-schema

Recommended TEI schema for CLARIN.SI resources, cf. also https://clarinsi.github.io/TEI-schema/
XSLT
2
star
16

reldi-api

Python
2
star
17

parlaspeech

Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
Jupyter Notebook
2
star
18

Slovene_NMT

Neural Machine Translation tool
Python
2
star
19

slovene_syllable_splitter

A rule-based syllable splitter for Slovene that takes an input word and returns a list of syllables in the word, e.g. predsedovati -> ['pred', 'se', 'do', 'va', 'ti']; decembrskega -> ['de', 'cem', 'brs', 'ke', 'ga'].
Python
2
star
20

reldi-depparse

HTML
1
star
21

classla-spoken

Shell
1
star
22

jos2ud

Perl
1
star
23

cordex

Python
1
star
24

Obeliks4J

Java
1
star
25

wikitalk-extractor

A tool for extracting corpora from Wikipedia page and user talk pages
Python
1
star
26

benchich

BENCHić - the benchmark for Bosnian, Croatian, Montenegrin, Serbian (and friends)
Python
1
star
27

sb-abbr

NLP dataset of the Slovenian Biography
XSLT
1
star
28

drevesnik

Web portal for searching and displaying syntactically annotated corpora
JavaScript
1
star