• Stars
    8
  • Rank 2,099,232 (Top 42 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created almost 4 years ago
  • Updated over 3 years ago

More Repositories

1

reldi-tagger

A tagger and lemmatiser for Croatian, Serbian and Slovene.
Python
34
star
2

geobert

Python
14
star
3

csmtiser

A tool for text normalisation via character-level machine translation
Python
13
star
4

tweetcat

TweetCaT - a tool for building Twitter corpora of smaller languages or specific geographical regions
Python
12
star
5

reldi-lib

Python
9
star
6

megahr-crossling

Predictions on concreteness and imageability of words in 77 languages
C
6
star
7

Slovene_ASR_e2e

Automatic Speech Recognition tool
Python
6
star
8

reldi-tokeniser

A two-mode (standard, nonstandard) tokeniser for South Slavic languages
Python
5
star
9

mte-msd

MULTEXT-East morphosyntactic specifications
HTML
5
star
10

janes-ner

NER system for South Slavic languages
Python
4
star
11

redi

Diacritic restoration tool for Croatian, Serbian and Slovene
Python
4
star
12

babushka-bench

Benchmarking NLP tools on Slovene, Croatian and Serbian
Python
4
star
13

tweetgeo

A Tool for Collecting, Visualising and Inferring from Geo-encoded Linguistic Data
Python
3
star
14

TEI-schema

Recommended TEI schema for CLARIN.SI resources, cf. also https://clarinsi.github.io/TEI-schema/
XSLT
2
star
15

parlaspeech

Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
Jupyter Notebook
2
star
16

reldi-api

Python
2
star
17

Slovene_NMT

Neural Machine Translation tool
Python
2
star
18

slovene_syllable_splitter

A rule-based syllable splitter for Slovene that takes an input word and returns a list of syllables in the word, e.g. predsedovati -> ['pred', 'se', 'do', 'va', 'ti']; decembrskega -> ['de', 'cem', 'brs', 'ke', 'ga'].
Python
2
star
19

reldi-depparse

HTML
1
star
20

classla-spoken

Shell
1
star
21

jos2ud

Perl
1
star
22

cordex

Python
1
star
23

Obeliks4J

Java
1
star
24

wikitalk-extractor

A corpus extractor from the Wikipedia page and user talk pages
Python
1
star
25

benchich

BENCHić - the benchmark for Bosnian, Croatian, Montenegrin, Serbian (and friends)
Python
1
star
26

sb-abbr

NLP dataset of the Slovenian Biography
XSLT
1
star
27

drevesnik

Web portal for searching and displaying syntactically annotated corpora
JavaScript
1
star