• Stars
    32
  • Rank 778,876 (Top 16 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 8 years ago
  • Updated over 4 years ago

Repository Details

A tagger and lemmatiser for Croatian, Serbian and Slovene.

More Repositories

1

csmtiser

A tool for text normalisation via character-level machine translation
Python
13
star
2

tweetcat

TweetCaT - a tool for building Twitter corpora of smaller languages or specific geographical regions
Python
12
star
3

geobert

Python
12
star
4

reldi-lib

Python
9
star
5

vejice

Python
6
star
6

megahr-crossling

Predictions on concreteness and imageability of words in 77 languages
C
6
star
7

Slovene_ASR_e2e

Automatic Speech Recognition tool
Python
6
star
8

reldi-tokeniser

A two-mode (standard, nonstandard) tokeniser for South Slavic languages
Python
5
star
9

mte-msd

MULTEXT-East morphosyntactic specifications
HTML
5
star
10

janes-ner

NER system for South Slavic languages
Python
4
star
11

redi

Diacritic restoration tool for Croatian, Serbian and Slovene
Python
4
star
12

babushka-bench

Benchmarking NLP tools on Slovene, Croatian and Serbian
Python
4
star
13

tweetgeo

A Tool for Collecting, Visualising and Inferring from Geo-encoded Linguistic Data
Python
3
star
14

TEI-schema

Recommended TEI schema for CLARIN.SI resources, cf. also https://clarinsi.github.io/TEI-schema/
XSLT
2
star
15

reldi-api

Python
2
star
16

parlaspeech

Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
Jupyter Notebook
2
star
17

Slovene_NMT

Neural Machine Translation tool
Python
2
star
18

slovene_syllable_splitter

A rule-based syllable splitter for Slovene that takes an input word and returns a list of syllables in the word, e.g. predsedovati -> ['pred', 'se', 'do', 'va', 'ti']; decembrskega -> ['de', 'cem', 'brs', 'ke', 'ga'].
Python
2
star
19

reldi-depparse

HTML
1
star
20

classla-spoken

Shell
1
star
21

jos2ud

Perl
1
star
22

cordex

Python
1
star
23

Obeliks4J

Java
1
star
24

wikitalk-extractor

A corpus extractor from the Wikipedia page and user talk pages
Python
1
star
25

benchich

BENCHić - the benchmark for Bosnian, Croatian, Montenegrin, Serbian (and friends)
Python
1
star
26

sb-abbr

NLP dataset of the Slovenian Biography
XSLT
1
star
27

drevesnik

Web portal for searching and displaying syntactically annotated corpora
JavaScript
1
star