• Stars
    1
  • Language
    JavaScript
  • Created over 1 year ago
  • Updated 3 months ago

Repository Details

Web portal for searching and displaying syntactically annotated corpora

More Repositories

1. reldi-tagger
   A tagger and lemmatiser for Croatian, Serbian and Slovene.
   Python, 32 stars

2. csmtiser
   A tool for text normalisation via character-level machine translation
   Python, 13 stars

3. tweetcat
   TweetCaT - a tool for building Twitter corpora of smaller languages or specific geographical regions
   Python, 12 stars

4. geobert
   Python, 12 stars

5. reldi-lib
   Python, 9 stars

6. vejice
   Python, 6 stars

7. megahr-crossling
   Predictions on concreteness and imageability of words in 77 languages
   C, 6 stars

8. Slovene_ASR_e2e
   Automatic Speech Recognition tool
   Python, 6 stars

9. reldi-tokeniser
   A two-mode (standard, nonstandard) tokeniser for South Slavic languages
   Python, 5 stars

10. mte-msd
    MULTEXT-East morphosyntactic specifications
    HTML, 5 stars

11. janes-ner
    NER system for South Slavic languages
    Python, 4 stars

12. redi
    Diacritic restoration tool for Croatian, Serbian and Slovene
    Python, 4 stars

13. babushka-bench
    Benchmarking NLP tools on Slovene, Croatian and Serbian
    Python, 4 stars

14. tweetgeo
    A Tool for Collecting, Visualising and Inferring from Geo-encoded Linguistic Data
    Python, 3 stars

15. TEI-schema
    Recommended TEI schema for CLARIN.SI resources, cf. also https://clarinsi.github.io/TEI-schema/
    XSLT, 2 stars

16. reldi-api
    Python, 2 stars

17. parlaspeech
    Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
    Jupyter Notebook, 2 stars

18. Slovene_NMT
    Neural Machine Translation tool
    Python, 2 stars

19. slovene_syllable_splitter
    A rule-based syllable splitter for Slovene that takes an input word and returns a list of syllables in the word, e.g. predsedovati -> ['pred', 'se', 'do', 'va', 'ti']; decembrskega -> ['de', 'cem', 'brs', 'ke', 'ga'].
    Python, 2 stars

20. reldi-depparse
    HTML, 1 star

21. classla-spoken
    Shell, 1 star

22. jos2ud
    Perl, 1 star

23. cordex
    Python, 1 star

24. Obeliks4J
    Java, 1 star

25. wikitalk-extractor
    A corpus extractor from the Wikipedia page and user talk pages
    Python, 1 star

26. benchich
    BENCHić - the benchmark for Bosnian, Croatian, Montenegrin, Serbian (and friends)
    Python, 1 star

27. sb-abbr
    NLP dataset of the Slovenian Biography
    XSLT, 1 star