  • Stars
    3
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated about 8 years ago

Repository Details

A Tool for Collecting, Visualising and Inferring from Geo-encoded Linguistic Data

More Repositories

  • reldi-tagger (Python, 34 stars): A tagger and lemmatiser for Croatian, Serbian and Slovene.
  • geobert (Python, 14 stars)
  • csmtiser (Python, 13 stars): A tool for text normalisation via character-level machine translation.
  • tweetcat (Python, 12 stars): TweetCaT, a tool for building Twitter corpora of smaller languages or specific geographical regions.
  • reldi-lib (Python, 9 stars)
  • vejice (Python, 8 stars)
  • megahr-crossling (C, 6 stars): Predictions of concreteness and imageability of words in 77 languages.
  • Slovene_ASR_e2e (Python, 6 stars): Automatic speech recognition tool.
  • reldi-tokeniser (Python, 5 stars): A two-mode (standard, nonstandard) tokeniser for South Slavic languages.
  • mte-msd (HTML, 5 stars): MULTEXT-East morphosyntactic specifications.
  • janes-ner (Python, 4 stars): Named entity recognition (NER) system for South Slavic languages.
  • redi (Python, 4 stars): Diacritic restoration tool for Croatian, Serbian and Slovene.
  • babushka-bench (Python, 4 stars): Benchmarking NLP tools on Slovene, Croatian and Serbian.
  • TEI-schema (XSLT, 2 stars): Recommended TEI schema for CLARIN.SI resources; see also https://clarinsi.github.io/TEI-schema/
  • parlaspeech (Jupyter Notebook, 2 stars): Code for bootstrapping ASR datasets from parliamentary recordings and transcripts.
  • reldi-api (Python, 2 stars)
  • Slovene_NMT (Python, 2 stars): Neural machine translation tool.
  • slovene_syllable_splitter (Python, 2 stars): A rule-based syllable splitter for Slovene that takes an input word and returns a list of its syllables, e.g. predsedovati -> ['pred', 'se', 'do', 'va', 'ti']; decembrskega -> ['de', 'cem', 'brs', 'ke', 'ga'].
  • reldi-depparse (HTML, 1 star)
  • classla-spoken (Shell, 1 star)
  • jos2ud (Perl, 1 star)
  • cordex (Python, 1 star)
  • Obeliks4J (Java, 1 star)
  • wikitalk-extractor (Python, 1 star): A corpus extractor for Wikipedia page talk and user talk pages.
  • benchich (Python, 1 star): BENCHić, the benchmark for Bosnian, Croatian, Montenegrin, Serbian (and friends).
  • sb-abbr (XSLT, 1 star): NLP dataset of the Slovenian Biography.
  • drevesnik (JavaScript, 1 star): Web portal for searching and displaying syntactically annotated corpora.