Stars
1
Language
HTML
License
GNU General Publi...
Created over 8 years ago
Updated over 8 years ago

ajinkyakulkarni14

The first goal of language identification is to build a classifier that maps a sequence of characters to a classification score for each language. Suppose the input sequence is x (text data) and the desired output is y (language ID). To create the training corpus, the Leipzig corpus was extracted and cleaned. Each character in the sentences was then mapped to a unique character ID. The RNN-LSTM architecture was trained with sequences of character IDs as inputs and language class labels as outputs. We used 30K sentences per language and 2 hidden layers of 200 nodes. Training took 5 days and reached an error rate of 3.48% on 9 European languages.
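
The character-to-ID mapping step can be illustrated with a minimal sketch. This is not the repository's actual preprocessing code: the function names (`build_char_vocab`, `encode`), the reserved padding and unknown IDs, the fixed sequence length, and the three toy sentences are all assumptions made for illustration.

```python
from typing import Dict, List

PAD_ID = 0  # assumed id used to pad short sentences
UNK_ID = 1  # assumed id for characters not seen when building the vocabulary


def build_char_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign every distinct character a unique integer id (starting at 2)."""
    chars = sorted({ch for sentence in sentences for ch in sentence})
    return {ch: idx + 2 for idx, ch in enumerate(chars)}


def encode(sentence: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn a sentence into a fixed-length list of character ids."""
    ids = [vocab.get(ch, UNK_ID) for ch in sentence[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


# Toy stand-in for the cleaned corpus: (sentence, language) pairs.
corpus = [("guten morgen", "de"), ("good morning", "en"), ("bonjour", "fr")]

# Map each language to an integer class label.
label_to_id = {lang: i for i, lang in enumerate(sorted({l for _, l in corpus}))}

vocab = build_char_vocab([s for s, _ in corpus])
X = [encode(s, vocab, max_len=16) for s, _ in corpus]  # inputs x
y = [label_to_id[lang] for _, lang in corpus]          # outputs y
```

Here `X` holds the character-ID sequences that would be fed to the LSTM and `y` the language class labels. A real pipeline would build the vocabulary from the full training corpus and choose the sequence length from the data.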

TED-Multilingual-Parallel-Corpus

TED Parallel Corpora is a growing collection of bilingual parallel corpora, multilingual parallel corpora, and monolingual corpora extracted from TED talks (www.ted.com) for 109 world languages.

239

ERISHA

ERISHA is a multilingual, multispeaker expressive speech synthesis framework. It can transfer expressivity to a speaker's voice for which no expressive speech corpus is available.

Python

How-I-Extracted-TED-talks-for-parallel-Corpus-

Jupyter Notebook

NAR_TTS_samples_interspeech_2022

Non-autoregressive TTS systems for expressivity transfer. Results submitted to Interspeech 2022.

RNN_Machine_Transliteration-

Jupyter Notebook