Ajinkya Kulkarni (@ajinkyakulkarni14)

Top repositories

1. TED-Multilingual-Parallel-Corpus

TED Parallel Corpora is a growing collection of bilingual parallel corpora, multilingual parallel corpora, and monolingual corpora extracted from TED talks (www.ted.com), covering 109 world languages. A minimal loading sketch follows this entry.
239 stars
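The corpus format is not documented in this listing; the sketch below only shows one common way to consume such data, assuming two line-aligned plain-text files with one sentence per line per language. The file names and format are assumptions, not the repository's actual layout.

```python
# Minimal sketch for reading a line-aligned bilingual pair.
# File names and the one-sentence-per-line format are assumptions.
def read_parallel(src_path, tgt_path):
    """Yield (source, target) sentence pairs from two line-aligned files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip(src, tgt):
            s, t = s.strip(), t.strip()
            if s and t:  # skip empty alignment slots
                yield s, t

# Example usage with hypothetical file names:
# for en, de in read_parallel("ted_en-de.en", "ted_en-de.de"):
#     print(en, "|||", de)
```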
2. ERISHA

ERISHA is a multilingual, multi-speaker expressive speech synthesis framework. It can transfer expressivity to the voice of a speaker for whom no expressive speech corpus is available. A conceptual sketch of this kind of conditioning follows this entry.
Python
43 stars
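ERISHA's own code is not reproduced here. The sketch below only illustrates the general idea behind cross-speaker expressivity transfer: conditioning an acoustic decoder on separate speaker and style embeddings, so that a style learned from an expressive speaker can be paired at inference time with a speaker who has no expressive recordings. All module names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionedAcousticModel(nn.Module):
    """Toy decoder conditioned on separate speaker and style embeddings.

    Not ERISHA's implementation; it only shows the conditioning idea: because
    speaker identity and expressive style are independent embeddings, a style
    vector learned from an expressive speaker can be combined with the
    embedding of a speaker who has no expressive data.
    """

    def __init__(self, n_speakers=4, n_styles=3, text_dim=256, cond_dim=64, mel_dim=80):
        super().__init__()
        self.speaker_emb = nn.Embedding(n_speakers, cond_dim)
        self.style_emb = nn.Embedding(n_styles, cond_dim)
        self.decoder = nn.GRU(text_dim + 2 * cond_dim, 256, batch_first=True)
        self.mel_proj = nn.Linear(256, mel_dim)

    def forward(self, text_hidden, speaker_id, style_id):
        # text_hidden: (batch, time, text_dim) encoder outputs
        b, t, _ = text_hidden.shape
        cond = torch.cat([self.speaker_emb(speaker_id), self.style_emb(style_id)], dim=-1)
        cond = cond.unsqueeze(1).expand(b, t, -1)   # broadcast conditioning over time
        out, _ = self.decoder(torch.cat([text_hidden, cond], dim=-1))
        return self.mel_proj(out)                   # predicted mel frames

# Cross-speaker transfer at inference: pair speaker 2 (no expressive data)
# with a style index learned from another speaker's expressive corpus.
model = ConditionedAcousticModel()
mels = model(torch.randn(1, 50, 256), torch.tensor([2]), torch.tensor([1]))
```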
3. How-I-Extracted-TED-talks-for-parallel-Corpus-

Jupyter Notebook
34 stars
4. NAR_TTS_samples_interspeech_2022

Non-autoregressive TTS systems for expressivity transfer; results submitted to Interspeech 2022.
6 stars
5. RNN_Machine_Transliteration-

Jupyter Notebook
5 stars
6. Textgrid-Parser

Python
4 stars
7. Audio-Book-Corpus-for-European-Languages-

The Audio Book Corpus (ABC) project was developed to aid linguistic researchers in the field of text-to-speech, for purely academic purposes. In its current form, the corpus consists of approximately 200 minutes of German speech data. Besides German, corpora for Portuguese and Italian are in development. Future versions of the corpus shall encompass most European languages, such as French, Spanish, Czech, Dutch, Polish, and Romanian.
Jupyter Notebook
3 stars
8. lightning-ssl

Self-supervised learning methods implemented with PyTorch Lightning. A minimal illustrative sketch follows this entry.
Jupyter Notebook
1 star
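The repository's contents are not reproduced here; the sketch below only shows what a self-supervised method typically looks like as a PyTorch Lightning module, using a SimCLR-style contrastive (NT-Xent) objective as the example. The encoder, projector, image size, and batch format are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class SimCLRModule(pl.LightningModule):
    """Minimal SimCLR-style LightningModule (illustrative, not the repo's code).

    Assumes each batch provides two augmented views (x1, x2) of the same
    32x32 RGB images, with labels ignored.
    """

    def __init__(self, feature_dim=128, temperature=0.5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU())
        self.projector = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, feature_dim))
        self.temperature = temperature

    def forward(self, x):
        return self.projector(self.encoder(x))

    def nt_xent(self, z1, z2):
        # Normalized temperature-scaled cross-entropy over the 2N projections.
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2]), dim=1)
        sim = z @ z.t() / self.temperature
        mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(mask, float("-inf"))   # exclude self-similarity
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
        return F.cross_entropy(sim, targets)

    def training_step(self, batch, batch_idx):
        (x1, x2), _ = batch                          # two views, labels ignored
        loss = self.nt_xent(self(x1), self(x2))
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

A `pl.Trainer` would then fit this module on a dataloader that yields two augmented views per image.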
9. Recurrent-Neural-Network-for-Language-Identification

The goal of language identification is to build a classifier that converts a sequence of characters into a classification score for each language: given an input sequence x (text data), predict the desired output y (language ID). To create the training corpus, the Leipzig corpus was extracted and cleaned. Each character in a sentence is then mapped to a unique character ID. The RNN-LSTM architecture is trained on these sequences of character IDs as inputs, with language class labels as outputs. We used 30K sentences per language and two hidden layers of 200 nodes each. Training the network took 5 days and reached an error rate of 3.48% across 9 European languages. A minimal model sketch follows this entry.
HTML
1 star
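Based on the description above, a minimal version of the model could look like the sketch below: character-ID sequences in, one score per language out, with two LSTM layers of 200 units and 9 output classes. The vocabulary size and embedding dimension are assumptions not stated in the description.

```python
import torch
import torch.nn as nn

class CharLSTMLanguageID(nn.Module):
    """Character-level LSTM classifier: character-ID sequences in, one score
    per language out. Layer sizes follow the description (two hidden layers of
    200 nodes, 9 languages); vocabulary size and embedding dim are assumptions.
    """

    def __init__(self, vocab_size=256, embed_dim=64, hidden_size=200, n_languages=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_size, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(hidden_size, n_languages)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len) of integer character IDs
        x = self.embed(char_ids)
        _, (h_n, _) = self.lstm(x)          # h_n: (num_layers, batch, hidden)
        return self.classifier(h_n[-1])     # scores from the last layer's final state

# Example: score a batch of two padded character-ID sequences.
model = CharLSTMLanguageID()
scores = model(torch.randint(1, 256, (2, 40)))   # -> (2, 9) language scores
```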