An expanded version of the previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In KazakhTTS2, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three females and two males), and the topic coverage has been diversified.

KazakhTTS RECIPE

This is the recipe for a Kazakh text-to-speech model based on the KazakhTTS and KazakhTTS2 corpora.

Setup and Requirements

Our code builds upon ESPnet and requires the framework to be installed first. Follow the ESPnet installation guide, then clone this repository into the espnet/egs2/ directory:

cd espnet/egs2
git clone https://github.com/IS2AI/Kazakh_TTS.git

Go to the Kazakh_TTS/tts1 folder and create symbolic links to the dependencies:

ln -s ../../TEMPLATE/tts1/path.sh .
ln -s ../../TEMPLATE/asr1/pyscripts .
ln -s ../../TEMPLATE/asr1/scripts .
ln -s ../../../tools/kaldi/egs/wsj/s5/steps .
ln -s ../../TEMPLATE/tts1/tts.sh .
ln -s ../../../tools/kaldi/egs/wsj/s5/utils .
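The links above can also be created in a loop that checks each target first, which makes a missing ESPnet or Kaldi checkout easier to spot. This is only a sketch: link_dep is a hypothetical helper, not part of the repository, and the relative paths are copied from the commands above.

```shell
# Hypothetical helper: link one dependency into the current directory,
# refusing to create a dangling link when the target is missing.
link_dep() {
    dep_target="$1"
    dep_name="$(basename "$dep_target")"
    if [ ! -e "$dep_target" ]; then
        echo "warning: missing target $dep_target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: do not descend into an existing link that points to a directory
    ln -sfn "$dep_target" "$dep_name"
}

# Intended to be run from espnet/egs2/Kazakh_TTS/tts1
for dep in \
    ../../TEMPLATE/tts1/path.sh \
    ../../TEMPLATE/asr1/pyscripts \
    ../../TEMPLATE/asr1/scripts \
    ../../../tools/kaldi/egs/wsj/s5/steps \
    ../../TEMPLATE/tts1/tts.sh \
    ../../../tools/kaldi/egs/wsj/s5/utils
do
    link_dep "$dep" || true   # keep going so every missing target is reported
done
```

Because the helper replaces existing links, the loop can be rerun safely after fixing a missing dependency.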

Downloading the dataset

Download the KazakhTTS dataset and untar it in a directory of your choice. Specify the path to the dataset directory (where the Audio and Transcripts directories are located) inside the Kazakh_TTS/tts1/local/data.sh script:

db_root=/path-to-speaker-folder

For example: db_root=/home/datasets/ISSAI_KazakhTTS/M1/Books
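Before editing data.sh, it may help to confirm that the chosen directory really contains the Audio and Transcripts subdirectories mentioned above. The following is a minimal sketch; check_db_root is a hypothetical helper name, and it assumes only the two directory names given in this README.

```shell
# Hypothetical helper: verify that a candidate db_root has the expected layout.
check_db_root() {
    db_root="$1"
    for sub in Audio Transcripts; do
        if [ ! -d "$db_root/$sub" ]; then
            echo "error: $db_root/$sub not found" >&2
            return 1
        fi
    done
    echo "ok: $db_root"
}

# Example call (path taken from the example above):
# check_db_root /home/datasets/ISSAI_KazakhTTS/M1/Books
```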

Training

To train the models, run the ./run.sh script inside the Kazakh_TTS/tts1/ folder. GPU and RAM specifications can be found in the configuration (conf/) folder.

./run.sh --stage 1 --stop_stage 6 --train_config conf/train.yaml 

If you would like to train FastSpeech or Transformer models, change train_config=conf/train.yaml accordingly. Detailed descriptions of each stage are documented in the ESPnet repository.
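Switching models comes down to passing a different configuration file to the same command. As a rough sketch, the hypothetical train_cmd helper below prints the training command for a given config and rejects configs that do not exist on disk. Only the flags shown above are used; any config name other than conf/train.yaml is a placeholder for whatever files are actually present in conf/.

```shell
# Hypothetical helper: print the training command for a given config file.
train_cmd() {
    train_config="${1:-conf/train.yaml}"
    if [ ! -f "$train_config" ]; then
        echo "error: config $train_config not found" >&2
        return 1
    fi
    echo "./run.sh --stage 1 --stop_stage 6 --train_config $train_config"
}

# Example: inspect the command, then run it from the tts1 folder.
# train_cmd conf/train.yaml
# eval "$(train_cmd conf/train.yaml)"
```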

Pretrained models

The model was developed by the Institute of Smart Systems and Artificial Intelligence, Nazarbayev University, Kazakhstan (henceforth ISSAI).

Please use the model only for a good cause and in a wise manner. You must not use the model to generate data that are obscene, offensive, or contain any discrimination with regard to religion, sex, race, language or territory of origin.

ISSAI appreciates and requires attribution. An attribution should include the title of the original paper, the author, and the name of the organization under which the development of the model took place. For example:

Mussakhojayeva, S., Janaliyeva, A., Mirzakhmetov, A., Khassanov, Y., Varol, H.A. (2021) KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset. Proc. Interspeech 2021, 2786-2790, doi: 10.21437/Interspeech.2021-2124. The Institute of Smart Systems and Artificial Intelligence (issai.nu.edu.kz), Nazarbayev University, Kazakhstan

kaztts_female1_tacotron2_train.loss.ave

kaztts_female2_tacotron2_train.loss.ave

kaztts_female3_tacotron2_train.loss.ave

kaztts_male1_tacotron2_train.loss.ave

kaztts_male2_tacotron2_train.loss.ave

Pretrained vocoders

parallelwavegan_female1_checkpoint

parallelwavegan_female2_checkpoint

parallelwavegan_female3_checkpoint

parallelwavegan_male1_checkpoint

parallelwavegan_male2_checkpoint

Speech synthesis

You can synthesize arbitrary text using the synthesize.py script. Modify the following lines in the script:

## specify the path to the vocoder's checkpoint, e.g.
vocoder_checkpoint = "exp/vocoder/checkpoint-400000steps.pkl"

## specify the path to the main model (transformer/tacotron2/fastspeech) and its config file
config_file = "exp/tts_train_raw_char/config.yaml"
model_path = "exp/tts_train_raw_char/train.loss.ave_5best.pth"

Now you can run the script on arbitrary text, for example:

python synthesize.py --text "бүгінде өңірде тағы бес жобаның құрылысы жүргізілуде."

The generated file will be saved in the tts1/synthesized_wavs folder.
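To synthesize several sentences, the command can be wrapped in a loop that reads one sentence per line from a text file. This is a sketch under assumptions: synthesize_file and the SYNTH variable are hypothetical names, and only the --text option shown above is used. Whether successive runs overwrite each other's output depends on how synthesize.py names its files, which is not stated here.

```shell
# Assumption: SYNTH holds the command that invokes the script; override if needed.
SYNTH="${SYNTH:-python synthesize.py}"

# Hypothetical helper: call the synthesis command once per non-empty line.
synthesize_file() {
    while IFS= read -r sentence || [ -n "$sentence" ]; do
        if [ -n "$sentence" ]; then
            # Redirect stdin so the command cannot consume the remaining lines.
            $SYNTH --text "$sentence" < /dev/null
        fi
    done < "$1"
}

# Example (sentences.txt is a placeholder file name):
# synthesize_file sentences.txt
```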

Citation

@inproceedings{mussakhojayeva21_interspeech,
  author={Saida Mussakhojayeva and Aigerim Janaliyeva and Almas Mirzakhmetov and Yerbolat Khassanov and Huseyin Atakan Varol},
  title={{KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2786--2790},
  doi={10.21437/Interspeech.2021-2124}
}
