  • Stars
    191
  • Rank 202,877 (Top 4%)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Speech Toolkit for Malaysian language, https://malaya-speech.readthedocs.io/


Malaya-Speech is a speech toolkit library for the Malaysian language, powered by Tensorflow and PyTorch.

Documentation

Documentation for the stable release is available at https://malaya-speech.readthedocs.io/en/stable/

Installing from the PyPI

$ pip install malaya-speech

This automatically installs all dependencies except Tensorflow and PyTorch, so you can choose your own CPU or GPU build of each.

Only Python >= 3.6.0, Tensorflow >= 1.15.0, and PyTorch >= 1.10 are supported.
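
Because Tensorflow and PyTorch are installed separately, it can help to confirm that whatever is already in the environment meets the minimums above. The following is a minimal sketch, not part of Malaya-Speech: the helper names `parse_version` and `check_backend` are hypothetical, the distribution names `tensorflow` and `torch` are assumptions (a GPU build may be published under a different name), and `importlib.metadata` itself requires Python 3.8 or newer.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onwards
from itertools import takewhile

# Minimum versions quoted in the text above. The keys are assumed
# PyPI distribution names, not something Malaya-Speech defines.
MINIMUMS = {"tensorflow": "1.15.0", "torch": "1.10"}


def parse_version(text):
    """Turn '1.15.0', '1.10' or '2.1.0+cu118' into a 3-tuple of ints."""
    parts = []
    # Drop any local suffix such as '+cu118', then keep at most three fields.
    for piece in text.split("+")[0].split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))  # '0rc1' -> '0'
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad '1.10' to (1, 10, 0)
        parts.append(0)
    return tuple(parts)


def check_backend(dist_name, minimum):
    """Return a one-line status string for an installed distribution."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name}: not installed"
    if parse_version(installed) >= parse_version(minimum):
        return f"{dist_name} {installed}: OK"
    return f"{dist_name} {installed}: older than {minimum}"


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 6, 0)
    print("python:", "OK" if python_ok else "older than 3.6.0")
    for name, minimum in MINIMUMS.items():
        print(check_backend(name, minimum))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.9" sorts after "1.10", which would wrongly accept an older PyTorch.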

Development Release

Install from the master branch:

$ pip install git+https://github.com/huseinzol05/malaya-speech.git

We recommend using virtualenv for development.

Documentation for the development release is available at https://malaya-speech.readthedocs.io/en/latest/

Features

  • Age Detection, detect age in speech using Finetuned Speaker Vector.
  • Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.
  • Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.
  • Force Alignment, generate a time-aligned transcription of an audio file using RNNT, Wav2Vec2 CTC and Whisper Seq2Seq.
  • Gender Detection, detect genders in speech using Finetuned Speaker Vector.
  • Clean speech Detection, detect clean speech using Finetuned Speaker Vector.
  • Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.
  • Language Model, score ASR decoder output using KenLM, masked language models (BERT and RoBERTa) and GPT2.
  • Multispeaker Separation, separate multiple speakers using FastSep on 8k Wav.
  • Noise Reduction, reduce multilevel noises using STFT UNET.
  • Speaker Change Detection, detect changing speakers using Finetuned Speaker Vector.
  • Speaker Count Detection, detect number of speakers using Finetuned Speaker Vector.
  • Speaker Overlap Detection, detect overlapping speakers using Finetuned Speaker Vector.
  • Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.
  • Speech Enhancement, enhance voice activities using Waveform UNET.
  • SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.
  • Speech-to-Text, End-to-End Speech to Text for Malay, Mixed (Malay, Singlish) and Singlish using RNNT, Wav2Vec2 CTC and Whisper Seq2Seq.
  • Super Resolution, Super Resolution 4x for Waveform using ResNet UNET and Neural Vocoder.
  • Text-to-Speech, Text to Speech for Malay and Singlish using Tacotron2, FastSpeech2, FastPitch, GlowTTS, LightSpeech and VITS.
  • Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.
  • Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.
  • Voice Conversion, Many-to-One and Zero-shot Voice Conversion.
  • Real-time Interface, provides PyAudio and TorchAudio streaming interfaces for real-time inference.
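
The Speaker Vector feature above compares speakers by the similarity of their embedding vectors. The text does not say which similarity measure the library uses, so the sketch below assumes cosine similarity, a common choice for speaker embeddings. The function name and the toy vectors are hypothetical and do not come from Malaya-Speech.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up 4-dimensional "speaker vectors"; real ones are much larger.
    speaker_a = [0.9, 0.1, 0.3, 0.0]
    speaker_a_again = [0.8, 0.2, 0.3, 0.1]
    speaker_b = [0.0, 0.9, 0.1, 0.7]
    print(cosine_similarity(speaker_a, speaker_a_again))
    print(cosine_similarity(speaker_a, speaker_b))
```

A score near 1 suggests the two utterances come from the same speaker, while a score near 0 suggests different speakers. Any decision threshold would have to be tuned on real data.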

Pretrained Models

Malaya-Speech also releases pretrained models; see malaya-speech/pretrained-model.

References

If you use our software for research, please cite:

@misc{Malaya,
  note = {Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for the private V100 cloud and to Mesolitica for the private RTX cloud used to train Malaya-Speech models.

More Repositories

  1. Stock-Prediction-Models (Jupyter Notebook, 7,822 stars)
     Gathers machine learning and deep learning models for Stock forecasting including trading bots and simulations
  2. NLP-Models-Tensorflow (Jupyter Notebook, 1,721 stars)
     Gathers machine learning and Tensorflow deep learning models for NLP problems, 1.13 < Tensorflow < 2.0
  3. malaya (Jupyter Notebook, 369 stars)
     Natural Language Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/
  4. Gather-Deployment (Jupyter Notebook, 352 stars)
     Gathers Python deployment, infrastructure and practices.
  5. malaysian-dataset (Jupyter Notebook, 229 stars)
     Text corpus for Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
  6. Machine-Learning-Numpy (Jupyter Notebook, 106 stars)
     Gathers Machine learning models using pure Numpy to cover feed-forward, RNN, CNN, clustering, MCMC, timeseries, tree-based, and so much more!
  7. Self-Driving-Car-Engines (Jupyter Notebook, 80 stars)
     Gathers signal processing, computer vision, machine learning and deep learning for self-driving car engines.
  8. Python-DevOps (Python, 77 stars)
     gathers Python stack for DevOps, these are usually my basic templates use for my implementations, so, feel free to use it and evolve it! Everything is Docker!
  9. Deep-Learning-Tensorflow (Jupyter Notebook, 51 stars)
     Gathers Tensorflow deep learning models.
  10. project-suka-suka (Jupyter Notebook, 49 stars)
      Husein pet projects in here!
  11. YOLO-Object-Detection-Tensorflow (Python, 44 stars)
      YOLO: Real-Time Object Detection using Tensorflow and easy to use
  12. Machine-Learning-Data-Science-Reuse (Jupyter Notebook, 36 stars)
      Gathers machine learning and data science techniques for problem solving.
  13. Bahasa-NLP-Tensorflow (Jupyter Notebook, 28 stars)
      Gathers Tensorflow deep learning models for Bahasa Malaysia NLP problems
  14. Signal-Classification-Comparison (Jupyter Notebook, 25 stars)
      Classify signal using Deep Learning on Tensorflow and various machine learning models.
  15. Tensorflow-JS-Projects (JavaScript, 19 stars)
      Web projects using Tensorflow JS, Plotly, D3, Echarts, NumJS, and NumericJS
  16. Pyspark-ML (Jupyter Notebook, 10 stars)
      Gathers data science and machine learning problem solving using PySpark and Hadoop.
  17. Reinforcement-Learning-Agents (Python, 9 stars)
      Gathers machine learning and deep learning models for Reinforcement Learning
  18. Neural-Network-Multilanguages (PHP, 4 stars)
      implement Artificial Neural Network on different languages
  19. herpetologist (Jupyter Notebook, 3 stars)
      Dynamic parameter type checking for Python 3.6 and above. This able to detect deep nested variables.
  20. water-healer (Jupyter Notebook, 2 stars)
      Forked of Streamz to deliver processed guarantees at least once for Kafka consumers with extra features.
  21. Hackathon-Huseinhouse (JavaScript, 2 stars)
      Gathers hackathon and huseinhouse dashboards
  22. malaya-boilerplate (Python, 1 star)
      Tensorflow freeze graph optimization and boilerplates to share among Malaya projects.
  23. malaya-graph (Jupyter Notebook, 1 star)
      Knowledge Graph Toolkit for Bahasa Malaysia, https://malaya-graph.readthedocs.io/