• Stars
    star
    276
  • Rank 149,319 (Top 3 %)
  • Language
    Python
  • License
    Other
  • Created over 3 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

Expressive-FastSpeech2 - PyTorch Implementation

Contributions

  1. Non-autoregressive Expressive TTS: This project aims to provide a cornerstone for future research and application on a non-autoregressive expressive TTS including Emotional TTS and Conversational TTS. For datasets, AIHub Multimodal Video AI datasets and IEMOCAP database are picked for Korean and English, respectively.

    Note: If you are interested in GST-Tacotron or VAE-Tacotron like expressive stylistic TTS model but under non-autoregressive decoding, you may also be interested in STYLER [demo, code].

  2. Annotated Data Processing: This project shed light on how to handle the new dataset, even with a different language, for the successful training of non-autoregressive emotional TTS.

  3. English and Korean TTS: In addition to English, this project gives a broad view of treating Korean for the non-autoregressive TTS where the additional data processing must be considered under the language-specific features (e.g., training Montreal Forced Aligner with your own language and dataset). Please closely look into text/.

  4. Adopting Own Language: For those who are interested in adapting other languages, please refer to the "Training with your own dataset (own language)" section of the categorical branch.

Repository Structure

In this project, FastSpeech2 is adapted as a base non-autoregressive multi-speaker TTS framework, so it would be helpful to read the paper and code first (Also see FastSpeech2 branch).

  1. Emotional TTS: Following branches contain implementations of the basic paradigm intorduced by Emotional End-to-End Neural Speech synthesizer.

    • categorical branch: only conditioning categorical emotional descriptors (such as happy, sad, etc.)
    • continuous branch: conditioning continuous emotional descriptors (such as arousal, valence, etc.) in addition to categorical emotional descriptors
  2. Conversational TTS: Following branch contains implementation of Conversational End-to-End TTS for Voice Agent

Citation

If you would like to use or refer to this implementation, please cite the repo.

@misc{lee2021expressive_fastspeech2,
  author = {Lee, Keon},
  title = {Expressive-FastSpeech2},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/Expressive-FastSpeech2}}
}

References

More Repositories

1

PortaSpeech

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Python
329
star
2

Comprehensive-Transformer-TTS

A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS
Python
319
star
3

DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Python
310
star
4

DiffSinger

PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech)
Python
233
star
5

DailyTalk

Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023
Python
194
star
6

StyleSpeech

PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Python
189
star
7

Parallel-Tacotron2

PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Python
187
star
8

Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Python
180
star
9

STYLER

Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021
Python
156
star
10

Comprehensive-E2E-TTS

A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate E2E-TTS
Python
144
star
11

Soft-DTW-Loss

PyTorch implementation of Soft-DTW: a Differentiable Loss Function for Time-Series in CUDA
Python
120
star
12

VAENAR-TTS

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Python
71
star
13

FastPitchFormant

PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Python
71
star
14

WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Python
66
star
15

Daft-Exprt

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Python
56
star
16

Comprehensive-Tacotron2

PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.
Python
45
star
17

evaluate-zero-shot-tts

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
Python
43
star
18

Robust_Fine_Grained_Prosody_Control

PyTorch Implementation of Robust and fine-grained prosody control of end-to-end speech synthesis
Python
41
star
19

Stepwise_Monotonic_Multihead_Attention

PyTorch Implementation of Stepwise Monotonic Multihead Attention similar to Enhancing Monotonicity for Robust Autoregressive Transformer TTS
Python
31
star
20

Deep-Learning-TTS-Template

This is a template for the Non-autoregressive Deep Learning-Based TTS model (in PyTorch).
Python
15
star
21

tacotron2_MMI

Another PyTorch implementation of Tacotron2 MMI (with waveglow) which supports n_frames_per_step>1 mode(reduction windows) and diagonal guided attention for robust alignments.
Jupyter Notebook
5
star
22

Fully_Hierarchical_Fine_Grained_TTS

Pytorch Implementation of Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis (Unofficial)
2
star
23

cs231n

cs231n 2020 Spring assignments implementation
Jupyter Notebook
2
star
24

pintos

KAIST CS330 OS pintos Project
HTML
1
star