• Stars: 220
• Rank: 179,364 (Top 4 %)
• Language: C++
• License: Apache License 2.0
• Created about 4 years ago
• Updated about 4 years ago

Repository Details

Crystal: a C++ implementation of a unified framework for multilingual TTS synthesis engines, with the SSML specification as the interface.

Crystal Text-to-Speech (TTS) Engine

C++ implementation of Crystal Text-to-Speech (TTS) engine.

The Crystal TTS engine implements Crystal, a unified framework for multilingual TTS synthesis engines. The framework defines the common TTS modules shared by different languages and dialects. The interfaces between consecutive modules conform to the Speech Synthesis Markup Language (SSML) specification, for standardization, interoperability, multilinguality, and extensibility.
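To illustrate the idea of consecutive modules exchanging SSML, here is a minimal sketch in which every stage takes an SSML string and returns an enriched SSML string. The names Module, SentenceMarker, BreakInserter, and runPipeline are hypothetical and are not part of the Crystal code base; the string handling is a toy stand-in for real XML processing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: each module consumes and produces an SSML string.
class Module {
public:
    virtual ~Module() = default;
    virtual std::string process(const std::string& ssml) const = 0;
};

// Replaces the first occurrence of `from` with `to`; returns the text unchanged if absent.
std::string replaceFirst(std::string text, const std::string& from, const std::string& to) {
    const std::size_t pos = text.find(from);
    if (pos != std::string::npos) text.replace(pos, from.size(), to);
    return text;
}

// Toy "text analysis" stage: wraps the body of <speak> in a sentence element <s>.
class SentenceMarker : public Module {
public:
    std::string process(const std::string& ssml) const override {
        const std::string opened = replaceFirst(ssml, "<speak>", "<speak><s>");
        return replaceFirst(opened, "</speak>", "</s></speak>");
    }
};

// Toy "prosody" stage: adds a <break/> before the end of the first sentence.
class BreakInserter : public Module {
public:
    std::string process(const std::string& ssml) const override {
        return replaceFirst(ssml, "</s>", "<break/></s>");
    }
};

// Runs the modules in order; the output of one stage is the input of the next.
std::string runPipeline(const std::vector<std::unique_ptr<Module>>& modules, std::string ssml) {
    for (const auto& m : modules) {
        assert(m != nullptr);
        ssml = m->process(ssml);
    }
    return ssml;
}
```

Because every stage speaks the same format, a stage can be replaced or reordered without changing its neighbours, which is the property the SSML-based interfaces are meant to provide.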

Architecture

Reference

For the motivation and design of the framework, refer to the following paper. Please also cite this paper when referring to this project:

Native Support of SSML

The framework uses the Speech Synthesis Markup Language (SSML) specification as the interface between modules, so it supports SSML tags natively.

In addition, the framework provides cst::xml::CSSMLTraversal (xml/ssml_traversal), which converts the SSML document into an internal data structure for convenient processing. You therefore do not need to handle the complex parsing of SSML documents when implementing your own algorithms. You only need to override the functions of the modules in cst::tts::base::*, which operate on the internal data structures.
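The division of labour described above can be sketched as follows: a base class deals with the SSML text and hands the subclass plain data to work on. This is only an illustration of the pattern, not the actual CSSMLTraversal API; Word, extractWords, PronunciationModule, and UppercasePronunciation are hypothetical names, and the parser only recognises bare <w> elements.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical internal data structure: one entry per <w> element.
struct Word {
    std::string text;
    std::string pronunciation;  // filled in by the module's algorithm
};

// Toy stand-in for the traversal step: collects the text of every <w>...</w> element.
std::vector<Word> extractWords(const std::string& ssml) {
    std::vector<Word> words;
    const std::string open = "<w>";
    const std::string close = "</w>";
    std::size_t pos = 0;
    while ((pos = ssml.find(open, pos)) != std::string::npos) {
        const std::size_t start = pos + open.size();
        const std::size_t end = ssml.find(close, start);
        if (end == std::string::npos) break;  // unterminated element: stop scanning
        words.push_back({ssml.substr(start, end - start), ""});
        pos = end + close.size();
    }
    return words;
}

// Hypothetical base module: handles the SSML itself, subclasses only see Word objects.
class PronunciationModule {
public:
    virtual ~PronunciationModule() = default;
    std::vector<Word> run(const std::string& ssml) const {
        std::vector<Word> words = extractWords(ssml);
        process(words);
        return words;
    }

protected:
    // The only function an algorithm author has to override.
    virtual void process(std::vector<Word>& words) const = 0;
};

// Placeholder algorithm: uses the upper-cased text as the "pronunciation".
class UppercasePronunciation : public PronunciationModule {
protected:
    void process(std::vector<Word>& words) const override {
        for (Word& w : words) {
            w.pronunciation = w.text;
            for (char& c : w.pronunciation) {
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            }
        }
    }
};
```

Calling UppercasePronunciation().run(...) on a document containing <w> elements returns one Word per element, with the subclass never touching the markup.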

Support of Dynamic Module Loading & Cross-platform

The framework supports dynamic module loading on different platforms.

You can implement different algorithms for each module and compile them into a new dynamic library (.dll on Windows, .so on Linux). The backbone classes of the framework, cst::tts::base::CTextParser (ttsbase/tts.text/tts_textparser) and cst::tts::base::CSynthesizer (ttsbase/tts.synth/tts_synthesizer), automatically load the modules specified in an XML-based configuration file. This makes it easy to switch between different TTS engines or algorithms.

For example, the figure above on the left shows the concatenative Putonghua TTS engine, which runs when "cmn.xml" is given as the configuration input; the figure above on the right shows the HMM-based Chinese TTS engine, which runs when "zh.xml" is given as the configuration input.
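The configuration-driven selection can be sketched as below. To keep the example self-contained, an in-process factory registry stands in for the dynamic libraries; a real build would instead resolve a factory function from a .dll or .so with the platform loader (LoadLibrary/GetProcAddress on Windows, dlopen/dlsym on POSIX systems). All names here (TtsModule, ModuleRegistry, the module identifiers) are hypothetical, and the configuration is modelled as a list of identifiers assumed to have already been read from the XML file.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical module interface; name() only identifies the implementation.
class TtsModule {
public:
    virtual ~TtsModule() = default;
    virtual std::string name() const = 0;
};

class ConcatenativeSynth : public TtsModule {
public:
    std::string name() const override { return "concatenative"; }
};

class HmmSynth : public TtsModule {
public:
    std::string name() const override { return "hmm"; }
};

using Factory = std::function<std::unique_ptr<TtsModule>()>;

// In-process registry standing in for the set of loadable libraries.
class ModuleRegistry {
public:
    void add(const std::string& id, Factory factory) { factories_[id] = std::move(factory); }

    // Instantiates the modules listed in the configuration, in order.
    // Throws std::runtime_error if an identifier is not registered.
    std::vector<std::unique_ptr<TtsModule>> load(const std::vector<std::string>& config) const {
        std::vector<std::unique_ptr<TtsModule>> modules;
        for (const std::string& id : config) {
            const auto it = factories_.find(id);
            if (it == factories_.end()) throw std::runtime_error("unknown module: " + id);
            modules.push_back(it->second());
        }
        return modules;
    }

private:
    std::map<std::string, Factory> factories_;
};
```

With both synthesizers registered, passing a different identifier list to load() switches the engine without recompiling the caller, which mirrors the effect of switching configuration files.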

Support of Multilingual TTS Engine

You can implement TTS engines for different languages by overriding the TTSBase modules in cst::tts::base::*. The following figure depicts the multilingual support of the architecture.

About the Project

Copyright (c) Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems. All rights reserved.

http://mjrc.sz.tsinghua.edu.cn

Tsinghua-CUHK Joint Research Center has the rights to create, modify, copy, compile, remove, rename, explain and deliver the source codes.

More Repositories

1. NeuCoSVC (Python, 235 stars)
2. VAENAR-TTS (Python, 144 stars)
   The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.
3. SECap (Python, 85 stars)
4. Crystal.TTVS (C++, 83 stars)
   Crystal TTVS engine is a real-time audio-visual Multilingual speech synthesizer with a 3D expressive avatar.
5. FlatTN (Python, 76 stars)
   Chinese Text Normalization and Dataset
6. NeuFA (Python, 69 stars)
   Neural network-based forced alignment with bidirectional attention mechanism
7. SpanPSP (Python, 69 stars)
8. LightGrad (Python, 65 stars)
9. SnakeGAN (Python, 33 stars)
   Please visit https://thuhcsi.github.io/SnakeGAN/
10. tacotron (Python, 32 stars)
    PyTorch implementation of Tacotron and Tacotron2
11. icassp2021-emotion-tts (Python, 32 stars)
    Please visit: https://thuhcsi.github.io/icassp2021-emotion-tts/
12. DiffVar (Python, 26 stars)
13. IJCAI2019-DRL4SER (Python, 22 stars)
    The python implementation for paper "Towards Discriminative Representation Learning for Speech Emotion Recognition" in IJCAI-2019
14. S2G-MDDiffusion (Python, 22 stars)
15. english-conversation-corpus (Shell, 19 stars)
    English conversation corpus for conversational TTS.
16. NeuCoSVC-Demo (HTML, 14 stars)
17. PortableTTS (Python, 12 stars)
18. Contextual-Biasing-Dataset (10 stars)
    Open-source Mandarin biased word dataset
19. dpss-exp3-VC-BNF (Python, 9 stars)
    Voice Conversion Experiments for THUHCSI Course : <Digital Processing of Speech Signals>
20. torch_speaker (Python, 8 stars)
21. mm2022-conversational-tts (Python, 8 stars)
22. adsv_voting (Python, 7 stars)
23. ExpressiveBailando (7 stars)
24. icassp2022-Transformer-S2A (HTML, 5 stars)
25. mst-fastspeech2 (Python, 4 stars)
26. thuhcsi.github.io (HTML, 3 stars)
    https://thuhcsi.github.io/
27. SCNet (Python, 3 stars)
28. icassp2022-hybrid-bottleneck-vc (CSS, 2 stars)
    Please visit https://thuhcsi.github.io/icassp2022-hybrid-bottleneck-vc/
29. icassp2023-coherent-tts (SCSS, 2 stars)
    Please visit https://thuhcsi.github.io/icassp2023-coherent-tts
30. icassp2022-FastFoley (2 stars)
31. dpss-exp2-HMM-2023 (Python, 2 stars)
    ex2 for dpss 2023
32. icassp2022-conversational-tts (1 star)
    Please visit https://thuhcsi.github.io/icassp2022-conversational-tts/
33. interspeech2022-expressive-svs (SCSS, 1 star)
    Please visit https://thuhcsi.github.io/interspeech2022-expressive-svs
34. interspeech2022-acciaccatura-svs (SCSS, 1 star)
    Please visit https://thuhcsi.github.io/interspeech2022-acciaccatura-svs/
35. melody-unsupervised-pretraining-svs (SCSS, 1 star)
36. interspeech2019-tts-samples (1 star)
    Please visit https://thuhcsi.github.io/interspeech2019-tts-samples/
37. StyleDub (Python, 1 star)
    Please visit https://thuhcsi.github.io/StyleDub/
38. Semi-Supervised-MDD (Python, 1 star)
39. icassp2023-ddpm-prosody-predictor (SCSS, 1 star)
    Please visit https://thuhcsi.github.io/icassp2023-ddpm-prosody-predictor