License: MIT License
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter

This is the official code implementation of 🍵 Matcha-TTS.

We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method:

  • Is probabilistic
  • Has a compact memory footprint
  • Sounds highly natural
  • Is very fast to synthesise from
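
To make the flow-matching idea concrete, here is a minimal sketch of how one training pair can be built with the optimal-transport conditional path used in conditional flow matching. The function names, the `sigma_min` default, and the plain-list "mel frame" are illustrative assumptions, not this repository's API; in the real model a neural network is trained to regress the target velocity on mel-spectrogram frames.

```python
import random


def cfm_training_pair(x1, t, sigma_min=1e-4, rng=None):
    """Return (x_t, u_t) for one data vector x1 and one time t in [0, 1].

    x_t lies on the straight path from a noise sample x0 to the data x1;
    u_t is the velocity a network would be trained to predict at (x_t, t).
    """
    rng = rng or random.Random()
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]        # Gaussian noise sample
    scale = 1.0 - (1.0 - sigma_min) * t           # weight on the noise term
    x_t = [scale * n + t * d for n, d in zip(x0, x1)]
    u_t = [d - (1.0 - sigma_min) * n for n, d in zip(x0, x1)]
    return x_t, u_t


def squared_error(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


# Hypothetical 4-dimensional "mel frame", used purely for illustration.
x1 = [0.5, -1.0, 2.0, 0.0]
x_t, u_t = cfm_training_pair(x1, t=0.3, rng=random.Random(0))
loss_of_zero_guess = squared_error([0.0] * len(u_t), u_t)
```

Because the target velocity does not depend on `t`, the path from noise to data is a straight line, which is why few ODE solver steps can suffice at synthesis time.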

Check out our demo page and read our arXiv preprint for more details.

Pre-trained models will be automatically downloaded with the CLI or gradio interface.

Try 🍵 Matcha-TTS on Hugging Face 🤗 Spaces!

Installation

  1. Create an environment (suggested but optional)
conda create -n matcha-tts python=3.10 -y
conda activate matcha-tts
  2. Install Matcha-TTS using pip or from source
pip install matcha-tts

or from source

git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS
pip install -e .
  3. Run CLI / gradio app / jupyter notebook
# This will download the required models
matcha-tts --text "<INPUT TEXT>"

or

matcha-tts-app

or open synthesis.ipynb in Jupyter Notebook

CLI Arguments

  • To synthesise from given text, run:
matcha-tts --text "<INPUT TEXT>"
  • To synthesise from a file, run:
matcha-tts --file <PATH TO FILE>
  • To batch synthesise from a file, run:
matcha-tts --file <PATH TO FILE> --batched
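
As a small illustration of the file-based modes above, the sketch below writes a list of sentences to a text file and builds the matching command line. It assumes the input file holds one utterance per line; the helper names and the file name `utterances.txt` are hypothetical, and only the `--file` and `--batched` flags shown above are used.

```python
import shlex
import tempfile
from pathlib import Path


def write_utterance_file(sentences, path):
    """Write one non-empty utterance per line and return the path."""
    lines = [s.strip() for s in sentences if s.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return Path(path)


def matcha_file_command(path, batched=False):
    """Build the argument list for `matcha-tts --file <path> [--batched]`."""
    cmd = ["matcha-tts", "--file", str(path)]
    if batched:
        cmd.append("--batched")
    return cmd


sentences = ["The quick brown fox.", "  ", "Jumps over the lazy dog."]
with tempfile.TemporaryDirectory() as tmp:
    txt = write_utterance_file(sentences, Path(tmp) / "utterances.txt")
    cmd = matcha_file_command(txt, batched=True)
    print(shlex.join(cmd))
    # To actually synthesise (requires matcha-tts to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```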

Additional arguments

  • Speaking rate
matcha-tts --text "<INPUT TEXT>" --speaking_rate 1.0
  • Sampling temperature
matcha-tts --text "<INPUT TEXT>" --temperature 0.667
  • Euler ODE solver steps
matcha-tts --text "<INPUT TEXT>" --steps 10
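
To give some intuition for `--temperature` and `--steps`, here is a minimal sketch of fixed-step Euler integration from noise at t=0 to a sample at t=1. It assumes the temperature scales the standard deviation of the initial Gaussian noise and that the step count sets the number of Euler updates. The toy straight-line vector field and all function names are stand-ins for the trained network and are not part of this package.

```python
import random


def euler_sample(vector_field, dim, steps=10, temperature=0.667, seed=0):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Initial state: Gaussian noise scaled by the sampling temperature.
    x = [rng.gauss(0.0, 1.0) * temperature for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the learned network: a field whose trajectories are
# straight lines ending at a fixed target (hypothetical values).
TARGET = [1.0, -2.0, 0.5]


def toy_field(x, t):
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


sample = euler_sample(toy_field, dim=len(TARGET), steps=10, temperature=0.667)
```

With a real network, more steps generally trade speed for a more accurate solution of the ODE, and a lower temperature starts the trajectory closer to the mean of the noise distribution.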

Train with your own dataset

Let's assume we are training with LJ Speech.

  1. Download the LJ Speech dataset, extract it to data/LJSpeech-1.1, and prepare the file lists to point to the extracted data, as in item 5 of the setup in the NVIDIA Tacotron 2 repo.

  2. Clone and enter the Matcha-TTS repository

git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS
  3. Install the package from source
pip install -e .
  4. Go to configs/data/ljspeech.yaml and change the following entries to the paths of your train and validation filelists:
train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt
  5. Generate normalisation statistics with the YAML file of the dataset configuration
matcha-data-stats -i ljspeech.yaml
# Output:
#{'mel_mean': -5.53662231756592, 'mel_std': 2.1161014277038574}

Update these values in configs/data/ljspeech.yaml under the data_statistics key.

data_statistics:  # Computed for ljspeech dataset
  mel_mean: -5.536622
  mel_std: 2.116101

  6. Run the training script
make train-ljspeech

or

python matcha/train.py experiment=ljspeech
  • for a minimum memory run
python matcha/train.py experiment=ljspeech_min_memory
  • for multi-gpu training, run
python matcha/train.py experiment=ljspeech trainer.devices=[0,1]
  7. Synthesise from the custom-trained model
matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT>
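
For the filelist preparation in steps 1 and 4, the sketch below shows one way to turn LJ Speech metadata rows (`id|raw text|normalised text`) into `wav_path|text` lines, the layout used by the Tacotron 2 filelists. The function name, the random split, `val_fraction`, and `seed` are assumptions made for illustration; the Tacotron 2 repo ships a fixed split that you may prefer to reuse.

```python
import random


def metadata_to_filelists(metadata_lines, wav_dir, val_fraction=0.05, seed=1234):
    """Convert `id|raw|normalised` rows into (train, val) `wav_path|text` lists."""
    entries = []
    for row in metadata_lines:
        row = row.rstrip("\n")
        if not row:
            continue  # skip blank lines
        parts = row.split("|")
        utt_id, text = parts[0], parts[-1]  # last field: normalised transcript
        entries.append(f"{wav_dir}/{utt_id}.wav|{text}")
    random.Random(seed).shuffle(entries)  # reproducible shuffle before splitting
    n_val = max(1, int(len(entries) * val_fraction))
    return entries[n_val:], entries[:n_val]


# Tiny in-memory example; real rows come from data/LJSpeech-1.1/metadata.csv.
rows = [
    "LJ001-0001|Raw one|Norm one",
    "LJ001-0002|Raw two|Norm two",
    "LJ001-0003|Raw three|Norm three",
]
train, val = metadata_to_filelists(rows, "data/LJSpeech-1.1/wavs", val_fraction=0.34)
```

The two returned lists can then be written, one entry per line, to the files referenced by train_filelist_path and valid_filelist_path.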

ONNX support

Special thanks to @mush42 for implementing ONNX export and inference support.

It is possible to export Matcha checkpoints to ONNX, and run inference on the exported ONNX graph.

ONNX export

To export a checkpoint to ONNX, first install ONNX with

pip install onnx

then run the following:

python3 -m matcha.onnx.export matcha.ckpt model.onnx --n-timesteps 5

Optionally, the ONNX exporter accepts vocoder-name and vocoder-checkpoint arguments. This enables you to embed the vocoder in the exported graph and generate waveforms in a single run (similar to end-to-end TTS systems).

Note that n_timesteps is treated as a hyper-parameter rather than a model input. This means you should specify it during export (not during inference). If not specified, n_timesteps is set to 5.

Important: for now, torch>=2.1.0 is needed for export since the scaled_dot_product_attention operator is not exportable in older versions. Until the final version is released, those who want to export their models must install torch>=2.1.0 manually as a pre-release.

ONNX inference

To run inference on the exported model, first install onnxruntime using

pip install onnxruntime
pip install onnxruntime-gpu  # for GPU inference

then use the following:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs

You can also control synthesis parameters:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs --temperature 0.4 --speaking_rate 0.9 --spk 0

To run inference on GPU, make sure to install the onnxruntime-gpu package, and then pass --gpu to the inference command:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs --gpu

If you exported only Matcha to ONNX, this will write mel-spectrograms as plots and NumPy arrays to the output directory. If you embedded the vocoder in the exported graph, this will write .wav audio files to the output directory.

If you exported only Matcha to ONNX, and you want to run a full TTS pipeline, you can pass a path to a vocoder model in ONNX format:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs --vocoder hifigan.small.onnx

This will write .wav audio files to the output directory.
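
If you want to inspect an exported graph from Python rather than through the command above, the following sketch loads it with ONNX Runtime and lists its inputs and outputs. It is not necessarily what matcha.onnx.infer does internally; `choose_providers` and `describe_model` are hypothetical helpers, and the CPU fallback when CUDA is unavailable is an assumption of this example.

```python
def choose_providers(available, use_gpu=False):
    """Pick ONNX Runtime execution providers, always falling back to CPU."""
    providers = []
    if use_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


def describe_model(model_path, use_gpu=False):
    """Load an exported graph and return its input and output descriptions."""
    # Imported lazily so the helper above works without onnxruntime installed.
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers(), use_gpu)
    session = ort.InferenceSession(model_path, providers=providers)
    inputs = [(i.name, i.shape, i.type) for i in session.get_inputs()]
    outputs = [(o.name, o.shape, o.type) for o in session.get_outputs()]
    return inputs, outputs


# Example (needs onnxruntime and an exported model.onnx):
# inputs, outputs = describe_model("model.onnx")
# for name, shape, dtype in inputs:
#     print(name, shape, dtype)
```

Listing the graph's inputs this way is a quick check of which tensors the exported model expects before wiring it into your own pipeline.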

Citation information

If you use our code or otherwise find this work useful, please cite our paper:

@article{mehta2023matcha,
  title={Matcha-TTS: A fast TTS architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  journal={arXiv preprint arXiv:2309.03199},
  year={2023}
}

Acknowledgements

Since this code uses Lightning-Hydra-Template, you have all the powers that come with it.

Other source code I would like to acknowledge:

  • Coqui-TTS: For helping me figure out how to make cython binaries pip installable, and for their encouragement
  • Hugging Face Diffusers: For their awesome diffusers library and its components
  • Grad-TTS: For the monotonic alignment search source code
  • torchdyn: Useful for trying other ODE solvers during research and development
  • labml.ai: For the RoPE implementation
