  • Language
    Jupyter Notebook
  • License
    MIT License

Neural HMMs are all you need (for high-quality attention-free TTS)

Shivam Mehta, Éva Székely, Jonas Beskow, and Gustav Eje Henter

This is the official code repository for the paper "Neural HMMs are all you need (for high-quality attention-free TTS)". For audio examples, visit our demo page. A pre-trained model (female) and a pre-trained model (male) are also available.

Synthesising from Neural-HMM

Setup and training using LJ Speech

  1. Download and extract the LJ Speech dataset. Place it in the data folder so that the dataset directory is data/LJSpeech-1.1. Otherwise, update the filelists in data/filelists accordingly.
  2. Clone this repository: git clone https://github.com/shivammehta25/Neural-HMM.git
    • If you are using a single GPU, check out the gradient_checkpointing branch; it helps fit a bigger batch size during training.
    • To do so, use git clone --single-branch -b gradient_checkpointing https://github.com/shivammehta25/Neural-HMM.git
  3. Initialise the submodules: git submodule init; git submodule update
  4. Make sure you have Docker installed and running.
    • Docker is recommended, since it manages the CUDA runtime libraries and the Python dependencies specified in the Dockerfile.
    • Alternatively, if you do not intend to use Docker, install the dependencies with pip install -r requirements.txt
  5. Run bash start.sh; it will install all the dependencies and run the container.
  6. Check src/hparams.py for hyperparameters and set GPUs.
    1. For multi-GPU training, set GPUs to [0, 1 ..]
    2. For CPU training (not recommended), set GPUs to an empty list []
    3. Check the location of transcriptions
  7. Once your filelists and hparams are updated, run python generate_data_properties.py to generate data_parameters.pt for your dataset (a default data_parameters.pt for LJ Speech is available in the repository).
  8. Run python train.py to train the model.
    1. Checkpoints will be saved in the hparams.checkpoint_dir.
    2. Tensorboard logs will be saved in the hparams.tensorboard_log_dir.
  9. To resume training, run python train.py -c <CHECKPOINT_PATH>
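Before starting a long training run, it can help to confirm that the dataset from step 1 is where the filelists expect it. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the filelist format (one '<audio path>|<transcription>' entry per line, as in Tacotron 2 style filelists) is an assumption that should be checked against the files in data/filelists.

```python
from pathlib import Path


def missing_ljspeech_items(data_dir="data"):
    """Return the expected LJ Speech paths that are absent under data_dir."""
    root = Path(data_dir) / "LJSpeech-1.1"
    # metadata.csv and wavs/ are part of the standard LJ Speech 1.1 archive.
    expected = [root, root / "metadata.csv", root / "wavs"]
    return [str(p) for p in expected if not p.exists()]


def missing_filelist_audio(filelist_path, separator="|"):
    """Return audio paths named in a filelist that do not exist on disk.

    Assumption: each non-empty line is '<audio path>|<transcription>'.
    """
    missing = []
    for line in Path(filelist_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        audio_path = line.split(separator, 1)[0].strip()
        if not Path(audio_path).exists():
            missing.append(audio_path)
    return missing
```

Calling missing_ljspeech_items("data") from the repository root returns an empty list when the layout matches step 1. Relative audio paths in a filelist are resolved against the current working directory, so missing_filelist_audio should be run from the directory the training script is started in.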

Synthesis

  1. Download our pre-trained LJ Speech model. (This is the exact same model as system NH2 in the paper, but with training continued until reaching 200k updates total.)
  2. Download a pre-trained HiFi-GAN model.
    • We recommend using the version fine-tuned on Tacotron 2 if you cannot fine-tune on Neural-HMM.
  3. Run jupyter notebook and open synthesis.ipynb.

Miscellaneous

Mixed-precision training or full-precision training

  • In src/hparams.py, change hparams.precision to 16 for mixed precision or 32 for full precision.

Multi-GPU training or single-GPU training

  • Since the code uses PyTorch Lightning, providing more than one element in the list of GPUs will enable multi-GPU training. So change hparams.gpus to [0, 1, 2] for multi-GPU training and to a single-element list [0] for single-GPU training.
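The GPU and precision settings described above can be sanity-checked before launching training. This is a minimal sketch under stated assumptions: describe_training_setup is a hypothetical helper that is not in the repository, and it only encodes the rules given in this README (an empty list means CPU, one element means single-GPU, more than one means multi-GPU, and precision is 16 or 32).

```python
def describe_training_setup(gpus, precision):
    """Validate hparams-style GPU and precision settings and name the mode."""
    if precision not in (16, 32):
        raise ValueError("precision must be 16 (mixed) or 32 (full)")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(gpus, list) or any(
        isinstance(g, bool) or not isinstance(g, int) or g < 0 for g in gpus
    ):
        raise ValueError("gpus must be a list of non-negative integers")
    if len(set(gpus)) != len(gpus):
        raise ValueError("gpus must not contain duplicates")

    if not gpus:
        device = "CPU"
    elif len(gpus) == 1:
        device = "single-GPU"
    else:
        device = "multi-GPU"
    mode = "mixed" if precision == 16 else "full"
    return f"{device} training, {mode} precision"
```

For example, describe_training_setup([0, 1, 2], 16) returns "multi-GPU training, mixed precision", while a value such as precision=8 raises a ValueError instead of failing later inside the training loop.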

Known issues/warnings

PyTorch dataloader

  • If you encounter the warning [W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool), this is a known issue in the PyTorch DataLoader.
  • It will be fixed when PyTorch releases a new Docker container image with an updated version of Torch. If you are not using Docker, the warning goes away with torch > 1.9.1.
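Whether an installed Torch satisfies torch > 1.9.1 cannot be decided by comparing version strings directly, because "1.10.0" sorts before "1.9.1" as text. The sketch below compares the numeric release parts instead. The helper names are hypothetical, and it deliberately ignores pre-release and local suffixes such as "a0+b6df043", which is a simplification rather than full version-specifier semantics.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. '1.11.0a0+b6df043' -> (1, 11, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (as in '1.10') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_newer_than(version, reference="1.9.1"):
    """True if version's numeric release is strictly greater than reference's."""
    return release_tuple(version) > release_tuple(reference)
```

With Torch installed, is_newer_than(torch.__version__) gives a quick indication of whether the warning above is still expected.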

Torchmetric error on RTX 3090

  • If you encounter the error message ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/data.py)
  • Update the requirements.txt file with these requirements:
torch==1.11.0a0+b6df043
--extra-index-url https://download.pytorch.org/whl/cu113
torchmetrics==0.6.0

Support

If you have any questions or comments, please open an issue on our GitHub repository.

Citation information

If you use or build on our method or code for your research, please cite our paper:

@inproceedings{mehta2022neural,
  title={Neural {HMM}s are all you need (for high-quality attention-free {TTS})},
  author={Mehta, Shivam and Sz{\'e}kely, {\'E}va and Beskow, Jonas and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2022}
}

Acknowledgements

The code implementation is based on Nvidia's implementation of Tacotron 2 and uses PyTorch Lightning for boilerplate-free code.
