Repository Details

It's a repository for implementations of neural speech editing algorithms.



Speech-Editing-Toolkit

This repo contains official PyTorch implementations of:

  • FluentSpeech

This repo contains unofficial PyTorch implementations of:

  • CampNet
  • A3T
  • EditSpeech

Supported Datasets

Our framework supports the following datasets:

  • VCTK
  • LibriTTS
  • SASE Dataset (We will publish it later)

Install Dependencies

Please install the latest numpy, torch and tensorboard first. Then run the following commands:

export PYTHONPATH=.
# install requirements.
pip install -U pip
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3

Finally, install the Montreal Forced Aligner (MFA) by following the instructions at the link below:

https://montreal-forced-aligner.readthedocs.io/en/latest/
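
Before moving on, it can help to confirm that the dependencies above are visible from Python. The following is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and it assumes that SoX and the aligner are exposed as `sox` and `mfa` executables on PATH. It only checks presence, not versions.

```python
import importlib.util
import shutil


def check_environment(modules=("numpy", "torch", "tensorboard"),
                      executables=("sox", "mfa")):
    """Report which Python modules and command-line tools can be found.

    Hypothetical helper, not part of the toolkit.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        report[exe] = shutil.which(exe) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING should be installed before running the preprocessing scripts.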

Download the pre-trained vocoder

mkdir pretrained
mkdir pretrained/hifigan_hifitts

Download model_ckpt_steps_2168000.ckpt and config.yaml from https://drive.google.com/drive/folders/1n_0tROauyiAYGUDbmoQ__eqyT_G4RvjN?usp=sharing and place them in pretrained/hifigan_hifitts.
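
Since the download is a manual step, a quick check that both files landed in the right place can save a failed run later. This is a small sketch under the assumption that it is run from the repository root; the function name `missing_vocoder_files` is hypothetical, while the directory and file names come from the instructions above.

```python
from pathlib import Path

# Directory and file names are taken from the download instructions.
VOCODER_DIR = Path("pretrained/hifigan_hifitts")
VOCODER_FILES = ("model_ckpt_steps_2168000.ckpt", "config.yaml")


def missing_vocoder_files(directory=VOCODER_DIR, expected=VOCODER_FILES):
    """Return the expected file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_vocoder_files()
    if missing:
        print("Missing vocoder files:", ", ".join(missing))
    else:
        print("Vocoder files are in place.")
```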

Data Preprocess

# Set 'self.dataset_name' in the scripts below to 'vctk' or 'libritts' to choose the dataset (the default is 'vctk').
# Also set BASE_DIR in run_mfa_train_align.sh to the corresponding directory.
python data_gen/tts/base_preprocess.py
bash data_gen/tts/run_mfa_train_align.sh
python data_gen/tts/base_binarizer.py

Train (FluentSpeech)

# Example run for FluentSpeech.
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/spec_denoiser.yaml --exp_name spec_denoiser --reset

Train (Baselines)

# Example run for CampNet.
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/campnet.yaml --exp_name campnet --reset
# Example run for A3T.
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/a3t.yaml --exp_name a3t --reset
# Example run for EditSpeech.
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/editspeech.yaml --exp_name editspeech --reset
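
All four training commands follow the same pattern: the experiment name selects both the config file under egs/ and the value of --exp_name. The sketch below builds those commands programmatically. It is an illustration only: `train_command` and `run_training` are hypothetical helper names, and the code assumes it is launched from the repository root with PYTHONPATH set to the current directory, as in the install step.

```python
import os
import subprocess

# Experiment names as used in the commands above; each maps to egs/<name>.yaml.
EXPERIMENTS = ("spec_denoiser", "campnet", "a3t", "editspeech")


def train_command(name, reset=True):
    """Build the argument list for one training run."""
    cmd = ["python", "tasks/run.py",
           "--config", f"egs/{name}.yaml",
           "--exp_name", name]
    if reset:
        cmd.append("--reset")
    return cmd


def run_training(name, gpu="0"):
    """Launch one run from the repository root on the given GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu, PYTHONPATH=".")
    # check=True raises CalledProcessError if training exits with an error.
    subprocess.run(train_command(name), env=env, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_training(name) to start a run.
    for name in EXPERIMENTS:
        print(" ".join(train_command(name)))
```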

Pretrained Checkpoint

We provide a pretrained FluentSpeech checkpoint. To use it, place config.yaml and the .ckpt file listed below in ./checkpoints/spec_denoiser/.

  • Model: FluentSpeech
  • Dataset: libritts-clean
  • URL: https://drive.google.com/drive/folders/1saqpWc4vrSgUZvRvHkf2QbwWSikMTyoo?usp=sharing
  • Checkpoint name: model_ckpt_steps_568000.ckpt

Inference

The input format for inference is shown in inference/example.csv. text and edited_text are the original text and the target text. region is the range of word indices (starting from 1) in text that you want to edit, and edited_region is the corresponding range of word indices in edited_text.

  • id: 0
  • item_name: 1
  • text: "this is a libri vox recording"
  • edited_text: "this is a funny joke shows."
  • wav_fn_orig: inference/audio_backup/1.wav
  • edited_region: [3,6]
  • region: [3,6]

# Run with one example.
python inference/tts/spec_denoiser.py --exp_name spec_denoiser
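
To make the region fields concrete, the sketch below extracts the words that a region covers. It rests on two assumptions that the example row supports but the toolkit may not share exactly: both endpoints are inclusive, and words are separated by whitespace. The function name `words_in_region` is hypothetical and not part of the repository.

```python
import ast


def words_in_region(text, region):
    """Return the words of `text` covered by a 1-based, inclusive region.

    `region` may be a list such as [3, 6] or its string form "[3,6]".
    Assumption: both endpoints are inclusive and words are split on whitespace.
    """
    if isinstance(region, str):
        # Safely parse the bracketed form used in the example row.
        region = ast.literal_eval(region)
    start, end = region
    words = text.split()
    if not 1 <= start <= end <= len(words):
        raise ValueError(f"region {region} is out of range for {len(words)} words")
    # Convert the 1-based inclusive range to a 0-based Python slice.
    return words[start - 1:end]


if __name__ == "__main__":
    print(words_in_region("this is a libri vox recording", "[3,6]"))
    print(words_in_region("this is a funny joke shows.", "[3,6]"))
```

Under these assumptions, the example row replaces "a libri vox recording" with "a funny joke shows."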

Citation

If you find this useful for your research, please star our repo.

License and Agreement

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

Tips

  1. If mfa_dict.txt and mfa_model.zip are missing, run the preprocessing scripts in this repo to generate them. Alternatively, download all the files needed for inference with the pre-trained model from https://drive.google.com/drive/folders/1H-dk7cNYVn1DSzYq_q66rS5b5xpbdBi4?usp=sharing and put them in data/processed/libritts.
  2. Use MFA version 2.0.0rc3.

If you find any other problems, please contact me.