
AutoRegressive-VITS

MQTTS branch

(WIP) Text-to-speech using an autoregressive transformer and VITS

Note

  • This branch is the multi-codebook + MQTTS branch of AR-VITS, used to experiment with multi-codebook decoding; the single-codebook GPT-SoVITS version lives on the master branch.
  • Quality is mediocre: this is an experimental branch, the pretrained model is small (zh-300h, ja-80h, en-20h) and covers few languages, so results do not match GPT-SoVITS.
  • No inference-engineering speedups have been done; inference is extremely slow and intended for experiments only.
  • Because of the base-model data, the model is essentially Chinese-only. Likewise, since the training utterances are all short, it can only synthesize short sentences; long sentences must be sliced and inferred segment by segment, otherwise generation blows up.
  • Tested fine-tuning recipe: 30 minutes of data + 1200 steps of S2 fine-tuning + 100 steps of S1 fine-tuning -> sample. Fine-tuning data comes from Xz乔希.
  • A reference audio must be specified, but this branch conditions on a speaker embedding rather than an acoustic prompt, so the reference audio's influence on the output is fairly weak.
  • All scripts have only been tested on Linux, not on Windows.
  • If you cannot reach huggingface to download the bert, hubert, etc. models, set export HF_ENDPOINT=https://hf-mirror.com
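Since long inputs must be sliced before inference (see the note above), a minimal punctuation-based slicer might look like the sketch below. This is a hypothetical helper, not the repo's own code; the 50-character budget is an assumption.

```python
import re

def slice_text(text, max_len=50):
    """Split text after sentence-final punctuation, then greedily merge
    the pieces back together so each slice stays within max_len chars."""
    # Lookbehind split keeps the punctuation attached to its sentence.
    pieces = [p for p in re.split(r'(?<=[。！？!?.])', text) if p.strip()]
    slices, current = [], ''
    for piece in pieces:
        if current and len(current) + len(piece) > max_len:
            slices.append(current)
            current = piece
        else:
            current += piece
    if current:
        slices.append(current)
    return slices
```

Each slice would then be synthesized separately and the resulting audio concatenated.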

Acknowledgement

  • Thanks to leng-yue for providing the GPUs

Training pipeline

  1. Jointly train the S2 VITS decoder and quantizer
  2. Extract semantic tokens
  3. Train the S1 text-to-semantic model
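Step 2 amounts to vector quantization: each SSL feature frame is mapped to the index of its nearest codebook entry, and those indices are the semantic tokens fed to S1. A toy numpy illustration; the codebook values here are made up, whereas the real quantizer is learned jointly in step 1:

```python
import numpy as np

def quantize(frames, codebook):
    """Map each (T, D) feature frame to the index of its nearest codebook row."""
    # Pairwise squared distances, shape (T, K).
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)  # one semantic token per frame

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # K=3 codes, dim=2
frames = np.array([[0.1, -0.1], [0.9, 1.2], [1.8, 2.1]])   # T=3 frames
tokens = quantize(frames, codebook)  # → array([0, 1, 2])
```

A multi-codebook (MQTTS-style) setup repeats this with several codebooks per frame, producing multiple token streams instead of one.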

Preparation

  • Download the pretrained models:
bash download_pretrain.sh
  • Put the training data in the dataset_raw folder with the following structure:
dataset_raw
├── zh
│   ├── spk1
│   │   ├── utt1.wav
│   │   ├── utt1.lab
│   │   ├── ...
│   ├── spk2
│   │   ├── utt1.wav
│   │   ├── utt1.lab
│   │   ├── ...
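Every utterance needs a .lab transcript next to its .wav. A small sketch (hypothetical helper, not part of the repo) for checking that pairing before training:

```python
import os

def find_unpaired(root):
    """Return paths of .wav files under root that lack a matching .lab file."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.wav'):
                lab = os.path.splitext(name)[0] + '.lab'
                if lab not in filenames:
                    missing.append(os.path.join(dirpath, name))
    return missing
```

Running find_unpaired('dataset_raw') before preprocessing would surface any utterance whose transcript is missing.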

VITS S2 training

  • resample.py
  • gen_phonemes.py
  • extract_ssl_s2.py
  • gen_filelist_s2.py
  • s2_train.py
python s2_train.py -c configs/s2.json -p pretrain/s2
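resample.py presumably converts all input audio to the sample rate the VITS decoder expects. A minimal linear-interpolation sketch of that step; the rates below are assumptions, and a real pipeline would use a proper polyphase resampler rather than np.interp:

```python
import numpy as np

def resample_linear(audio, sr_in, sr_out):
    """Resample a 1-D signal by linear interpolation (toy version)."""
    n_out = int(round(len(audio) * sr_out / sr_in))
    t_in = np.arange(len(audio)) / sr_in   # original sample times (s)
    t_out = np.arange(n_out) / sr_out      # target sample times (s)
    return np.interp(t_out, t_in, audio)

one_second = np.sin(2 * np.pi * 5 * np.arange(44100) / 44100)  # 5 Hz tone
y = resample_linear(one_second, 44100, 32000)  # one second at 32 kHz
```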

MQTTS S1 training

  • extract_vq_s1.py
  • extract_spk_embedding.py
  • gen_filelist_s1.py
  • s1_train.py
python s1_train.py
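extract_spk_embedding.py produces one fixed vector per reference utterance; S1 then conditions on that vector rather than on an acoustic prompt, which is why the reference's influence is weak (see the note above). A toy sketch of the idea, mean-pooling frame features into a unit-norm utterance embedding and comparing speakers by cosine similarity (the features here are synthetic):

```python
import numpy as np

def utterance_embedding(frames):
    """Mean-pool (T, D) frame features into one D-dim vector, L2-normalized."""
    v = frames.mean(axis=0)
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(np.dot(a, b))  # both inputs are already unit-norm

rng = np.random.default_rng(0)
spk_a  = utterance_embedding(rng.normal(1.0, 0.1, (50, 8)))   # speaker A
spk_a2 = utterance_embedding(rng.normal(1.0, 0.1, (50, 8)))   # A again
spk_b  = utterance_embedding(rng.normal(-1.0, 0.1, (50, 8)))  # speaker B
```

Embeddings of the same speaker land close together (cosine near 1), different speakers far apart, regardless of what was said, so the vector carries timbre but little prosodic detail.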

Inference

  • s1_infer.py/s2_infer.py
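S1 inference is autoregressive: each step feeds the tokens generated so far back into the model and samples the next semantic token, stopping at an end-of-sequence token (this per-token loop is a large part of why inference is so slow). A toy loop with a dummy next-token function standing in for the S1 transformer:

```python
EOS = 0  # assumed end-of-sequence token id

def dummy_next_token(context):
    """Stand-in for the S1 transformer: counts up, then emits EOS."""
    return context[-1] + 1 if context[-1] < 5 else EOS

def generate(first_token, max_steps=100):
    tokens = [first_token]
    for _ in range(max_steps):
        nxt = dummy_next_token(tokens)  # one forward pass per token
        tokens.append(nxt)
        if nxt == EOS:
            break
    return tokens

seq = generate(1)  # → [1, 2, 3, 4, 5, 0]
```

In the real pipeline the resulting token sequence is handed to the S2 VITS decoder, which turns it into a waveform.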