• This repository has been archived on 01/Oct/2021
  • Stars: 380
  • Rank: 112,766 (Top 3 %)
  • Language: Python
  • License: Other
  • Created over 5 years ago
  • Updated almost 4 years ago

Repository Details

Phoneme multilingual (Russian-English) voice cloning based on

Multi-Tacotron Voice Cloning

This repository is a phonemic multilingual (Russian-English) implementation based on Real-Time-Voice-Cloning. It is a four-stage deep learning framework, made up of four neural networks, that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model. If you only need the English version, please use the original implementation.
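To make the idea of a "numerical representation of a voice" more concrete, here is a minimal sketch of how an utterance-level speaker embedding can be built: overlapping windows of audio frames are each encoded, normalized to unit length, averaged, and normalized again. This is an illustration only, not this repository's code. `toy_encoder` is a hypothetical stand-in for the trained speaker encoder, and the window and hop sizes are assumed values.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def embed_utterance(
    frames: Sequence[Sequence[float]],
    encode_window: Callable[[Sequence[Sequence[float]]], Vector],
    window: int = 160,  # assumed window length, in frames
    hop: int = 80,      # assumed hop (50% overlap), in frames
) -> Vector:
    """Average the unit-length embeddings of overlapping windows, then renormalize."""
    if not frames:
        raise ValueError("need at least one frame")
    # Clips shorter than one window still yield a single (short) window.
    starts = range(0, max(len(frames) - window, 0) + 1, hop)
    partials = [l2_normalize(encode_window(frames[s:s + window])) for s in starts]
    dim = len(partials[0])
    mean = [sum(p[i] for p in partials) / len(partials) for i in range(dim)]
    return l2_normalize(mean)


def toy_encoder(window_frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical stand-in for a trained encoder: per-dimension mean of the frames."""
    dim = len(window_frames[0])
    return [sum(f[i] for f in window_frames) / len(window_frames) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two unit-length embeddings (equals their cosine similarity)."""
    return sum(x * y for x, y in zip(a, b))


# Usage: embed a fake clip of 400 two-dimensional frames.
embedding = embed_utterance([[3.0, 4.0]] * 400, toy_encoder)
print(embedding, cosine_similarity(embedding, embedding))
```

In a real system the resulting vector would be passed to the synthesizer as a conditioning input, so that the generated speech takes on the reference speaker's voice.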

Example

Quick start

Use the Colab online demo.

Requirements

You will need the following whether you plan to use the toolbox only or to retrain the models.

Python (>=3.6).

PyTorch (>=1.0.1).

Run pip install -r requirements.txt to install the necessary packages.

A GPU is mandatory, but you do not need a high-end GPU if you only want to use the toolbox.
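The requirements above can be checked before launching anything. The following is a minimal sketch, not part of the repository: `parse_version` and `check_environment` are hypothetical helper names, and the only PyTorch calls used are `torch.__version__` and `torch.cuda.is_available()`.

```python
import sys
from itertools import takewhile


def parse_version(text: str) -> tuple:
    """Turn a version string such as '1.13.1+cu117' into a tuple like (1, 13, 1)."""
    core = text.split("+")[0]  # drop local build suffixes such as '+cu117'
    numbers = []
    for piece in core.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def check_environment() -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or newer is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch.__version__) < (1, 0, 1):
            problems.append("PyTorch 1.0.1 or newer is required")
        if not torch.cuda.is_available():
            problems.append("no CUDA-capable GPU detected")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

This sketch does not verify the packages listed in requirements.txt; it only covers the Python, PyTorch, and GPU requirements named above.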

Pretrained models

Download the latest here.

Datasets

| Name | Language | Link | Comments | My link | My comments |
| --- | --- | --- | --- | --- | --- |
| Phoneme dictionary | En, Ru | En, Ru | Phoneme dictionary | link | Combined the Russian and English phoneme dictionaries |
| LibriSpeech | En | link | 300 speakers, 360 h of clean speech | | |
| VoxCeleb | En | link | 7000 speakers, many hours of low-quality speech | | |
| M-AILABS | Ru | link | 3 speakers, 46 h of clean speech | | |
| open_tts, open_stt | Ru | open_tts, open_stt | Many speakers, many hours of low-quality speech | link | Cleaned 4 hours of speech from one speaker. Corrected the annotations and split the audio into segments of up to 7 seconds |
| Voxforge+audiobook | Ru | link | Many speakers, 25 h of varying quality | link | Selected the good files and split them into segments. Added audiobooks from the internet. The result is 200 speakers with a couple of minutes each |
| RUSLAN | Ru | link | One speaker, 40 h of good speech | link | Re-encoded to 16 kHz |
| Mozilla | Ru | link | 50 speakers, 30 h of good speech | link | Re-encoded to 16 kHz and sorted the different users into folders |
| Russian Single | Ru | link | One speaker, 9 h of good speech | link | Re-encoded to 16 kHz |
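Several of the datasets above were cut into short segments (up to 7 seconds) at a 16 kHz sample rate. The source does not say how the cut points were chosen, so the sketch below simply assumes fixed-length chunks; `split_into_segments` is a hypothetical helper, and a real pipeline would more likely cut at pauses and keep the transcripts aligned.

```python
from typing import List, Tuple


def split_into_segments(
    num_samples: int,
    sample_rate: int = 16000,
    max_seconds: float = 7.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices covering a clip in chunks of at most max_seconds."""
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    max_len = int(sample_rate * max_seconds)  # longest allowed segment, in samples
    return [
        (start, min(start + max_len, num_samples))
        for start in range(0, num_samples, max_len)
    ]


# Usage: a 20-second clip at 16 kHz is covered by three segments.
print(split_into_segments(20 * 16000))
# → [(0, 112000), (112000, 224000), (224000, 320000)]
```

Each pair can be used to slice an array of samples, for example `samples[start:end]`.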

Toolbox

You can then try the toolbox:

python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py

Wiki

Pretrained models

Training (and for other languages), in Russian

Training (and for other languages)

Contribution

For any questions, please email me.

Papers implemented

| arXiv ID | Designation | Title | Implementation source |
| --- | --- | --- | --- |
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | CorentinJ |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1712.05884 | Tacotron 2 (synthesizer) | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions | Rayhane-mamah/Tacotron-2 |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | CorentinJ |