  • Language
    Python
  • License
    MIT License
  • Created over 5 years ago
  • Updated over 4 years ago


Repository Details

Korean TTS, Tacotron2, Wavenet

Multi-Speaker Tacotron2 + Wavenet Vocoder + Korean TTS

This project implements Korean TTS by combining the Tacotron2 model with a Wavenet Vocoder. The Tacotron2 model has been extended to a multi-speaker model.

Based on

Tacotron 2

  • For an explanation of the Tacotron model, see the previous repo.
  • Tacotron2 changes the model architecture and proposes Location Sensitive Attention, a Stop Token, and Wavenet as the vocoder.
  • The representative implementation of Tacotron2 is Rayhane-mamah's, which in turn builds on the code of keithito and r9y9.
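As a rough illustration of Location Sensitive Attention, the sketch below computes one attention step in NumPy. It is not taken from this repo or from Rayhane-mamah's code: the function name, parameter names, and shapes are assumptions chosen for the example, and it feeds only the previous alignment into the location filters (some implementations use the cumulative alignment instead).

```python
import numpy as np

def location_sensitive_attention(query, memory, prev_align, W, V, U, F, v):
    """One attention step (illustrative sketch, hypothetical names).

    query:      (d_q,)     current decoder state
    memory:     (T, d_m)   encoder outputs
    prev_align: (T,)       attention weights from the previous decoder step
    W: (d_q, d_a)  V: (d_m, d_a)  U: (n_f, d_a)  v: (d_a,)
    F: (k, n_f)    location filters, k must be odd
    """
    k = F.shape[0]
    padded = np.pad(prev_align, k // 2)  # zero-pad so the output keeps length T
    # Sliding windows over the previous alignment, one window per encoder step.
    windows = np.stack([padded[t:t + k] for t in range(len(prev_align))])  # (T, k)
    location = windows @ F  # (T, n_f) location features

    # Additive attention energy with the extra location term.
    energies = np.tanh(query @ W + memory @ V + location @ U) @ v  # (T,)
    energies = energies - energies.max()  # numerical stability for the softmax
    align = np.exp(energies)
    align = align / align.sum()
    context = align @ memory  # (d_m,) weighted sum of encoder outputs
    return context, align

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_q, d_m, d_a, k, n_f = 7, 4, 5, 6, 3, 2
    context, align = location_sensitive_attention(
        rng.normal(size=d_q), rng.normal(size=(T, d_m)), np.full(T, 1.0 / T),
        rng.normal(size=(d_q, d_a)), rng.normal(size=(d_m, d_a)),
        rng.normal(size=(n_f, d_a)), rng.normal(size=(k, n_f)), rng.normal(size=d_a))
    print(align.round(3), context.shape)
```

The location term is what distinguishes this from plain additive attention: because the energies depend on where the model attended at the previous step, the alignment is encouraged to move forward through the text rather than jump around.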

This Project

  • The goal is to build a Korean TTS with the Tacotron2 model.
  • Rayhane-mamah's implementation uses many customized layers, which seemed overly complicated to me, so I cut down substantially on custom layers and relied on the layers already implemented in Tensorflow.
  • Train samples generated with teacher forcing become intelligible from around step 2000, and test samples generated with free forcing (no teacher forcing) from around step 3000.
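The difference between teacher forcing and free forcing can be shown with a toy autoregressive loop. This is only a sketch: `decode` and `toy_step` are hypothetical names, and the "model" is a deliberately imperfect stand-in for the Tacotron2 decoder, not code from this repo.

```python
import numpy as np

def decode(step_fn, first_frame, n_steps, targets=None):
    """Run an autoregressive decoder for n_steps.

    With targets, each step is fed the ground-truth previous frame
    (teacher forcing). Without targets, each step is fed the model's
    own previous output (free forcing).
    """
    outputs = []
    prev = first_frame
    for t in range(n_steps):
        out = step_fn(prev)
        outputs.append(out)
        prev = targets[t] if targets is not None else out
    return np.stack(outputs)

def toy_step(prev):
    # Hypothetical imperfect predictor standing in for the real decoder.
    return 0.9 * prev + 0.1

if __name__ == "__main__":
    targets = np.ones((3, 2))  # pretend ground-truth mel frames
    start = np.zeros(2)        # "go" frame
    print(decode(toy_step, start, 3, targets=targets))  # teacher forcing
    print(decode(toy_step, start, 3))                   # free forcing
```

In the free-forcing run every prediction error is fed back into the next step, which is one plausible reason the test samples take longer to become intelligible than the teacher-forced train samples.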

Step-by-Step Execution

Execution Order

  • Data generation: for generating the Korean data, see the previous repo.
  • Specify the generated data in 'data_paths', described below.
  • After training tacotron, test with synthesizer.py.
  • After training wavenet, test with generate.py (you can test with a mel spectrogram that tacotron did not produce, or with one that tacotron did produce).
  • Once both models are trained, test by feeding the mel spectrogram generated by tacotron into wavenet as the local condition.

Tacotron2 Training

  • Set '--data_paths' inside train_tacotron2.py, then train. '--data_paths' accepts multiple data directories, separated by commas.
parser.add_argument('--data_paths', default='.\\data\\moon,.\\data\\son')
  • To resume training, specify '--load_path'.
parser.add_argument('--load_path', default='logdir-tacotron2/moon+son_2019-02-27_00-21-42')
  • model_type can be set to 'single' or 'multi-speaker'. With a single speaker, set model_type = 'single' in hparams and pass only one directory to '--data_paths' in train_tacotron2.py.
parser.add_argument('--data_paths', default='D:\\Tacotron2\\data\\moon')
  • Because the hyperparameters are set in hparams.py and the arguments in train_tacotron2.py, running training is as simple as the following.

python train_tacotron2.py

  • After training, generate speech as follows. '--num_speakers' and '--speaker_id' must be set correctly.

python synthesizer.py --load_path logdir-tacotron2/moon+son_2019-02-27_00-21-42 --num_speakers 2 --speaker_id 0 --text "오스트랄로피테쿠스 아파렌시스는 멸종된 사람족 종으로, 현재에는 뼈 화석이 발견되어 있다."
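To see how a comma-separated '--data_paths' value relates to model_type, '--num_speakers', and '--speaker_id', here is a small standalone sketch. The helper names are hypothetical, and the rule that speaker ids follow the order of the directories is an assumption for illustration, not something confirmed from this repo's code.

```python
import re

def speakers_from_data_paths(data_paths):
    """Split a comma-separated data_paths string into a speaker table.

    Assumption: speaker ids are assigned in the order the directories
    are listed, and the last path component names the speaker.
    """
    dirs = [p.strip() for p in data_paths.split(',') if p.strip()]
    # Accept both '/' and '\\' as separators so Windows-style paths work anywhere.
    names = [re.split(r'[\\/]+', d.rstrip('\\/'))[-1] for d in dirs]
    model_type = 'single' if len(dirs) == 1 else 'multi-speaker'
    return model_type, {name: i for i, name in enumerate(names)}

def check_speaker_args(num_speakers, speaker_id, speaker_ids):
    """Fail early if the synthesis arguments do not match the training data."""
    if num_speakers != len(speaker_ids):
        raise ValueError(f'num_speakers={num_speakers}, but {len(speaker_ids)} data directories were given')
    if not 0 <= speaker_id < num_speakers:
        raise ValueError(f'speaker_id={speaker_id} is outside 0..{num_speakers - 1}')

if __name__ == "__main__":
    model_type, ids = speakers_from_data_paths('.\\data\\moon,.\\data\\son')
    print(model_type, ids)
    check_speaker_args(num_speakers=2, speaker_id=0, speaker_ids=ids)
```

Under that assumption, a model trained on the two directories above would be synthesized with '--num_speakers 2' and '--speaker_id' 0 or 1.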

Wavenet Vocoder Training

  • Set '--data_dir' inside train_vocoder.py, then train.
  • If training fails for lack of memory or is too slow, reduce the sample_size hyperparameter. You can also reduce batch_size.
DATA_DIRECTORY =  'D:\\Tacotron2\\data\\moon,D:\\Tacotron2\\data\\son'
parser.add_argument('--data_dir', type=str, default=DATA_DIRECTORY, help='The directory containing data')
  • To resume training, specify '--logdir'.
LOGDIR = './/logdir-wavenet//train//2018-12-21T22-58-10'
parser.add_argument('--logdir', type=str, default=LOGDIR)
  • After training wavenet, feed the mel spectrogram generated by tacotron (an npy file) in as the local condition to get the final TTS output.

python generate.py --mel ./logdir-wavenet/mel-moon.npy --gc_cardinality 2 --gc_id 0 ./logdir-wavenet/train/2018-12-21T22-58-10
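For intuition about what "local condition" means here: the vocoder produces one audio sample at a time, while the mel spectrogram has one frame per hop, so the frames have to be stretched to the audio rate. The sketch below does this by simple repetition. It is an illustration only: the function names, the (n_frames, n_mels) layout, and the hop size of 256 are assumptions, and a real vocoder may upsample with learned layers instead.

```python
import numpy as np

def upsample_local_condition(mel, hop_size):
    """Repeat each mel frame hop_size times.

    mel: (n_frames, n_mels) -> (n_frames * hop_size, n_mels),
    i.e. one conditioning vector per audio sample.
    """
    return np.repeat(mel, hop_size, axis=0)

def check_global_condition(gc_id, gc_cardinality):
    """The speaker id passed as --gc_id must be below --gc_cardinality."""
    if not 0 <= gc_id < gc_cardinality:
        raise ValueError(f'gc_id={gc_id} is outside 0..{gc_cardinality - 1}')

if __name__ == "__main__":
    # Stand-in for np.load('./logdir-wavenet/mel-moon.npy'); the shape is assumed.
    mel = np.random.default_rng(0).normal(size=(100, 80))
    cond = upsample_local_condition(mel, hop_size=256)  # hypothetical hop size
    check_global_condition(gc_id=0, gc_cardinality=2)
    print(mel.shape, '->', cond.shape)
```

The number of audio samples generated is tied to the length of the conditioning sequence, so a longer mel spectrogram yields proportionally longer audio.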

Result

  • Tacotron batch_size = 32, Wavenet batch_size = 8, trained on a GTX 1080ti.
  • Tacotron was trained for 100K steps and Wavenet for 177K steps.
  • The samples directory contains the generated wav files.
  • It includes samples generated with Griffin-Lim and samples generated with the Wavenet Vocoder.
  • The speech generated with Wavenet contains noise because the model is undertrained.