  • Stars: 330
  • Rank: 127,657 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created over 6 years ago
  • Updated almost 5 years ago

Realtime Yukarin: an application for real-time voice conversion

Realtime Yukarin is an application that performs real-time voice conversion with a single command. It requires trained deep learning models and a computer with a GPU. The source code is open source under the MIT License, so you can modify it or use it in your own applications, whether commercial or non-commercial.

Japanese README

Supported environment

  • Windows
  • GeForce GTX 1060
  • 6GB GPU memory
  • Intel Core i7-7700 CPU @ 3.60GHz
  • Python 3.6

Preparation

Install the required libraries

pip install -r requirements.txt

Prepare trained models

You need two trained models: a first-stage model that performs the voice conversion and a second-stage model that enhances the quality of the converted output. You can create the first-stage model with Yukarin and the second-stage model with Become Yukarin.

For pitch conversion, you also need frequency statistics files, which are created with Yukarin.
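The exact contents of the statistics files are not described in this README. A common way to use per-speaker frequency statistics is to match the mean and variance of log-F0: standardize the input pitch with the input speaker's statistics, then rescale it with the target speaker's. The sketch below assumes each file provides a log-F0 mean and variance; the function name `convert_f0` and that data layout are assumptions for illustration, not the Realtime Yukarin API.

```python
import math
from typing import List


def convert_f0(
    f0: List[float],
    input_mean: float,
    input_var: float,
    target_mean: float,
    target_var: float,
) -> List[float]:
    """Map F0 values (Hz) from the input speaker's log-F0 distribution to the target's.

    Assumption: the statistics are the mean and variance of log(F0) over voiced frames.
    Unvoiced frames are marked with F0 <= 0 and are returned as 0.0.
    """
    input_std = math.sqrt(input_var)
    target_std = math.sqrt(target_var)
    converted = []
    for value in f0:
        if value <= 0:
            converted.append(0.0)  # unvoiced frame: no pitch to convert
            continue
        # Standardize in the log domain with the input statistics...
        z = (math.log(value) - input_mean) / input_std
        # ...then rescale with the target statistics and return to Hz.
        converted.append(math.exp(z * target_std + target_mean))
    return converted
```

With equal variances the mapping reduces to a constant pitch ratio: if the input speaker is centered at 100 Hz and the target at 200 Hz, an input of 150 Hz becomes roughly 300 Hz.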

In the examples below, the files use the following paths:

Content                                    Filename
Frequency statistics for input voice       ./sample/input_statistics.npy
Frequency statistics for target voice      ./sample/target_statistics.npy
First-stage model (from Yukarin)           ./sample/model_stage1/predictor.npz
First-stage config file                    ./sample/model_stage1/config.json
Second-stage model (from Become Yukarin)   ./sample/model_stage2/predictor.npz
Second-stage config file                   ./sample/model_stage2/config.json
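Before running the verification script, a quick existence check can catch a misplaced file. The helper below is hypothetical and not part of the repository; it only checks that each path listed above exists, not that the file is a valid model or statistics file.

```python
from pathlib import Path
from typing import List

# Paths from the file table; adjust them if your files live elsewhere.
REQUIRED_FILES: List[str] = [
    "./sample/input_statistics.npy",
    "./sample/target_statistics.npy",
    "./sample/model_stage1/predictor.npz",
    "./sample/model_stage1/config.json",
    "./sample/model_stage2/predictor.npz",
    "./sample/model_stage2/config.json",
]


def find_missing_files(base_dir: str = ".") -> List[str]:
    """Return the entries of REQUIRED_FILES that are not regular files under base_dir."""
    base = Path(base_dir)
    return [path for path in REQUIRED_FILES if not (base / path).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All required files are present.")
```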

Verification

You can verify the prepared files by running ./check.py. The following example converts 5 seconds of voice data from input.wav and saves the result to output.wav.

python check.py \
    --input_path 'input.wav' \
    --input_time_length 5 \
    --output_path 'output.wav' \
    --input_statistics_path './sample/input_statistics.npy' \
    --target_statistics_path './sample/target_statistics.npy' \
    --stage1_model_path './sample/model_stage1/predictor.npz' \
    --stage1_config_path './sample/model_stage1/config.json' \
    --stage2_model_path './sample/model_stage2/predictor.npz' \
    --stage2_config_path './sample/model_stage2/config.json'

If you run into problems, feel free to ask questions in GitHub Issues.

Run

To perform real-time voice conversion, create a config file config.yaml and run ./run.py.

python run.py ./config.yaml

Description of config file

# Name of the input sound device (partial match). See the section on sound device names below.
input_device_name: str

# Name of the output sound device (partial match). See the section on sound device names below.
output_device_name: str

# Input sampling rate
input_rate: int

# Output sampling rate
output_rate: int

# Frame period for acoustic feature extraction
frame_period: int

# Length of audio converted at one time (seconds).
# A longer value increases latency; a value that is too short can leave processing unable to keep up.
buffer_time: float

# Method used to calculate the fundamental frequency (F0): world or crepe.
# CREPE requires additional libraries; see requirements.txt for details.
extract_f0_mode: world

# Length of audio synthesized at one time (number of samples)
vocoder_buffer_size: int

# Amplitude scaling for input.
# Values greater than 1 increase the amplitude; values less than 1 decrease it.
input_scale: float

# Amplitude scaling for output.
# Values greater than 1 increase the amplitude; values less than 1 decrease it.
output_scale: float

# Silence threshold for input (dB).
# The smaller the value, the more readily audio is treated as silence.
input_silent_threshold: float

# Silence threshold for output (dB).
# The smaller the value, the more readily audio is treated as silence.
output_silent_threshold: float

# Overlap for encoding (seconds)
encode_extra_time: float

# Overlap for converting (seconds)
convert_extra_time: float

# Overlap for decoding (seconds)
decode_extra_time: float

# Paths of the frequency statistics files
input_statistics_path: str
target_statistics_path: str

# Paths of the trained model files and their config files
stage1_model_path: str
stage1_config_path: str
stage2_model_path: str
stage2_config_path: str
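The trade-off described for buffer_time can be made concrete with a little arithmetic. The sketch below is an illustration, not code from run.py: it assumes audio is converted in chunks of buffer_time seconds, so at least one full buffer must be recorded before conversion starts, and each chunk must be converted faster than it plays back. It also shows how input_scale and output_scale translate into a gain in dB, the same unit as the silence thresholds. All function names are hypothetical.

```python
import math


def samples_per_buffer(buffer_time: float, rate: int) -> int:
    """Number of samples in one buffer of buffer_time seconds at the given sampling rate."""
    return round(buffer_time * rate)


def minimum_latency(buffer_time: float, processing_time: float) -> float:
    """Rough lower bound on delay (seconds): wait for one full buffer, then convert it.

    Assumption: chunks are converted one at a time after they are fully recorded;
    overlap settings (the *_extra_time keys) and audio driver latency are ignored.
    """
    return buffer_time + processing_time


def keeps_up(buffer_time: float, processing_time: float) -> bool:
    """Real-time operation requires each chunk to be converted faster than it lasts."""
    return processing_time < buffer_time


def scale_to_db(scale: float) -> float:
    """Gain in dB corresponding to an amplitude scale such as input_scale."""
    return 20.0 * math.log10(scale)
```

For example, with buffer_time = 0.5 and input_rate = 24000 (illustrative values only), one buffer holds 12000 samples; if converting a chunk takes 0.3 s, processing keeps up and the delay is at least about 0.8 s. An input_scale of 2.0 corresponds to a gain of about +6.02 dB.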

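(Preliminary knowledge) Name of a sound device

The sound device name is the name your operating system displays for the device. For example, if a speaker is listed as Logitech Speaker, then Logitech Speaker is the name of that sound device.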
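The config file says device names are matched partially. A minimal sketch of that kind of lookup, assuming a simple case-sensitive substring test over the names the system reports; run.py may implement it differently (for example, with different case handling), and `find_device` is a hypothetical name.

```python
from typing import List, Optional


def find_device(device_names: List[str], query: str) -> Optional[int]:
    """Return the index of the first device whose name contains query, or None.

    Assumption: a case-sensitive substring test stands in for the
    "partial match" described in the config file.
    """
    for index, name in enumerate(device_names):
        if query in name:
            return index
    return None


# Hypothetical device list, as an operating system might report it.
devices = ["Microphone (Realtek Audio)", "Speakers (Logitech Speaker)"]
print(find_device(devices, "Logitech"))  # 1
print(find_device(devices, "USB"))       # None
```

Under this assumption, a short but distinctive fragment of the name is enough, but a fragment shared by several devices selects whichever is listed first.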

License

MIT License

More Repositories

1. become-yukarin: Convert your voice to a favorite voice (Python, 571 stars)
2. yukarin: Training code for the first-stage model of deep-learning voice conversion (Python, 141 stars)
3. pytorch-trainer: A Trainer for PyTorch modeled on Chainer's Trainer (Python, 46 stars)
4. jvs_hiho: Self-made labels for the JVS (Japanese versatile speech) corpus (Shell, 31 stars)
5. vv_core_inference: Inference code for the deep learning models used in the VOICEVOX core (Python, 27 stars)
6. yukarin_autoreg (Python, 27 stars)
7. hihobot: Build your own chatbot (Python, 23 stars)
8. hihobot-synthesis: Speech synthesis in your own voice (Python, 16 stars)
9. openjtalk-label-getter (Python, 10 stars)
10. kiritan_singing_label_reader: A Python reader for the label data of the Tohoku Kiritan singing database (東北きりたん歌唱データベース) (Python, 8 stars)
11. commecomme: A tool that displays Niconico comments and similar content on screen (JavaScript, 6 stars)
12. girl_friend_factory (JavaScript, 6 stars)
13. acoustic_feature_extractor (Python, 6 stars)
14. yukarin_soso_connector (Python, 5 stars)
15. hihobot-tts: Wraps the library that converses like you and speaks in your voice as a web API (Python, 4 stars)
16. iOS-Flat-UI-Libraries: A personal collection of flat UI libraries for iOS (3 stars)
17. hihobot-front: A web app for having voice conversations with yourself (JavaScript, 3 stars)
18. voiceactress100_ruby: Phoneme-balanced sentences from the Voice Actor Statistics Corpus (声優統計コーパス), with readings (ruby) (HTML, 2 stars)
19. accent_estimator (Python, 2 stars)
20. hiho-gcp (2 stars)
21. yukarin_nsf (Python, 2 stars)
22. temp_cache: A simple Python 3 library for creating temporary cache files (Python, 2 stars)
23. voice_encoder (Python, 2 stars)
24. yukarin_wavegrad (Python, 2 stars)
25. yukarin-tts-software (TypeScript, 1 star)
26. blog (1 star)
27. yukari_direct: A web service that lets anyone become Yuzuki Yukari (結月ゆかり) (JavaScript, 1 star)
28. yukari_direct_server (Python, 1 star)
29. yukarin_sos (Python, 1 star)
30. ita_corpus_hiho (Python, 1 star)
31. hiroshiba_mastodon_bot (Python, 1 star)
32. yukarin_sosf (Python, 1 star)
33. paint_transfer_c92 (Python, 1 star)
34. hiho-config: hiho's configs (1 star)
35. yukarin_soso_orchestra (1 star)
36. yukarin_tts_software_engine (Python, 1 star)
37. yukarin_wavernn (Python, 1 star)
38. yukarin_soso (Python, 1 star)
39. tornado_instant_webapi: A library for automatically generating a web API from a Python object, based on Tornado (Python, 1 star)
40. nicolive-mastodon: A tool that feeds Mastodon toots into an HTML5 comment generator (Python, 1 star)
41. jvs_metadata_loader: Metadata loader for the JVS (Japanese versatile speech) corpus (Python, 1 star)
42. hiho_check_src: Code that fetches a private repository; this is the public counterpart (Shell, 1 star)
43. signico_real_to_anime: Converts between two kinds of images (Python, 1 star)
44. voicevox_overview: An overview of VOICEVOX as a whole (1 star)
45. check_diffusion_sine: Test code for generating sine waves and similar signals with a diffusion-based approach (Jupyter Notebook, 1 star)
46. yukarin_so (Python, 1 star)