  • Language: Python
  • License: MIT License

Web UI for using XTTS and for fine-tuning it

XTTS-WebUI

About the Project

XTTS-WebUI is a web interface that helps you get the most out of XTTS. It wraps XTTS with additional neural networks and audio tools that can improve the generated results. You can also fine-tune the model to obtain a high-quality voice model.

Key Features

  • Easy to work with XTTSv2
  • Batch processing for dubbing a large number of files
  • Translate any audio while preserving the original voice
  • Automatically improve results using neural networks and audio tools
  • Fine-tune the model and use it immediately
  • Use tools such as RVC, OpenVoice, and Resemble Enhance, together or separately
  • Customize XTTS generation: all parameters, multiple speaker samples

TODO

  • Add a status bar with progress and error information
  • Integrate training into the standard interface
  • Add the ability to stream to check the result
  • Add a new way to process text for voiceover
  • Add the ability to customize speakers when batch processing
  • Add API

Installation

Use this web UI through Google Colab

Please ensure you have Python 3.10.x or 3.11.x, CUDA 11.8 or CUDA 12.1, Microsoft Build Tools 2019 with the C++ package, and ffmpeg installed.
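
Some of these prerequisites can be checked before installing with a few lines of standard-library Python. This is a minimal sketch and not part of the project: the function names are hypothetical, and it only checks the interpreter version and whether `ffmpeg` and `nvcc` are on PATH. It does not verify the CUDA version or the C++ build tools.

```python
import shutil
import sys

# Python versions named in the requirements above.
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def python_version_supported(version_info=None):
    """Return True if the interpreter is Python 3.10.x or 3.11.x."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED_MINORS


def find_tools(names=("ffmpeg", "nvcc")):
    """Map each tool name to its location on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Python OK:", python_version_supported())
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

A missing `nvcc` does not necessarily mean CUDA is unusable, since the PyTorch wheels ship their own CUDA runtime; treat that line as a hint only.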

Method 1: Using scripts

Windows

To get started:

  • Run the 'install.bat' file
  • To start the web UI, run 'start_xtts_webui.bat'
  • Open your preferred browser and go to the local address displayed in the console.

Linux

To get started:

  • Run the 'install.sh' file
  • To start the web UI, run 'start_xtts_webui.sh'
  • Open your preferred browser and go to the local address displayed in the console.

Method 2: Manual

Follow these steps for installation:

  1. Ensure that CUDA is installed

  2. Clone the repository: git clone https://github.com/daswer123/xtts-webui

  3. Navigate into the directory: cd xtts-webui

  4. Create a virtual environment: python -m venv venv

  5. Activate the virtual environment:

    • On Windows: venv\Scripts\activate
    • On Linux: source venv/bin/activate

  6. Install PyTorch and torchaudio with pip:

    pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118

  7. Install all dependencies from requirements.txt:

    pip install -r requirements.txt
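
After step 6 it can be useful to confirm that the installed PyTorch build actually sees the GPU. The following is a small sketch rather than anything shipped with the project; `torch_cuda_status` is a hypothetical helper name, and it degrades to a plain message when PyTorch is missing.

```python
import importlib.util


def torch_cuda_status():
    """Return a short, human-readable summary of the PyTorch/CUDA setup."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    try:
        import torch
    except Exception as exc:  # a broken install can fail at import time
        return f"torch failed to import: {exc}"
    if torch.cuda.is_available():
        return (
            f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"device: {torch.cuda.get_device_name(0)}"
        )
    return f"torch {torch.__version__}, CUDA not available (CPU only)"


print(torch_cuda_status())
```

If the summary reports a CPU-only build, the application can still be started on the CPU with the device option described under the runtime arguments.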

Running The Application

To launch the interface, first activate your virtual environment. On Windows:

venv\Scripts\activate

On Linux:

source venv/bin/activate

Then start the web UI by running this command:

python app.py

Here are some runtime arguments that can be used when starting the application:

| Argument | Default Value | Description |
| --- | --- | --- |
| -hs, --host | 127.0.0.1 | The host to bind to |
| -p, --port | 8010 | The port number to listen on |
| -d, --device | cuda | Which device to use (cpu or cuda) |
| -sf, --speaker_folder | speakers/ | Directory containing TTS samples |
| -o, --output | "output/" | Output directory |
| -l, --language | "auto" | Web UI language; the available translations are in the i18n/locale folder |
| -ms, --model-source | "local" | Model source: 'api' for the latest version from the repository with API inference, or 'local' for local inference with model v2.0.2 |
| -v, --version | "v2.0.2" | XTTS version to use. To use a custom model, put its folder in models and pass the folder name to this flag |
| --lowvram | | Enable low-VRAM mode, which moves the model to RAM when it is not actively processing |
| --deepspeed | | Enable DeepSpeed acceleration. Works on Windows with Python 3.10 and 3.11 |
| --share | | Allow access to the interface from outside the local computer |
| --rvc | | Enable RVC post-processing; all models should be located in the rvc folder |
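
To make the options above concrete, here is a sketch of how such a command line could be declared with `argparse`. It is reconstructed from the list of arguments and is not the project's actual source: the `build_parser` name, the `choices` restrictions, the integer port type, and the `--version` long form are assumptions.

```python
import argparse


def build_parser():
    """Declare the documented launch options (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(description="Sketch of the XTTS-WebUI launch options")
    parser.add_argument("-hs", "--host", default="127.0.0.1", help="Host to bind to")
    parser.add_argument("-p", "--port", type=int, default=8010, help="Port to listen on")
    parser.add_argument("-d", "--device", default="cuda", choices=["cpu", "cuda"])
    parser.add_argument("-sf", "--speaker_folder", default="speakers/")
    parser.add_argument("-o", "--output", default="output/")
    parser.add_argument("-l", "--language", default="auto")
    parser.add_argument("-ms", "--model-source", default="local", choices=["api", "local"])
    parser.add_argument("-v", "--version", default="v2.0.2")
    # Boolean switches: absent means False, present means True.
    for flag in ("--lowvram", "--deepspeed", "--share", "--rvc"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-d", "cpu", "-p", "9000", "--lowvram"])
    print(args.device, args.port, args.lowvram)
```

With this sketch, a launch such as `python app.py -d cpu -p 9000 --lowvram` would yield `device="cpu"`, `port=9000`, and `lowvram=True`, with every other option left at its default.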

TTS -> RVC

The RVC module post-processes the generated audio. To enable it, add the --rvc flag when starting from the console, or add the flag to the startup file.

To use a model, select it in the RVC settings. The model must first be uploaded to the voice2voice/rvc folder. Each model must be in its own folder, with the model file and its index file together; the index file is optional.
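
The folder layout described above (one folder per model, with an optional index file next to the model file) can be sketched as a small scanner. This is an illustration only: `find_rvc_models` is a hypothetical name, and the `.pth` and `.index` extensions are assumptions based on common RVC conventions, not something this README states.

```python
from pathlib import Path


def find_rvc_models(rvc_root):
    """Return {folder_name: {"model": Path, "index": Path or None}} per model folder."""
    models = {}
    root = Path(rvc_root)
    if not root.is_dir():
        return models
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        weights = sorted(folder.glob("*.pth"))      # assumed model extension
        if not weights:
            continue  # no model file here, so the folder is skipped
        indexes = sorted(folder.glob("*.index"))    # assumed index extension
        models[folder.name] = {
            "model": weights[0],
            "index": indexes[0] if indexes else None,  # the index is optional
        }
    return models
```

Calling `find_rvc_models("voice2voice/rvc")` returns an empty dictionary when the folder does not exist, so it is safe to run before any model has been uploaded.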

Differences between xtts-webui and the official webui

Data processing

  1. Updated faster-whisper to 0.10.0, with the ability to select the large-v3 model.
  2. Changed the output folder to an output folder inside the main folder.
  3. If the output folder already contains a dataset and you want to add new data, simply add the new audio. Existing data is not processed again, and the new data is added automatically.
  4. Turned on the VAD filter.
  5. After the dataset is created, a file is written that records the language of the dataset. This file is read before training so that the language always matches, which is convenient when you restart the interface.
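
Items 3 and 5 can be illustrated with a short standard-library sketch. It is not the project's implementation: the file name `lang.txt`, the set of audio extensions, and all function names are hypothetical and chosen only for the example.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}  # assumed set of accepted formats
LANG_FILE = "lang.txt"                      # hypothetical file name


def pending_audio(input_dir, processed_names):
    """Return audio files in input_dir that are not yet part of the dataset."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.suffix.lower() in AUDIO_SUFFIXES and p.name not in processed_names
    )


def save_dataset_language(dataset_dir, language):
    """Record the dataset language next to the dataset."""
    Path(dataset_dir, LANG_FILE).write_text(language, encoding="utf-8")


def load_dataset_language(dataset_dir):
    """Return the recorded language, or None if no language file exists yet."""
    path = Path(dataset_dir, LANG_FILE)
    return path.read_text(encoding="utf-8").strip() if path.exists() else None


def check_language(dataset_dir, requested):
    """Raise ValueError when the requested language differs from the stored one."""
    stored = load_dataset_language(dataset_dir)
    if stored is not None and stored != requested:
        raise ValueError(f"Dataset language is {stored!r}, but {requested!r} was requested")
    return requested
```

Keeping the language in a file beside the dataset means a restarted interface can recover it without asking the user again, and a mismatch can be caught before training starts.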

Fine-tuning XTTS Encoder

  1. Added the ability to select the base model for XTTS; retraining no longer needs to download the model again.
  2. Added the ability to select a custom model as the base model during training, which allows fine-tuning an already fine-tuned model.
  3. Added the option to get an optimized version of the model in one click (step 2.5 puts the optimized version in the output folder).
  4. You can choose whether to delete the training folders after you have optimized the model.
  5. When you optimize the model, the example reference audio is moved to the output folder.
  6. The specified language is checked against the dataset language.
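
The clean-up behaviour in items 4 and 5 might look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: `finalize_model` and its parameters are hypothetical, and the reference audio is copied before the training folder is optionally deleted so that it survives the clean-up.

```python
import shutil
from pathlib import Path


def finalize_model(run_dir, ready_dir, reference_audio, delete_training=False):
    """Place the reference audio in ready_dir and optionally remove run_dir."""
    ready = Path(ready_dir)
    ready.mkdir(parents=True, exist_ok=True)
    # Copy first, so the audio is preserved even if it lives inside run_dir.
    copied = Path(shutil.copy2(reference_audio, ready / Path(reference_audio).name))
    if delete_training:
        shutil.rmtree(run_dir)
    return copied
```

Making deletion opt-in keeps intermediate training data available by default, which matters if you plan to continue training from the same run.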

Inference

  1. Added the ability to customize inference settings when checking the model.

Other

  1. If you accidentally restart the interface during one of the steps, you can reload the data using the additional buttons.
  2. Removed the display of logs, as it caused problems when the interface was restarted.
  3. The finished result is copied to the ready folder. These files are complete; you can move them anywhere and use them as a standard model.
  4. Added support for Japanese.
