  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated 8 months ago

Open Text to Speech Server

Unifies access to multiple open source text to speech systems and voices for many languages.

Supports a subset of SSML that can use multiple voices, text to speech systems, and languages!

<speak>
  The 1st thing to remember is that 27 languages are supported in Open TTS as of 10/13/2021 at 3pm.

  <voice name="glow-speak:en-us_mary_ann">
    <s>
      The current voice can be changed, even to a different text to speech system!
    </s>
  </voice>

  <voice name="coqui-tts:en_vctk#p228">
    <s>Breaks are possible</s>
    <break time="0.5s" />
    <s>between sentences.</s>
  </voice>

  <s lang="en">
    One language is never enough
  </s>
  <s lang="de">
   Eine Sprache ist niemals genug
  </s>
  <s lang="ja">
    言語を一つは決して足りない
  </s>
  <s lang="sw">
    Lugha moja haitoshi
  </s>
</speak>

See the full SSML example (use the synesthesiam/opentts:all Docker image, which includes all voices)

Listen to voice samples

Voices

  • Larynx
    • English (27), German (7), French (3), Spanish (2), Dutch (4), Russian (3), Swedish (1), Italian (2), Swahili (1)
    • Model types available: GlowTTS
    • Vocoders available: HiFi-Gan (3 levels of quality)
    • Patched embedded version of Larynx 1.0
  • Glow-Speak
    • English (2), German (1), French (1), Spanish (1), Dutch (1), Russian (1), Swedish (1), Italian (1), Swahili (1), Greek (1), Finnish (1), Hungarian (1), Korean (1)
    • Model types available: GlowTTS
    • Vocoders available: HiFi-Gan (3 levels of quality)
  • Coqui-TTS
    • English (110), Japanese (1), Chinese (1)
    • Patched embedded version of Coqui-TTS 0.3.1
  • nanoTTS
    • English (2), German (1), French (1), Italian (1), Spanish (1)
  • MaryTTS
    • English (7), German (3), French (4), Italian (1), Russian (1), Swedish (1), Telugu (1), Turkish (1)
    • Includes embedded MaryTTS
  • flite
    • English (19), Hindi (1), Bengali (1), Gujarati (3), Kannada (1), Marathi (2), Punjabi (1), Tamil (1), Telugu (3)
  • Festival
    • English (9), Spanish (1), Catalan (1), Czech (4), Russian (1), Finnish (2), Marathi (1), Telugu (1), Hindi (1), Italian (2), Arabic (2)
    • Spanish/Catalan/Finnish use ISO-8859-15 encoding
    • Czech uses ISO-8859-2 encoding
    • Russian is transliterated from Cyrillic to Latin script automatically
    • Arabic uses UTF-8 and is diacritized with mishkal
  • eSpeak
    • Supports a huge number of languages/locales, but sounds robotic

Running

Basic OpenTTS server:

$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE>

where <LANGUAGE> is one of:

  • all (All languages)
  • ar (Arabic)
  • bn (Bengali)
  • ca (Catalan)
  • cs (Czech)
  • de (German)
  • el (Greek)
  • en (English)
  • es (Spanish)
  • fi (Finnish)
  • fr (French)
  • gu (Gujarati)
  • hi (Hindi)
  • hu (Hungarian)
  • it (Italian)
  • ja (Japanese)
  • kn (Kannada)
  • ko (Korean)
  • mr (Marathi)
  • nl (Dutch)
  • pa (Punjabi)
  • ru (Russian)
  • sv (Swedish)
  • sw (Swahili)
  • ta (Tamil)
  • te (Telugu)
  • tr (Turkish)
  • zh (Chinese)

Visit http://localhost:5500

For the HTTP API test page, visit http://localhost:5500/openapi/

Exclude eSpeak (robotic voices):

$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --no-espeak

WAV Cache

You can have the OpenTTS server cache WAV files with --cache:

$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache

This will store WAV files in a temporary directory (inside the Docker container). A specific directory can also be used:

$ docker run -it -v /path/to/cache:/cache -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache /cache

HTTP API Endpoints

See swagger.yaml

  • GET /api/tts
    • ?voice - voice in the form tts:voice (e.g., espeak:en)
    • ?text - text to speak
    • ?cache - disable WAV cache with false
    • Returns audio/wav bytes
  • GET /api/voices
    • Returns JSON object
    • Keys are voice ids in the form tts:voice
    • Values are objects with:
      • id - voice identifier for TTS system (string)
      • name - friendly name of voice (string)
      • gender - M or F (string)
      • language - 2-character language code (e.g., "en")
      • locale - lower-case locale code (e.g., "en-gb")
      • tts_name - name of text to speech system
    • Filter voices using query parameters:
      • ?tts_name - only text to speech system(s)
      • ?language - only language(s)
      • ?locale - only locale(s)
      • ?gender - only gender(s)
  • GET /api/languages
    • Returns JSON list of supported languages
    • Filter languages using query parameters:
      • ?tts_name - only text to speech system(s)

SSML

A subset of SSML is supported:

  • <speak> - wrap around SSML text
    • lang - set language for document
  • <s> - sentence (disables automatic sentence breaking)
    • lang - set language for sentence
  • <w> / <token> - word (disables automatic tokenization)
  • <voice name="..."> - set voice of inner text
    • name - name or language of voice
      • Name format is tts:voice (e.g., "glow-speak:en-us_mary_ann") or tts:voice#speaker_id (e.g., "coqui-tts:en_vctk#p228")
      • If one of the supported languages, a preferred voice is used (override with --preferred-voice <lang> <voice>)
  • <say-as interpret-as=""> - force interpretation of inner text
    • interpret-as - one of "spell-out", "date", "number", "time", or "currency"
    • format - way to format text depending on interpret-as
      • number - one of "cardinal", "ordinal", "digits", "year"
      • date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
  • <break time=""> - Pause for given amount of time
    • time - seconds ("123s") or milliseconds ("123ms")
  • <sub alias=""> - substitute alias for inner text

MaryTTS Compatible Endpoint

Use OpenTTS as a drop-in replacement for MaryTTS.

The voice format is <TTS_SYSTEM>:<VOICE_NAME>. Visit the OpenTTS web UI and copy/paste the "voice id" of your favorite voice here.

You may need to change the port in your docker run command to -p 59125:5500 for compatibility with existing software.
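As a rough illustration, the sketch below builds the kind of request a MaryTTS client typically sends. The /process path and the INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO, and VOICE parameters come from MaryTTS's own HTTP interface, not from this document; that OpenTTS accepts the same request is an assumption based on the drop-in claim above. The function name is hypothetical.

```python
from urllib.parse import urlencode


def marytts_process_url(text, voice, host="localhost", port=59125):
    """Build a MaryTTS-style /process URL.

    Assumption: the container was started with -p 59125:5500 and answers
    MaryTTS-style requests on that port.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "VOICE": voice,  # OpenTTS voice id, e.g. "larynx:harvard"
    }
    return f"http://{host}:{port}/process?{urlencode(params)}"


print(marytts_process_url("Hello world", "larynx:harvard"))
```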

Larynx Voice Quality

On the Raspberry Pi, you may need to lower the quality of Larynx voices to get reasonable response times.

This is done by appending the quality level to the end of your voice:

tts:
  - platform: marytts
    voice: "larynx:harvard;low"

Available quality levels are high (the default), medium, and low.

Note that this only applies to Larynx and Glow-Speak voices.

Speaker ID

For multi-speaker models (currently just coqui-tts:en_vctk), you can append a speaker name or id to your voice:

tts:
  - platform: marytts
    voice: "coqui-tts:en_vctk#p228"

You can get the available speaker names from /api/voices or provide a 0-based index instead:

tts:
  - platform: marytts
    voice: "coqui-tts:en_vctk#42"
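To make the voice string format concrete, here is a small Python sketch that splits a voice string into its parts: the TTS system, the voice name, an optional speaker after "#", and an optional quality level after ";". The parse_voice function and VoiceId type are hypothetical helpers, not OpenTTS code, and the sketch assumes a quality suffix, if present, comes last.

```python
from typing import NamedTuple, Optional


class VoiceId(NamedTuple):
    tts_name: str
    voice: str
    speaker: Optional[str] = None  # speaker name or 0-based index, as text
    quality: Optional[str] = None  # "high", "medium", or "low"


def parse_voice(voice_id: str) -> VoiceId:
    """Split 'tts:voice', 'tts:voice#speaker', or 'tts:voice;quality'."""
    quality = None
    speaker = None
    if ";" in voice_id:
        voice_id, quality = voice_id.rsplit(";", 1)
        if quality not in ("high", "medium", "low"):
            raise ValueError(f"unknown quality level: {quality!r}")
    if "#" in voice_id:
        voice_id, speaker = voice_id.split("#", 1)
    tts_name, _, voice = voice_id.partition(":")
    if not tts_name or not voice:
        raise ValueError("expected a voice in the form tts:voice")
    return VoiceId(tts_name, voice, speaker, quality)


print(parse_voice("larynx:harvard;low"))
print(parse_voice("coqui-tts:en_vctk#p228"))
```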

Default Larynx Settings

Default settings for Larynx can be provided on the command-line:

  • --larynx-quality - vocoder quality ("high", "medium", or "low", default: "high")
  • --larynx-noise-scale - voice volatility (0-1, default: 0.667)
  • --larynx-length-scale - voice speed (< 1 is faster, default: 1.0)
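When the server is launched from a script, the command-line options documented so far can be assembled and validated before Docker is invoked. The following Python sketch uses only flags that appear in this document (--no-espeak, --cache, and the three --larynx-* options); the docker_run_args function and its parameter names are hypothetical, and the validation ranges follow the descriptions above.

```python
import shlex


def docker_run_args(language, no_espeak=False, cache_dir=None,
                    larynx_quality=None, larynx_noise_scale=None,
                    larynx_length_scale=None, host_port=5500):
    """Return the argument list for a `docker run` of the OpenTTS image."""
    args = ["docker", "run", "-it"]
    if cache_dir:
        args += ["-v", f"{cache_dir}:/cache"]  # mount host cache directory
    args += ["-p", f"{host_port}:5500", f"synesthesiam/opentts:{language}"]
    if no_espeak:
        args.append("--no-espeak")
    if cache_dir:
        args += ["--cache", "/cache"]
    if larynx_quality is not None:
        if larynx_quality not in ("high", "medium", "low"):
            raise ValueError("larynx_quality must be high, medium, or low")
        args += ["--larynx-quality", larynx_quality]
    if larynx_noise_scale is not None:
        if not 0 <= larynx_noise_scale <= 1:
            raise ValueError("larynx_noise_scale must be between 0 and 1")
        args += ["--larynx-noise-scale", str(larynx_noise_scale)]
    if larynx_length_scale is not None:
        if larynx_length_scale <= 0:
            raise ValueError("larynx_length_scale must be positive")
        args += ["--larynx-length-scale", str(larynx_length_scale)]
    return args


print(shlex.join(docker_run_args("en", larynx_quality="low")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues; shlex.join is used here only to display the command.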

Building From Source

OpenTTS uses Docker buildx to build multi-platform images based on Debian bullseye.

Before building, make sure to download the voices you want to the voices directory. Each TTS system that uses external voices has a sub-directory with instructions on how to download voices.

If you only plan to build an image for your current platform, you should be able to run:

make <lang>

from the root of the cloned repository, where <lang> is one of the supported languages. If it builds successfully, you can run it with:

make <lang>-run

For example, the English image can be built and run with:

make en
make en-run

Under the hood, this does two things:

  1. Runs the configure script with --languages <lang>
  2. Runs docker buildx build with the appropriate arguments

You can manually run the configure script -- see ./configure --help for more options. This script generates the following files (used by the build process):

  • build_packages - Debian packages installed with apt-get during the build only
  • packages - Debian packages installed with apt-get for runtime
  • python_packages - Python packages installed with pip
  • .dockerignore - Files that docker will ignore during building ("!" inverts)
  • .dockerargs - Command-line arguments passed to docker buildx build

Multi-Platform Images

To build an image for a different platform, you need to initialize a docker buildx builder:

docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker buildx create --config /etc/docker/buildx.conf --use --name mybuilder
docker buildx use mybuilder
docker buildx inspect --bootstrap

NOTE: For some reason, you have to do these steps each time you reboot. If you see errors like "Error while loading /usr/sbin/dpkg-split: No such file or directory", run docker buildx rm mybuilder and re-run the steps above.

When you run make, specify the platform(s) you want to build for:

DOCKER_PLATFORMS='--platform linux/amd64,linux/arm64,linux/arm/v7' make <lang>

You may place pre-compiled Python wheels in the download directory. They will be used during the installation of Python packages.
