
Larynx

🎥 DEMO VIDEO

Offline end-to-end text to speech system using gruut and onnx (architecture). There are 50 voices available across 9 languages.

curl https://raw.githubusercontent.com/rhasspy/larynx/master/docker/larynx-server \
    > ~/bin/larynx-server && chmod +x ~/bin/larynx-server
larynx-server

Visit http://localhost:5002 for the test page. See http://localhost:5002/openapi/ for HTTP endpoint documentation.
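
For a quick test from Python, the server can also be called over HTTP. Below is a minimal sketch; the /api/tts endpoint and its text/voice query parameters are assumptions, so confirm them against the /openapi/ page above.

# Minimal sketch of synthesizing speech through the Larynx web server.
# NOTE: the /api/tts endpoint and the "text"/"voice" parameter names are
# assumptions; verify them against http://localhost:5002/openapi/
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "text": "Hello from Larynx!",
    "voice": "en",  # language or voice name
})

with urllib.request.urlopen(f"http://localhost:5002/api/tts?{params}") as response:
    wav_bytes = response.read()  # the server responds with WAV audio

with open("output.wav", "wb") as f:
    f.write(wav_bytes)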

Larynx screenshot

Supports a subset of SSML that can use multiple voices and languages!

<speak>
  The 1st thing to remember is that 9 languages are supported in Larynx TTS as of 10/19/2021 at 10:39am.

  <voice name="harvard">
    <s>
      The current voice can be changed!
    </s>
  </voice>

  <voice name="northern_english_male">
    <s>Breaks are possible</s>
    <break time="0.5s" />
    <s>between sentences.</s>
  </voice>

  <s lang="en">
    One language is never enough
  </s>
  <s lang="de">
   Eine Sprache ist niemals genug
  </s>
  <s lang="sw">
    Lugha moja haitoshi
  </s>
</speak>

Larynx's goals are:

  • "Good enough" synthesis to avoid using a cloud service
  • Faster than realtime performance on a Raspberry Pi 4 (with low quality vocoder)
  • Broad language support (9 languages)
  • Voices trained purely from public datasets

You can use Larynx as a command-line tool, as an HTTP server (including a MaryTTS-compatible endpoint), or directly from Python.

Samples

Listen to voice samples from all of the pre-trained voices.


Docker Installation

Pre-built Docker images are available for the following platforms:

  • linux/amd64 - desktop/laptop/server
  • linux/arm64 - Raspberry Pi 64-bit
  • linux/arm/v7 - Raspberry Pi 32-bit

These images include a single English voice, but many more can be downloaded from within the web interface.

The larynx and larynx-server shell scripts wrap the Docker images, allowing you to use Larynx as a command-line tool.

To manually run the Larynx web server in Docker:

docker run \
    -it \
    -p 5002:5002 \
    -e "HOME=${HOME}" \
    -v "$HOME:${HOME}" \
    -v /usr/share/ca-certificates:/usr/share/ca-certificates \
    -v /etc/ssl/certs:/etc/ssl/certs \
    -w "${PWD}" \
    --user "$(id -u):$(id -g)" \
    rhasspy/larynx

Downloaded voices will be stored in ${HOME}/.local/share/larynx.

Visit http://localhost:5002 for the test page. See http://localhost:5002/openapi/ for HTTP endpoint documentation.

Debian Installation

Pre-built Debian packages for bullseye are available for download with the name larynx-tts_<VERSION>_<ARCH>.deb, where ARCH is one of amd64 (most desktops and laptops), armhf (32-bit Raspberry Pi), or arm64 (64-bit Raspberry Pi).

Example installation on a typical desktop:

sudo apt install ./larynx-tts_<VERSION>_amd64.deb

From there, you may run the larynx command or larynx-server to start the web server (http://localhost:5002).

Python Installation

You may need to install the following dependencies (besides Python 3.7+):

sudo apt-get install libopenblas-base libgomp1 libatomic1

On 32-bit ARM systems (Raspberry Pi), you will also need:

sudo apt-get install libatlas3-base libgfortran5

Next, create a Python virtual environment:

python3 -m venv larynx_venv
source larynx_venv/bin/activate

pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools

Next, install larynx:

pip3 install -f 'https://synesthesiam.github.io/prebuilt-apps/' -f 'https://download.pytorch.org/whl/cpu/torch_stable.html' larynx

Then run larynx for synthesis, or larynx.server for the web server. You may also execute the Python modules directly with python3 -m larynx and python3 -m larynx.server.

Voice/Vocoder Download

Voices and vocoders are automatically downloaded when used on the command-line or in the web server. You can also manually download each voice. Extract them to ${HOME}/.local/share/larynx/voices so that the directory structure follows the pattern ${HOME}/.local/share/larynx/voices/<language>,<voice>.
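
As an illustration, the sketch below lists installed voices by scanning that directory, assuming only the <language>,<voice> naming pattern described above.

# List locally installed Larynx voices using the documented
# ${HOME}/.local/share/larynx/voices/<language>,<voice> layout.
from pathlib import Path

voices_dir = Path.home() / ".local" / "share" / "larynx" / "voices"

for voice_dir in sorted(voices_dir.glob("*,*")):
    language, _, voice = voice_dir.name.partition(",")
    print(f"{language}: {voice}")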


Command-Line Interface

Larynx has a flexible command-line interface, available with:

  • The larynx script for Docker
  • The larynx command from the Debian package
  • larynx or python3 -m larynx for Python installations

Basic Synthesis

larynx -v <VOICE> "<TEXT>" > output.wav

where <VOICE> is a language name (en, de, etc.) or a voice name (ljspeech, thorsten, etc.). <TEXT> may contain multiple sentences, which will be combined in the final output WAV file. They can also be split into separate WAV files (see Multiple WAV Output below).

To adjust the quality of the output, use -q <QUALITY> where <QUALITY> is "high" (slowest), "medium", or "low" (fastest).

SSML Synthesis

larynx --ssml -v <VOICE> "<SSML>" > output.wav

where <SSML> is valid SSML. Not all features are supported; for example:

  • Breaks (pauses) can only occur between sentences and can only be specified in seconds or milliseconds
  • Voices can only be referenced by name
  • Custom lexicons are not yet supported (you can use <phoneme ph="...">, however)

If your SSML contains <mark> tags, add --mark-file <FILE> to the command-line. As the marks are encountered (between sentences), their names will be written on separate lines to the file.
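
A minimal sketch of driving this from Python, using only the flags documented here (--ssml, --mark-file, -v); the SSML content is illustrative:

# Synthesize SSML containing <mark> tags and collect the mark names
# that Larynx writes (one per line) to the mark file.
import subprocess
import tempfile

ssml = """<speak>
  <s>First sentence.</s>
  <mark name="after_first" />
  <s>Second sentence.</s>
</speak>"""

with tempfile.NamedTemporaryFile(suffix=".txt") as mark_file:
    with open("output.wav", "wb") as wav:
        subprocess.run(
            ["larynx", "--ssml", "-v", "en", "--mark-file", mark_file.name, ssml],
            stdout=wav,
            check=True,
        )
    mark_names = mark_file.read().decode().splitlines()

print(mark_names)  # e.g. ['after_first']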

CUDA Accelerated Synthesis

The --cuda flag will make use of a GPU if it's available to PyTorch:

larynx --cuda 'This is spoken on the GPU.' > output.wav

Adding the --half flag will enable half-precision inference, which is often faster:

larynx --cuda --half 'This is spoken on the GPU even faster.' > output.wav

For CUDA acceleration to work, your voice must contain a PyTorch checkpoint file (generator.pth). Older Larynx voices did not have these, so you may need to re-download your voices.

Long Texts

If your text is very long and you would like to listen to it as it's being synthesized, use the --raw-stream option:

larynx -v en --raw-stream < long.txt | aplay -r 22050 -c 1 -f S16_LE

Each input line will be synthesized and written to standard output as raw 16-bit, 22050 Hz, mono PCM. By default, 5 sentences will be kept in an output queue, only blocking synthesis when the queue is full. You can adjust this value with --raw-stream-queue-size. Additionally, you can adjust --max-thread-workers to change how many threads are available for synthesis.
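
The raw stream can also be captured from Python and wrapped in a WAV header with the standard wave module; a small sketch using the format described above (16-bit, 22050 Hz, mono):

# Capture --raw-stream output and wrap it in a WAV container.
# Audio format per the docs: raw 16-bit, 22050 Hz, mono PCM.
import subprocess
import wave

proc = subprocess.run(
    ["larynx", "-v", "en", "--raw-stream"],
    input=b"This was streamed as raw PCM.\n",
    stdout=subprocess.PIPE,
    check=True,
)

with wave.open("streamed.wav", "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(22050)  # sample rate
    wav.writeframes(proc.stdout)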

If your long text is fixed-width with blank lines separating paragraphs like those from Project Gutenberg, use the --process-on-blank-line option so that sentences will not be broken at line boundaries. For example, you can listen to "Alice in Wonderland" like this:

curl --output - 'https://www.gutenberg.org/files/11/11-0.txt' | \
    larynx -v en --raw-stream --process-on-blank-line | aplay -r 22050 -c 1 -f S16_LE

Multiple WAV Output

With --output-dir set to a directory, Larynx will output a separate WAV file for each sentence:

larynx -v en 'Test 1. Test 2.' --output-dir /path/to/wavs

By default, each WAV file will be named using the (slightly modified) text of the sentence. You can have WAV files named using a timestamp instead with --output-naming time. For full control of the output naming, the --csv command-line flag indicates that each sentence is of the form id|text where id will be the name of the WAV file.

cat << EOF |
s01|The birch canoe slid on the smooth planks.
s02|Glue the sheet to the dark blue background.
s03|It's easy to tell the depth of a well.
s04|These days a chicken leg is a rare dish.
s05|Rice is often served in round bowls.
s06|The juice of lemons makes fine punch.
s07|The box was thrown beside the parked truck.
s08|The hogs were fed chopped corn and garbage.
s09|Four hours of steady work faced us.
s10|Large size in stockings is hard to sell.
EOF
  larynx --csv --voice en --output-dir /path/to/wavs

Interactive Mode

With no text input and no output directory, Larynx will switch into interactive mode. After you enter a sentence, it will be played with --play-command (the default is play from SoX).

larynx -v en
Reading text from stdin...
Hello world!<ENTER>

Use CTRL+D or CTRL+C to exit.

GlowTTS Settings

The GlowTTS voices support two additional parameters:

  • --noise-scale - determines the speaker volatility during synthesis (0-1, default is 0.667)
  • --length-scale - makes the voice speak slower (> 1) or faster (< 1)

Vocoder Settings

  • --denoiser-strength - runs the denoiser if > 0; a small value like 0.005 is a good place to start.

List Voices and Vocoders

larynx --list

MaryTTS Compatible API

To use Larynx as a drop-in replacement for a MaryTTS server (e.g., for use with Home Assistant), run:

docker run \
    -it \
    -p 59125:5002 \
    -e "HOME=${HOME}" \
    -v "$HOME:${HOME}" \
    -v /usr/share/ca-certificates:/usr/share/ca-certificates \
    -v /etc/ssl/certs:/etc/ssl/certs \
    -w "${PWD}" \
    --user "$(id -u):$(id -g)" \
    rhasspy/larynx

The /process HTTP endpoint should now work for voices formatted as <LANG> or <VOICE>, e.g. en or harvard.

You can specify the vocoder quality by adding ;<QUALITY> to the MaryTTS voice where QUALITY is "high", "medium", or "low".

For example: en;low will use the lowest quality (but fastest) vocoder. This is usually necessary to get decent performance on a Raspberry Pi.
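
From Python, the endpoint can be called like any MaryTTS server. A sketch follows; INPUT_TEXT and VOICE are the standard MaryTTS query parameters and are assumed to be the ones Larynx accepts:

# Sketch of calling the MaryTTS-compatible /process endpoint.
# NOTE: INPUT_TEXT and VOICE are the usual MaryTTS parameters;
# the port matches the docker run example above.
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "INPUT_TEXT": "Hello from the MaryTTS API!",
    "VOICE": "en;low",  # <VOICE>;<QUALITY> as described above
})

with urllib.request.urlopen(f"http://localhost:59125/process?{params}") as response:
    with open("marytts_output.wav", "wb") as f:
        f.write(response.read())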


SSML

A subset of SSML is supported (use --ssml):

  • <speak> - wrap around SSML text
    • lang - set language for document
  • <s> - sentence (disables automatic sentence breaking)
    • lang - set language for sentence
  • <w> / <token> - word (disables automatic tokenization)
  • <voice name="..."> - set voice of inner text
    • voice - name or language of voice
  • <say-as interpret-as=""> - force interpretation of inner text
    • interpret-as - one of "spell-out", "date", "number", "time", or "currency"
    • format - way to format text depending on interpret-as
      • number - one of "cardinal", "ordinal", "digits", "year"
      • date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
  • <break time=""> - Pause for given amount of time
    • time - seconds ("123s") or milliseconds ("123ms")
  • <mark name=""> - User-defined mark (written to --mark-file or part of TextToSpeechResult)
    • name - name of mark
  • <sub alias=""> - substitute alias for inner text
  • <phoneme ph="..."> - supply phonemes for inner text
    • ph - phonemes for each word of inner text, separated by whitespace
  • <lexicon id="..."> - inline pronunciation lexicon
    • id - unique id of lexicon (used in <lookup ref="...">)
    • One or more <lexeme> child elements with:
      • <grapheme role="...">WORD</grapheme> - word text (with optional role; see Word Roles below)
      • <phoneme>P H O N E M E S</phoneme> - word pronunciation (phonemes separated by whitespace)
  • <lookup ref="..."> - use inline pronunciation lexicon for child elements
    • ref - id from a <lexicon id="...">
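
As an example of <say-as>, the sketch below forces "10/19/2021" to be read as a month-day-year date, built only from the attributes listed above:

# Sketch: speak a date with <say-as>, using the "date" interpretation
# and an "mdy" format string (m = month, d = cardinal day, y = year).
import subprocess

ssml = """<speak>
  <s>
    <say-as interpret-as="date" format="mdy">10/19/2021</say-as>
  </s>
</speak>"""

with open("date.wav", "wb") as wav:
    subprocess.run(["larynx", "--ssml", "-v", "en", ssml], stdout=wav, check=True)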

Word Roles

During phonemization, word roles are used to disambiguate pronunciations. Unless manually specified, a word's role is derived from its part of speech tag as gruut:<TAG>. For initialisms and spell-out, the role gruut:letter is used to indicate that e.g., "a" should be spoken as /eɪ/ instead of /ə/.
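
For example, a role can be attached directly to a <w> element (a sketch assuming the role attribute behaves as described):

# Sketch: force "a" to be spoken as the letter /eɪ/ with the
# gruut:letter role, using the documented <w role="..."> attribute.
import subprocess

ssml = """<speak>
  <s>Spoken normally: <w>a</w></s>
  <s>Spoken as a letter: <w role="gruut:letter">a</w></s>
</speak>"""

with open("roles.wav", "wb") as wav:
    subprocess.run(["larynx", "--ssml", "-v", "en", ssml], stdout=wav, check=True)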

For en-us, the following additional roles are available from the part-of-speech tagger:

  • gruut:CD - number
  • gruut:DT - determiner
  • gruut:IN - preposition or subordinating conjunction
  • gruut:JJ - adjective
  • gruut:NN - noun
  • gruut:PRP - personal pronoun
  • gruut:RB - adverb
  • gruut:VB - verb
  • gruut:VBD - verb (past tense)

Inline Lexicons

Inline pronunciation lexicons are supported via the <lexicon> and <lookup> tags. gruut diverges slightly from the SSML standard here by only allowing lexicons to be defined within the SSML document itself. Additionally, the id attribute of the <lexicon> element can be left off to indicate a "default" inline lexicon that does not require a corresponding <lookup> tag.

For example, the following document will yield three different pronunciations for the word "tomato":

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">

  <lexicon xml:id="test" alphabet="ipa">
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        <!-- Individual phonemes are separated by whitespace -->
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
    <lexeme>
      <grapheme role="fake-role">
        tomato
      </grapheme>
      <phoneme>
        <!-- Made up pronunciation for fake word role -->
        t ə m ˈi t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>
  <lookup ref="test">
    <w>tomato</w>
    <w role="fake-role">tomato</w>
  </lookup>
</speak>

The first "tomato" will be looked up in the U.S. English lexicon (/t ə m ˈeɪ t oʊ/). Within the <lookup> tag's scope, the second and third "tomato" words will be looked up in the inline lexicon. The third "tomato" word has a role attached (selecting a made up pronunciation in this case).

Even further from the SSML standard, gruut allows you to leave off the <lexicon> id entirely. With no id, a <lookup> tag is no longer needed, allowing you to override the pronunciation of any word in the document:

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">

  <!-- No id means change all words without a lookup -->
  <lexicon>
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>
</speak>

This will yield a pronunciation of /t ə m ˈɑ t oʊ/ for all instances of "tomato" in the document (unless they have a <lookup>).


Text to Speech Models

  • Glow-TTS (see GlowTTS Settings above)

Vocoders

  • Hi-Fi GAN
    • Universal large (slowest)
    • VCTK "small"
    • VCTK "medium" (fastest)

Benchmarks

The following benchmarks were run on:

  • Core i7-8750H (amd64)
  • Raspberry Pi 4 (aarch64)
  • Raspberry Pi 3 (armv7l)

Multiple runs were done at each quality level; the first run was discarded so that the model file cache was warm.

The RTF (real-time factor) is computed as the time taken to synthesize audio divided by the duration of the synthesized audio. An RTF less than 1 indicates that audio was able to be synthesized faster than real-time.
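
For example, with hypothetical numbers:

# Real-time factor: synthesis time divided by audio duration.
synthesis_seconds = 12.5  # hypothetical: wall-clock time spent synthesizing
audio_seconds = 50.0      # hypothetical: duration of the produced audio
rtf = synthesis_seconds / audio_seconds
print(rtf)  # 0.25 -> synthesis ran four times faster than real time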

Platform   Quality   RTF
--------   -------   -----
amd64      high       0.25
amd64      medium     0.06
amd64      low        0.05
aarch64    high       4.28
aarch64    medium     1.82
aarch64    low        0.56
armv7l     high      16.83
armv7l     medium     7.16
armv7l     low        2.22

See the benchmarking scripts in scripts/ for more details.


Architecture

Larynx breaks text to speech into 4 distinct steps (the first step is sketched in code after this list):

  1. Text to IPA phonemes (gruut)
  2. Phonemes to ids (phonemes.txt file from voice)
  3. Phoneme ids to mel spectrograms (glow-tts)
  4. Mel spectrograms to waveforms (hifi-gan)
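
As a concrete taste of step 1, gruut's public API can be called directly; a minimal sketch assuming gruut 2.x:

# Step 1 of the pipeline: text to IPA phonemes with gruut.
# (Steps 2-4 happen inside the voice's glow-tts and hifi-gan models.)
from gruut import sentences

for sentence in sentences("Hello world!", lang="en-us"):
    for word in sentence:
        if word.phonemes:
            print(word.text, *word.phonemes)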

Larynx architecture

Voices are trained on phoneme ids and mel spectrograms. For each language, the voice with the most data available was used as a base model and fine-tuned.
