  • Language
    Python
  • License
    MIT License

A tokenizer, text cleaner, and phonemizer for many human languages.

Gruut

A tokenizer, text cleaner, and IPA phonemizer for several human languages that supports SSML.

from gruut import sentences

text = 'He wound it around the wound, saying "I read it was $10 to read."'

for sent in sentences(text, lang="en-us"):
    for word in sent:
        if word.phonemes:
            print(word.text, *word.phonemes)

which outputs:

He h ˈi
wound w ˈaʊ n d
it ˈɪ t
around ɚ ˈaʊ n d
the ð ə
wound w ˈu n d
, |
saying s ˈeɪ ɪ ŋ
I ˈaɪ
read ɹ ˈɛ d
it ˈɪ t
was w ə z
ten t ˈɛ n
dollars d ˈɑ l ɚ z
to t ə
read ɹ ˈi d
. ‖

Note that "wound" and "read" have different pronunciations when used in different (grammatical) contexts.

A subset of SSML is also supported:

from gruut import sentences

ssml_text = """<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
    xml:lang="en-US">
<s>Today at 4pm, 2/1/2000.</s>
<s xml:lang="it">Un mese fà, 2/1/2000.</s>
</speak>"""

for sent in sentences(ssml_text, ssml=True):
    for word in sent:
        if word.phonemes:
            print(sent.idx, word.lang, word.text, *word.phonemes)

with the output:

0 en-US Today t ə d ˈeɪ
0 en-US at ˈæ t
0 en-US four f ˈɔ ɹ
0 en-US P p ˈi
0 en-US M ˈɛ m
0 en-US , |
0 en-US February f ˈɛ b j u ˌɛ ɹ i
0 en-US first f ˈɚ s t
0 en-US , |
0 en-US two t ˈu
0 en-US thousand θ ˈaʊ z ə n d
0 en-US . ‖
1 it Un u n
1 it mese ˈm e s e
1 it fà f a
1 it , |
1 it due d j u
1 it gennaio d͡ʒ e n n ˈa j o
1 it duemila d u e ˈm i l a
1 it . ‖

See the documentation for more details.

Installation

pip install gruut

Languages besides English can be added during installation. For example, with French and Italian support:

pip install -f 'https://synesthesiam.github.io/prebuilt-apps/' gruut[fr,it]

The extra pip repo is needed for an updated num2words fork that includes support for more languages.

You may also manually download language files and put them in $XDG_CONFIG_HOME/gruut/ ($HOME/.config/gruut by default).

gruut will look for language files in the directory $XDG_CONFIG_HOME/gruut/<lang>/ if the corresponding Python package is not installed. Note that <lang> here is the full language name, e.g. de-de instead of just de.
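As a rough illustration of the directory rule described here, the sketch below computes where files for a given language would be placed. The function name gruut_lang_dir is hypothetical and not part of gruut; it only mirrors the $XDG_CONFIG_HOME/gruut/<lang>/ convention stated in this section.

```python
import os
from pathlib import Path


def gruut_lang_dir(lang, env=None):
    """Return the directory for manually downloaded files of `lang`.

    Hypothetical helper, not part of gruut. `lang` is the full language
    name (for example "de-de", not "de").
    """
    env = os.environ if env is None else env
    # Fall back to $HOME/.config when XDG_CONFIG_HOME is unset or empty
    config_home = env.get("XDG_CONFIG_HOME")
    if not config_home:
        home = env.get("HOME") or str(Path.home())
        config_home = os.path.join(home, ".config")
    return Path(config_home) / "gruut" / lang


print(gruut_lang_dir("de-de", {"HOME": "/home/user"}))
```

Passing an explicit env dictionary keeps the example independent of the machine it runs on; calling gruut_lang_dir("de-de") with no second argument reads the real environment.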

Supported Languages

gruut currently supports:

  • Arabic (ar)
  • Czech (cs or cs-cz)
  • German (de or de-de)
  • English (en or en-us)
  • Spanish (es or es-es)
  • Farsi/Persian (fa)
  • French (fr or fr-fr)
  • Italian (it or it-it)
  • Luxembourgish (lb)
  • Dutch (nl)
  • Russian (ru or ru-ru)
  • Swedish (sv or sv-se)
  • Swahili (sw)

The goal is to support all of voice2json's languages.

Dependencies

  • Python 3.7 or higher
  • Linux
    • Tested on Debian Bullseye
  • num2words fork and Babel
    • Currency/number handling
    • num2words fork includes additional language support (Arabic, Farsi, Swedish, Swahili)
  • gruut-ipa
    • IPA pronunciation manipulation
  • pycrfsuite
    • Part of speech tagging and grapheme to phoneme models
  • pydateparser
    • Date parsing for multiple languages

Numbers, Dates, and More

gruut can automatically verbalize numbers, dates, and other expressions. This is done in a locale-aware manner for both parsing and verbalization, so "1/1/2020" may be interpreted as "M/D/Y" or "D/M/Y" depending on the word or sentence's language (e.g., <s lang="...">).
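To make the M/D/Y versus D/M/Y ambiguity concrete, here is a small standard-library sketch. It does not use gruut or its parsers; the DATE_FORMATS table and parse_numeric_date function are assumptions made only for illustration, chosen to match the English and Italian readings of "2/1/2000" shown in the SSML example earlier.

```python
from datetime import date, datetime

# Assumed day/month order per language (illustrative, not gruut's own data)
DATE_FORMATS = {
    "en-us": "%m/%d/%Y",  # month first
    "it": "%d/%m/%Y",     # day first
}


def parse_numeric_date(text, lang):
    """Parse a numeric date string using the field order assumed for `lang`."""
    return datetime.strptime(text, DATE_FORMATS[lang]).date()


for lang in ("en-us", "it"):
    print(lang, parse_numeric_date("2/1/2000", lang).isoformat())
# en-us 2000-02-01
# it 2000-01-02
```

The same string yields February 1 under the U.S. English order and January 2 under the Italian order, which is why the language of the enclosing word or sentence matters.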

The following types of expressions can be automatically expanded into words by gruut:

  • Numbers - "123" to "one hundred and twenty three" (disable with verbalize_numbers=False or --no-numbers)
    • Relies on Babel for parsing and num2words for verbalization
  • Dates - "1/1/2020" to "January first, twenty twenty" (disable with verbalize_dates=False or --no-dates)
    • Relies on pydateparser for parsing and both Babel and num2words for verbalization
  • Currency - "$10" to "ten dollars" (disable with verbalize_currency=False or --no-currency)
    • Relies on Babel for parsing and both Babel and num2words for verbalization
  • Times - "12:01am" to "twelve oh one A M" (disable with verbalize_times=False or --no-times)
    • English only
    • Relies on num2words for verbalization

Command-Line Usage

The gruut module can be executed with python3 -m gruut --language <LANGUAGE> <TEXT> or with the gruut command (from setup.py).

The gruut command is line-oriented, consuming text and producing JSONL. You will probably want to install jq to manipulate the JSONL output from gruut.

Plain Text

Takes raw text and outputs JSONL with cleaned words/tokens.

echo 'This, right here, is some "RAW" text!' \
   | gruut --language en-us \
   | jq --raw-output '.words[].text'
This
,
right
here
,
is
some
"
RAW
"
text
!

More information is available in the full JSON output:

gruut --language en-us 'More  text.' | jq .

Output:

{
  "idx": 0,
  "text": "More text.",
  "text_with_ws": "More text.",
  "text_spoken": "More text",
  "par_idx": 0,
  "lang": "en-us",
  "voice": "",
  "words": [
    {
      "idx": 0,
      "text": "More",
      "text_with_ws": "More ",
      "leading_ws": "",
      "training_ws": " ",
      "sent_idx": 0,
      "par_idx": 0,
      "lang": "en-us",
      "voice": "",
      "pos": "JJR",
      "phonemes": [
        "m",
        "ˈɔ",
        "ɹ"
      ],
      "is_major_break": false,
      "is_minor_break": false,
      "is_punctuation": false,
      "is_break": false,
      "is_spoken": true,
      "pause_before_ms": 0,
      "pause_after_ms": 0
    },
    {
      "idx": 1,
      "text": "text",
      "text_with_ws": "text",
      "leading_ws": "",
      "training_ws": "",
      "sent_idx": 0,
      "par_idx": 0,
      "lang": "en-us",
      "voice": "",
      "pos": "NN",
      "phonemes": [
        "t",
        "ˈɛ",
        "k",
        "s",
        "t"
      ],
      "is_major_break": false,
      "is_minor_break": false,
      "is_punctuation": false,
      "is_break": false,
      "is_spoken": true,
      "pause_before_ms": 0,
      "pause_after_ms": 0
    },
    {
      "idx": 2,
      "text": ".",
      "text_with_ws": ".",
      "leading_ws": "",
      "training_ws": "",
      "sent_idx": 0,
      "par_idx": 0,
      "lang": "en-us",
      "voice": "",
      "pos": null,
      "phonemes": [
        "‖"
      ],
      "is_major_break": true,
      "is_minor_break": false,
      "is_punctuation": false,
      "is_break": true,
      "is_spoken": false,
      "pause_before_ms": 0,
      "pause_after_ms": 0
    }
  ],
  "pause_before_ms": 0,
  "pause_after_ms": 0
}

For the whole input line and each word, the text property contains the processed input text with normalized whitespace while text_with_ws retains the original whitespace. The text_spoken property only contains words that are spoken, so punctuation and breaks are excluded.

Within each word, there is:

  • idx - zero-based index of the word in the sentence
  • sent_idx - zero-based index of the sentence in the input text
  • pos - part of speech tag (if available)
  • phonemes - list of IPA phonemes for the word (if available)
  • is_minor_break - true if "word" separates phrases (comma, semicolon, etc.)
  • is_major_break - true if "word" separates sentences (period, question mark, etc.)
  • is_break - true if "word" is a major or minor break
  • is_punctuation - true if "word" is a surrounding punctuation mark (quote, bracket, etc.)
  • is_spoken - true if not a break or punctuation
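If jq is not available, the JSONL output can also be consumed from Python. The sketch below parses one abbreviated line (only a few of the fields from the example output are kept) and collects the pronunciation of each spoken word. The function name spoken_pronunciations is hypothetical.

```python
import json

# One abbreviated line of JSONL, trimmed from the example output
jsonl_line = (
    '{"idx": 0, "text": "More text.", "words": ['
    '{"text": "More", "phonemes": ["m", "ˈɔ", "ɹ"], "is_spoken": true},'
    '{"text": "text", "phonemes": ["t", "ˈɛ", "k", "s", "t"], "is_spoken": true},'
    '{"text": ".", "phonemes": ["‖"], "is_spoken": false}]}'
)


def spoken_pronunciations(line):
    """Return (text, phoneme string) pairs for spoken words in one JSONL line."""
    sentence = json.loads(line)
    return [
        (word["text"], " ".join(word["phonemes"]))
        for word in sentence["words"]
        if word["is_spoken"]  # skips breaks and punctuation
    ]


for text, phonemes in spoken_pronunciations(jsonl_line):
    print(text, phonemes)
# More m ˈɔ ɹ
# text t ˈɛ k s t
```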

See python3 -m gruut <LANGUAGE> --help for more options.

SSML

A subset of SSML is supported:

  • <speak> - wrap around SSML text
    • lang - set language for document
  • <p> - paragraph
    • lang - set language for paragraph
  • <s> - sentence (disables automatic sentence breaking)
    • lang - set language for sentence
  • <w> / <token> - word (disables automatic tokenization)
    • lang - set language for word
    • role - set word role (see word roles)
  • <lang lang="..."> - set language of inner text
  • <voice name="..."> - set voice of inner text
  • <say-as interpret-as=""> - force interpretation of inner text
    • interpret-as one of "spell-out", "date", "number", "time", or "currency"
    • format - way to format text depending on interpret-as
      • number - one of "cardinal", "ordinal", "digits", "year"
      • date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
  • <break time=""> - Pause for given amount of time
    • time - seconds ("123s") or milliseconds ("123ms")
  • <mark name=""> - User-defined mark (marks_before and marks_after attributes of words/sentences)
    • name - name of mark
  • <sub alias=""> - substitute alias for inner text
  • <phoneme ph="..."> - supply phonemes for inner text
    • ph - phonemes for each word of inner text, separated by whitespace
  • <lexicon id="..."> - inline or external pronunciation lexicon
    • id - unique id of lexicon (used in <lookup ref="...">)
    • uri - if empty or missing, lexicon is inline
    • One or more <lexeme> child elements with:
      • Optional role="..." (word roles separated by whitespace)
      • <grapheme>WORD</grapheme> - word text
      • <phoneme>P H O N E M E S</phoneme> - word pronunciation (phonemes separated by whitespace)
  • <lookup ref="..."> - use pronunciation lexicon for child elements
    • ref - id from a <lexicon id="...">
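The time attribute of <break> accepts seconds ("123s") or milliseconds ("123ms"), and word and sentence output carries pause_before_ms and pause_after_ms fields. A conversion between the two could look like the sketch below. This is not gruut's implementation: break_time_to_ms is a hypothetical helper, and accepting decimal values such as "1.5s" is an assumption.

```python
import re

# Number followed by a unit; "ms" is tried before "s"
_BREAK_TIME = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*$")


def break_time_to_ms(value):
    """Convert a break time such as "123s" or "123ms" to whole milliseconds."""
    match = _BREAK_TIME.match(value)
    if match is None:
        raise ValueError(f"unsupported break time: {value!r}")
    amount = float(match.group(1))
    unit = match.group(2)
    return round(amount * (1000 if unit == "s" else 1))


print(break_time_to_ms("123s"))   # 123000
print(break_time_to_ms("123ms"))  # 123
```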

Word Roles

During phonemization, word roles are used to disambiguate pronunciations. Unless manually specified, a word's role is derived from its part of speech tag as gruut:<TAG>. For initialisms and spell-out, the role gruut:letter is used to indicate that e.g., "a" should be spoken as /eɪ/ instead of /ə/.

For en-us, the following additional roles are available from the part-of-speech tagger:

  • gruut:CD - number
  • gruut:DT - determiner
  • gruut:IN - preposition or subordinating conjunction
  • gruut:JJ - adjective
  • gruut:NN - noun
  • gruut:PRP - personal pronoun
  • gruut:RB - adverb
  • gruut:VB - verb
  • gruut:VBD - verb (past tense)
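The role mechanism can be pictured as a lexicon keyed by word and then by role. The sketch below is a toy model rather than gruut's code: the LEXICON layout, the function names, and the specific tags assigned to the two readings of "wound" are assumptions. The pronunciations themselves are taken from the example output at the top of this document.

```python
# Toy lexicon: word -> role -> phonemes (layout is an assumption)
LEXICON = {
    "wound": {
        "gruut:VBD": ["w", "ˈaʊ", "n", "d"],  # "He wound it ..." (tag assumed)
        "gruut:NN": ["w", "ˈu", "n", "d"],    # "... around the wound"
    },
}


def role_for(pos=None, role=None):
    """Use an explicit role if given, else derive gruut:<TAG> from the POS tag."""
    if role:
        return role
    return f"gruut:{pos}" if pos else ""


def lookup(word, pos=None, role=None):
    """Return phonemes for `word`, preferring the pronunciation for its role."""
    by_role = LEXICON.get(word.lower())
    if not by_role:
        return None
    wanted = role_for(pos, role)
    if wanted in by_role:
        return by_role[wanted]
    # No match for this role: fall back to the first listed pronunciation
    return next(iter(by_role.values()))


print(lookup("wound", pos="VBD"))
print(lookup("wound", pos="NN"))
```

An explicit role, such as one set with <w role="...">, takes priority over the tag-derived role in this sketch, which matches the "unless manually specified" rule described in this section.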

Inline Lexicons

Inline pronunciation lexicons are supported via the <lexicon> and <lookup> tags. gruut diverges slightly from the SSML standard here by allowing lexicons to be defined within the SSML document itself (uri is blank or missing). Additionally, the id attribute of the <lexicon> element can be left off to indicate a "default" inline lexicon that does not require a corresponding <lookup> tag.

For example, the following document will yield three different pronunciations for the word "tomato":

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">

  <lexicon xml:id="test" alphabet="ipa">
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        <!-- Individual phonemes are separated by whitespace -->
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
    <lexeme>
      <grapheme role="fake-role">
        tomato
      </grapheme>
      <phoneme>
        <!-- Made up pronunciation for fake word role -->
        t ə m ˈi t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>
  <lookup ref="test">
    <w>tomato</w>
    <w role="fake-role">tomato</w>
  </lookup>
</speak>

The first "tomato" will be looked up in the U.S. English lexicon (/t ə m ˈeɪ t oʊ/). Within the <lookup> tag's scope, the second and third "tomato" words will be looked up in the inline lexicon. The third "tomato" word has a role attached (selecting a made up pronunciation in this case).

Even further from the SSML standard, gruut allows you to leave off the <lexicon> id entirely. With no id, a <lookup> tag is no longer needed, allowing you to override the pronunciation of any word in the document:

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">

  <!-- No id means change all words without a lookup -->
  <lexicon>
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>
</speak>

This will yield a pronunciation of /t ə m ˈɑ t oʊ/ for all instances of "tomato" in the document (unless they have a <lookup>).
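For readers who want to inspect an inline lexicon outside of gruut, the following sketch reads the <lexeme> entries with Python's standard xml.etree.ElementTree module. It only shows how the grapheme and whitespace-separated phonemes are laid out in the XML; it is not how gruut itself loads lexicons, and read_inline_lexicon is a hypothetical name. The sample document is a shortened version of the one in this section.

```python
import xml.etree.ElementTree as ET

SSML_NS = {"ssml": "http://www.w3.org/2001/10/synthesis"}

ssml = """<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <lexicon>
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
  </lexicon>
  <w>tomato</w>
</speak>"""


def read_inline_lexicon(document):
    """Map each grapheme to its list of phonemes from inline <lexeme> entries."""
    root = ET.fromstring(document)
    lexicon = {}
    for lexeme in root.iterfind(".//ssml:lexicon/ssml:lexeme", SSML_NS):
        grapheme = "".join(lexeme.find("ssml:grapheme", SSML_NS).itertext()).strip()
        # Phonemes are separated by whitespace inside <phoneme>
        phonemes = "".join(lexeme.find("ssml:phoneme", SSML_NS).itertext()).split()
        lexicon[grapheme] = phonemes
    return lexicon


print(read_inline_lexicon(ssml))
```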

Intended Audience

gruut is useful for transforming raw text into phonetic pronunciations, similar to phonemizer. Unlike phonemizer, gruut looks up words in a pre-built lexicon (pronunciation dictionary) or guesses word pronunciations with a pre-trained grapheme-to-phoneme model. Phonemes for each language come from a carefully chosen inventory.

For each supported language, gruut includes:

  • A word pronunciation lexicon built from open source data
  • A pre-trained grapheme-to-phoneme model for guessing word pronunciations
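The lookup-then-guess strategy described in this section can be sketched in a few lines. Everything here is illustrative: pronounce, toy_lexicon, and toy_guess are hypothetical names, and toy_guess is only a stand-in that returns one symbol per letter. It is not a grapheme-to-phoneme model and does not produce real pronunciations.

```python
def pronounce(word, lexicon, guess):
    """Look `word` up in the lexicon; fall back to `guess` for unknown words."""
    known = lexicon.get(word.lower())
    if known is not None:
        return known
    return guess(word)


# Toy lexicon with one entry taken from the example output
toy_lexicon = {"text": ["t", "ˈɛ", "k", "s", "t"]}


def toy_guess(word):
    # Placeholder for a trained model: one symbol per letter, NOT real phonemes
    return list(word.lower())


print(pronounce("Text", toy_lexicon, toy_guess))  # found in the lexicon
print(pronounce("zzz", toy_lexicon, toy_guess))   # falls back to the guesser
```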

Some languages also include:
