
A ruby library for TTS & ASR document preparation


RubySpeech

RubySpeech is a library for constructing and parsing Text to Speech (TTS) and Automatic Speech Recognition (ASR) documents such as SSML, GRXML and NLSML. Such documents can be constructed for processing by TTS and ASR engines, parsed as the output of such engines, or used in implementing them.

Dependencies

pcre (except on JRuby)

On OSX with Homebrew

brew install pcre

On Ubuntu/Debian

sudo apt-get install libpcre3 libpcre3-dev

On CentOS

sudo yum install pcre-devel

Installation

gem install ruby_speech

Ruby Version Compatibility

  • CRuby 2.1+
  • JRuby 9.1+

Library

SSML

RubySpeech provides a DSL for constructing SSML documents like so:

require 'ruby_speech'

speak = RubySpeech::SSML.draw do
  voice gender: :male, name: 'fred' do
    string "Hi, I'm Fred. The time is currently "
    say_as interpret_as: 'date', format: 'dmy' do
      "01/02/1960"
    end
  end
end

speak.to_s

becomes:

<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
  <voice gender="male" name="fred">
    Hi, I'm Fred. The time is currently <say-as format="dmy" interpret-as="date">01/02/1960</say-as>
  </voice>
</speak>

Once your Speak is fully prepared and you're ready to send it off for processing, you must call to_doc on it to add the XML header:

<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
  <voice gender="male" name="fred">
    Hi, I'm Fred. The time is currently <say-as format="dmy" interpret-as="date">01/02/1960</say-as>
  </voice>
</speak>

You may then also need to call to_s on the result to get a string.

GRXML

Construct a GRXML (SRGS) document like this:

require 'ruby_speech'

grammy = RubySpeech::GRXML.draw mode: :dtmf, root: 'pin' do
  rule id: 'digit' do
    one_of do
      ('0'..'9').map { |d| item { d } }
    end
  end

  rule id: 'pin', scope: 'public' do
    one_of do
      item do
        item repeat: '4' do
          ruleref uri: '#digit'
        end
        "#"
      end
      item do
        "* 9"
      end
    end
  end
end

grammy.to_s

which becomes:

<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" mode="dtmf" root="pin">
  <rule id="digit">
    <one-of>
      <item>0</item>
      <item>1</item>
      <item>2</item>
      <item>3</item>
      <item>4</item>
      <item>5</item>
      <item>6</item>
      <item>7</item>
      <item>8</item>
      <item>9</item>
    </one-of>
  </rule>
  <rule id="pin" scope="public">
    <one-of>
      <item><item repeat="4"><ruleref uri="#digit"/></item>#</item>
      <item>* 9</item>
    </one-of>
  </rule>
</grammar>

Built-in grammars

Some pre-defined grammars are available from the RubySpeech::GRXML::Builtins module:

require 'ruby_speech'

RubySpeech::GRXML::Builtins.currency

which yields

<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" mode="dtmf" root="currency">
  <rule id="currency" scope="public">
    <item repeat="0-">
      <ruleref uri="#digit"/>
    </item>
    <item>*</item>
    <item repeat="2">
      <ruleref uri="#digit"/>
    </item>
  </rule>
  <rule id="digit">
    <one-of>
      <item>0</item>
      <item>1</item>
      <item>2</item>
      <item>3</item>
      <item>4</item>
      <item>5</item>
      <item>6</item>
      <item>7</item>
      <item>8</item>
      <item>9</item>
    </one-of>
  </rule>
</grammar>

These grammars come from the VoiceXML specification, and can be used as indicated there (including parameterisation). They can be used just like any you would manually create, and there's nothing special about them except that they are already defined for you. A full list of available grammars can be found in the API documentation.
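The `repeat` attributes in the currency grammar above follow the SRGS convention: `"2"` means exactly two repetitions and `"0-"` means zero or more. As a rough illustration, the sketch below maps those values onto regular expression quantifiers and rebuilds the currency rule as a plain Ruby pattern. The helper `repeat_to_quantifier` is hypothetical and not part of RubySpeech, and the regex is only a reading of what the grammar accepts.

```ruby
# Converts an SRGS repeat attribute value into a regex quantifier string.
# Hypothetical helper for illustration; not part of RubySpeech.
def repeat_to_quantifier(repeat)
  case repeat
  when /\A(\d+)\z/       then "{#{$1}}"        # exact count, e.g. "4"
  when /\A(\d+)-\z/      then "{#{$1},}"       # open-ended, e.g. "0-"
  when /\A(\d+)-(\d+)\z/ then "{#{$1},#{$2}}"  # bounded range, e.g. "2-5"
  else raise ArgumentError, "unsupported repeat value: #{repeat.inspect}"
  end
end

# The currency rule: any number of digits, '*', then exactly two digits.
currency = Regexp.new(
  "\\A\\d#{repeat_to_quantifier('0-')}\\*\\d#{repeat_to_quantifier('2')}\\z"
)

puts currency.source
puts currency.match?('12*50')
puts currency.match?('12*5')
```

Here `12*50` fits the pattern, while `12*5` does not because only one digit follows the `*`.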

These grammars are also available via URI like so:

require 'ruby_speech'

RubySpeech::GRXML.from_uri('builtin:dtmf/boolean?y=3;n=4')
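To show how such a URI is put together, the sketch below splits `builtin:dtmf/boolean?y=3;n=4` into its mode, grammar name and parameters using plain Ruby string operations. In the VoiceXML boolean builtin, `y` and `n` select the keys for yes and no. The method `split_builtin_uri` is a hypothetical name for illustration only; it is not RubySpeech's own parser and makes no claim about how `from_uri` works internally.

```ruby
# Splits a VoiceXML-style builtin grammar URI into its parts.
# Illustration only: this is not RubySpeech's own parser.
def split_builtin_uri(uri)
  scheme, rest = uri.split(':', 2)
  raise ArgumentError, "not a builtin URI: #{uri}" unless scheme == 'builtin' && rest

  path, query = rest.split('?', 2)
  mode, name = path.split('/', 2)

  # Parameters are separated by ';' and written as key=value.
  params = (query || '').split(';').map { |pair| pair.split('=', 2) }.to_h

  { mode: mode, name: name, params: params }
end

p split_builtin_uri('builtin:dtmf/boolean?y=3;n=4')
```

For the URI above, the resulting hash holds the mode `dtmf`, the name `boolean`, and the parameters `y` and `n` with the values `3` and `4` as strings.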

Grammar matching

It is possible to match some arbitrary input against a GRXML grammar, like so:

require 'ruby_speech'

grammar = RubySpeech::GRXML.draw mode: :dtmf, root: 'pin' do
  rule id: 'digit' do
    one_of do
      ('0'..'9').map { |d| item { d } }
    end
  end

  rule id: 'pin', scope: 'public' do
    one_of do
      item do
        item repeat: '4' do
          ruleref uri: '#digit'
        end
        "#"
      end
      item do
        "* 9"
      end
    end
  end
end

matcher = RubySpeech::GRXML::Matcher.new grammar

>> matcher.match '*9'
=> #<RubySpeech::GRXML::Match:0x00000100ae5d98
      @mode = :dtmf,
      @confidence = 1,
      @utterance = "*9",
      @interpretation = "*9"
    >
>> matcher.match '1234#'
=> #<RubySpeech::GRXML::Match:0x00000100b7e020
      @mode = :dtmf,
      @confidence = 1,
      @utterance = "1234#",
      @interpretation = "1234#"
    >
>> matcher.match '5678#'
=> #<RubySpeech::GRXML::Match:0x00000101218688
      @mode = :dtmf,
      @confidence = 1,
      @utterance = "5678#",
      @interpretation = "5678#"
    >
>> matcher.match '1111#'
=> #<RubySpeech::GRXML::Match:0x000001012f69d8
      @mode = :dtmf,
      @confidence = 1,
      @utterance = "1111#",
      @interpretation = "1111#"
    >
>> matcher.match '111'
=> #<RubySpeech::GRXML::NoMatch:0x00000101371660>
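To make the accepted inputs concrete, the `pin` rule above can be read as: either four digits followed by `#`, or `*` followed by `9`. The sketch below expresses that reading as a plain Ruby regular expression. It only illustrates what the grammar accepts and is not how `RubySpeech::GRXML::Matcher` is implemented; `PIN_PATTERN` and `pin_input?` are hypothetical names.

```ruby
# Hypothetical stand-in for the 'pin' grammar: four digits then '#', or '*9'.
PIN_PATTERN = /\A(?:\d{4}#|\*9)\z/

# Returns true when the DTMF string fits the pattern above.
def pin_input?(dtmf)
  PIN_PATTERN.match?(dtmf)
end

['*9', '1234#', '5678#', '111'].each do |input|
  puts "#{input.inspect}: #{pin_input?(input)}"
end
```

As in the matcher session above, `*9` and the four-digit inputs ending in `#` are accepted, while `111` is rejected.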

NLSML

Natural Language Semantics Markup Language (NLSML) is the format used by many speech recognition engines and natural language processors to add semantic information to human language. RubySpeech can both generate and parse such documents.

It is possible to generate an NLSML document like so:

require 'ruby_speech'

nlsml = RubySpeech::NLSML.draw grammar: 'http://flight' do
  interpretation confidence: 0.6 do
    input "I want to go to Pittsburgh", mode: :voice

    instance do
      airline do
        to_city 'Pittsburgh'
      end
    end
  end

  interpretation confidence: 0.4 do
    input "I want to go to Stockholm"

    instance do
      airline do
        to_city "Stockholm"
      end
    end
  end
end

nlsml.to_s

becomes:

<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2" grammar="http://flight">
  <interpretation confidence="0.6">
    <input mode="voice">I want to go to Pittsburgh</input>
    <instance>
      <airline>
        <to_city>Pittsburgh</to_city>
      </airline>
    </instance>
  </interpretation>
  <interpretation confidence="0.4">
    <input>I want to go to Stockholm</input>
    <instance>
      <airline>
        <to_city>Stockholm</to_city>
      </airline>
    </instance>
  </interpretation>
</result>

It's also possible to parse an NLSML document and extract useful information from it. Taking the above example, one may do:

document = RubySpeech.parse nlsml.to_s

document.match? # => true
document.interpretations # => [
      {
        confidence: 0.6,
        input: { mode: :voice, content: 'I want to go to Pittsburgh' },
        instance: { airline: { to_city: 'Pittsburgh' } }
      },
      {
        confidence: 0.4,
        input: { content: 'I want to go to Stockholm' },
        instance: { airline: { to_city: 'Stockholm' } }
      }
    ]
document.best_interpretation # => {
          confidence: 0.6,
          input: { mode: :voice, content: 'I want to go to Pittsburgh' },
          instance: { airline: { to_city: 'Pittsburgh' } }
        }
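Once the interpretations are available as hashes, ordinary Ruby is enough to act on them. The sketch below assumes data in the shape shown above and picks the destination from the highest-confidence interpretation, returning nil when that confidence falls below a threshold. The `destination` helper and its `threshold` parameter are hypothetical; with the library loaded, the array would come from `document.interpretations`.

```ruby
# Interpretations in the shape shown above (plain hashes, no RubySpeech needed).
interpretations = [
  { confidence: 0.6,
    input: { mode: :voice, content: 'I want to go to Pittsburgh' },
    instance: { airline: { to_city: 'Pittsburgh' } } },
  { confidence: 0.4,
    input: { content: 'I want to go to Stockholm' },
    instance: { airline: { to_city: 'Stockholm' } } }
]

# Hypothetical helper: destination of the highest-confidence interpretation,
# or nil when that confidence falls below the threshold.
def destination(interpretations, threshold: 0.5)
  best = interpretations.max_by { |i| i[:confidence] }
  return nil if best.nil? || best[:confidence] < threshold

  best.dig(:instance, :airline, :to_city)
end

puts destination(interpretations)              # prints Pittsburgh
p destination(interpretations, threshold: 0.9) # prints nil
```

Rejecting low-confidence results this way lets an application re-prompt the caller instead of acting on a doubtful recognition.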

Check out the YARD documentation for more.

Features:

SSML

  • Document construction
  • <voice/>
  • <prosody/>
  • <emphasis/>
  • <say-as/>
  • <break/>
  • <audio/>
  • <p/> and <s/>
  • <phoneme/>
  • <sub/>

Misc

  • <mark/>
  • <desc/>

GRXML

  • Document construction
  • <item/>
  • <one-of/>
  • <rule/>
  • <ruleref/>
  • <tag/>
  • <token/>

NLSML

  • Document construction
  • Simple data extraction from documents

TODO:

SSML

  • <lexicon/>
  • <meta/> and <metadata/>

GRXML

  • <meta/> and <metadata/>
  • <example/>
  • <lexicon/>

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit; do not mess with the Rakefile, version, or history.
    • If you want to have your own version, that is fine, but bump the version in a commit by itself so I can ignore it when I pull.
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2013 Ben Langfeld. MIT licence (see LICENSE for details).
