  • Stars: 156
  • Rank: 235,169 (Top 5%)
  • Language: C++
  • License: MIT License
  • Created about 9 years ago
  • Updated about 5 years ago

Repository Details

Swift wrapper around Pocketsphinx

TLSphinx

TLSphinx is a Swift wrapper around Pocketsphinx, a portable library based on CMU Sphinx, that allows an application to perform speech recognition without the audio ever leaving the device.

This repository has two main parts. The first is a synthesized version of the pocketsphinx and sphinx base repositories with a module map to access the library as a Clang module. This module is accessed under the name Sphinx and has two submodules, Pocket and Base, in reference to pocketsphinx and sphinx base.

The second part is TLSphinx, a Swift framework that uses the Sphinx Clang module and exposes a Swift-like API that talks to pocketsphinx.

Note: I wrote a blog post about TLSphinx on the Tryolabs Blog. Check it out for a short history of why I wrote this.

Usage

The framework provides three classes:

  • Config describes the configuration needed to recognize speech.
  • Decoder is the main class that provides the API to perform all decoding.
  • Hypotesis is the result of a decode attempt. It has text and score properties.

Config

Represents the cmd_ln_t opaque structure in Sphinx. The default constructor takes an array of tuples of the form (param name, param value), where "param name" is the name of one of the parameters recognized by Sphinx. The examples below pass the acoustic model, the language model and the dictionary. For a complete list of recognized parameters, check the Sphinx docs.
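C command-line parsers usually expect a flat, argv-style list of strings rather than pairs. The sketch below shows one way such (param name, param value) tuples could be flattened before being handed to a C API. The helper name and the paths are hypothetical, it is written in current Swift syntax, and it is not taken from the TLSphinx sources.

```swift
// Hypothetical helper, not part of TLSphinx: flattens (name, value) pairs
// into a single argv-style list of strings.
func flattenArguments(_ args: [(String, String)]) -> [String] {
    return args.flatMap { [$0.0, $0.1] }
}

// Placeholder paths, for illustration only.
let flat = flattenArguments([("-hmm", "/models/en-us"),
                             ("-dict", "/models/cmudict.dict")])
print(flat)
```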

The class has a public property to turn on/off the debug info from Sphinx:

public var showDebugInfo: Bool

Decoder

Represents the ps_decoder_t opaque struct in Sphinx. The default constructor takes a Config object as parameter.

It provides the functions to decode speech from a file or from the mic. The result is returned in an optional Hypotesis object, following the naming convention of the Pocketsphinx API. The functions are:

To decode speech from a file:

public func decodeSpeechAtPath (filePath: String, complete: (Hypotesis?) -> ())

The audio pointed to by filePath must have the following characteristics:

  • single-channel (monaural)
  • little-endian
  • unheadered
  • 16-bit signed
  • PCM
  • sampled at 16000 Hz
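Audio often arrives as a WAV file, which carries a header and may use a different sample rate or channel count. As a rough sketch (current Swift syntax, Foundation only, not part of TLSphinx), the code below walks the RIFF chunks of a WAV file, reports its format, and returns the headerless sample bytes so they can be compared against the requirements above. PCMFormat, readLE and parseWAV are hypothetical names, and the sketch assumes the "fmt " chunk precedes the "data" chunk.

```swift
import Foundation

// Hypothetical format description; the names are illustrative, not TLSphinx API.
struct PCMFormat: Equatable {
    var channels: Int
    var sampleRate: Int
    var bitsPerSample: Int
    var isPCM: Bool
}

// Reads a little-endian unsigned integer of `count` bytes starting at `offset`.
func readLE(_ bytes: [UInt8], _ offset: Int, _ count: Int) -> Int {
    var value = 0
    for i in 0..<count {
        value |= Int(bytes[offset + i]) << (8 * i)
    }
    return value
}

// Walks the RIFF chunks and returns the format plus the raw sample bytes
// (the contents of the "data" chunk, without any header).
func parseWAV(_ data: Data) -> (format: PCMFormat, samples: Data)? {
    let bytes = [UInt8](data)
    guard bytes.count >= 12,
          String(bytes: bytes[0..<4], encoding: .ascii) == "RIFF",
          String(bytes: bytes[8..<12], encoding: .ascii) == "WAVE" else { return nil }

    var format: PCMFormat?
    var offset = 12
    while offset + 8 <= bytes.count {
        let id = String(bytes: bytes[offset..<offset + 4], encoding: .ascii)
        let size = readLE(bytes, offset + 4, 4)
        let body = offset + 8
        guard body + size <= bytes.count else { return nil }
        if id == "fmt " && size >= 16 {
            format = PCMFormat(channels: readLE(bytes, body + 2, 2),
                               sampleRate: readLE(bytes, body + 4, 4),
                               bitsPerSample: readLE(bytes, body + 14, 2),
                               isPCM: readLE(bytes, body, 2) == 1)
        } else if id == "data", let format = format {
            return (format, Data(bytes[body..<body + size]))
        }
        offset = body + size + (size % 2)   // chunks are padded to even sizes
    }
    return nil
}

// The format listed above: mono, 16-bit PCM at 16000 Hz (WAV is little-endian).
let required = PCMFormat(channels: 1, sampleRate: 16000, bitsPerSample: 16, isPCM: true)
```

If the parsed format equals required, the returned samples could be written to disk as-is. Otherwise the audio would need to be resampled or downmixed first, which this sketch does not attempt.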

To control the size of the buffer used to read the file, the Decoder class has a public property:

public var bufferSize: Int
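To illustrate what a read buffer size controls, here is a minimal sketch (current Swift syntax, not TLSphinx's implementation) that feeds raw 16-bit little-endian PCM to a callback in fixed-size chunks. The function name is hypothetical, and counting bufferSize in samples rather than bytes is an assumption made for this example.

```swift
import Foundation

// Hypothetical sketch: hands raw 16-bit little-endian PCM to `process`
// in chunks of at most `bufferSize` samples.
func forEachBuffer(in data: Data, bufferSize: Int, process: ([Int16]) -> Void) {
    precondition(bufferSize > 0, "bufferSize must be positive")
    let bytes = [UInt8](data)
    let sampleCount = bytes.count / 2   // a trailing odd byte is ignored
    var start = 0
    while start < sampleCount {
        let end = min(start + bufferSize, sampleCount)
        // Combine each low/high byte pair into one signed sample.
        let buffer: [Int16] = (start..<end).map { i in
            Int16(bitPattern: UInt16(bytes[2 * i]) | (UInt16(bytes[2 * i + 1]) << 8))
        }
        process(buffer)
        start = end
    }
}
```

A larger buffer means fewer, bigger calls to the processing step. A smaller one lowers peak memory use at the cost of more calls.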

To decode a live audio stream from the mic:

public func startDecodingSpeech (utteranceComplete: (Hypotesis?) -> ())
public func stopDecodingSpeech ()

You can use the same Decoder instance many times.

Hypotesis

This struct represents the result of a decode attempt. It has a text property with the best scored text and a score with the score value. This struct implements Printable so you can print it with println(hypotesis_value).
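Printable and println belong to early Swift releases. In Swift 2 and later the protocol is called CustomStringConvertible and the function is print. The stand-in below sketches how a result type with text and score properties can be made printable in current Swift. HypothesisSketch is a hypothetical type, and the Int score is an assumption rather than the framework's actual declaration.

```swift
// Hypothetical stand-in for the result type; the field types are assumptions.
struct HypothesisSketch: CustomStringConvertible {
    let text: String
    let score: Int

    // Used by print() and string interpolation.
    var description: String {
        return "Text: \(text) - Score: \(score)"
    }
}

let result = HypothesisSketch(text: "hello world", score: -3200)
print(result)   // prints "Text: hello world - Score: -3200"
```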

Examples

Processing an Audio File

As an example let's see how to decode the speech in an audio file. To do so you first need to create a Config object and pass it to the Decoder constructor. With the decoder you can perform automatic speech recognition from an audio file like so:

import TLSphinx

let hmm = ...   // Path to the acoustic model
let lm = ...    // Path to the language model
let dict = ...  // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
  if let decoder = Decoder(config:config) {
      
      let audioFile = ... // Path to an audio file
      
      decoder.decodeSpeechAtPath(audioFile) {
          
          if let hyp: Hypotesis = $0 {
              // Print the decoder text and score
              println("Text: \(hyp.text) - Score: \(hyp.score)")
          } else {
              // Can't decode any speech because of an error
          }
      }
  } else {
      // Handle Decoder() fail
  }
} else {
  // Handle Config() fail  
}

The decoding is performed by the decodeSpeechAtPath function in the background. Once the process finishes, the complete closure is called on the main thread.
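The background-then-main-thread pattern described here can be sketched with Grand Central Dispatch. The helper below is hypothetical and written in current Swift syntax. It shows the general technique, not the framework's actual code: the work runs on a global queue and the result is delivered on a callback queue, which defaults to the main queue.

```swift
import Foundation

// Hypothetical helper: runs `work` off the calling thread and hands the
// result to `complete` on `callbackQueue` (the main queue by default).
func runInBackground<T>(callbackQueue: DispatchQueue = .main,
                        work: @escaping () -> T,
                        complete: @escaping (T) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let result = work()
        callbackQueue.async { complete(result) }
    }
}
```

Delivering the result on the main queue lets the closure update the UI directly, which is why callers of an API shaped like this usually do not need their own dispatch call.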

Speech from the Mic

import TLSphinx

let hmm = ...   // Path to the acoustic model
let lm = ...    // Path to the language model
let dict = ...  // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
  if let decoder = Decoder(config:config) {
      
      decoder.startDecodingSpeech {
          
          if let hyp: Hypotesis = $0 {
              println(hyp)
          } else {
              // Can't decode any speech because of an error
          }
      }
  } else {
      // Handle Decoder() fail
  }
} else {
  // Handle Config() fail  
}

// At some point in the future, stop listening to the mic.
// Note: decoder must still be in scope here (for example, stored in a property).
decoder.stopDecodingSpeech()

Installation

The easiest way to integrate TLSphinx is using Carthage or a similar method to get the framework bundle. This lets you integrate the framework and the Sphinx module without magic.

Carthage

In your Cartfile add a reference to the latest version of TLSphinx:

github "Tryolabs/TLSphinx" ~> 1.0.2

Then run carthage update and follow the standard installation instructions described on the Carthage site.

You must also tell Xcode where to find the Sphinx module, which is located in the Carthage checkout. To do so:

  • add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Header Search Paths (recursive)
  • add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/lib to Library Search Paths (recursive)
  • in Swift Compiler - Search Paths add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Import Paths

Manual

Download the project from this repository and drag the TLSphinx project into your Xcode project. If you encounter any errors about missing headers and/or libraries for Sphinx, add the Sphinx/include directory to your header search path and Sphinx/lib to the library search path, and mark both as recursive.

Community

Join us on Slack!

Author

BrunoBerisso, [email protected]

License

TLSphinx is available under the MIT license. See the LICENSE file for more info.
