Spchcat

Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

Description

spchcat is a command-line tool that reads in audio from .WAV files, a microphone, or system audio inputs and converts any speech found into text. It runs locally on your machine, with no web API calls or network activity, and is open source. It is built on top of Coqui's speech to text library, TensorFlow, KenLM, and data from Mozilla's Common Voice project.

It supports multiple languages thanks to Coqui's library of models. The accuracy of the recognized text will vary widely depending on the language, since some have only small amounts of training data. You can help improve future models by contributing your voice.

Installation

x86

On Debian-based x86 Linux systems like Ubuntu you should be able to install the latest .deb package by downloading and double-clicking it. Other distributions are currently unsupported. The tool requires PulseAudio, which is already present on most desktop systems and can be installed manually if it's missing.

There's a notebook you can run in Colab at notebooks/install.ipynb that shows all installation steps.

Raspberry Pi

To install on a Raspberry Pi, download the latest .deb installer package and either double-click on it from the desktop, or run dpkg -i ~/Downloads/spchcat_0.0-2_armhf.deb from the terminal. It will take several minutes to unpack all the language files. This version has only been tested on a Raspberry Pi 4 running the latest release of Raspbian, released October 30th 2021. It's expected to fail on the Raspberry Pi 1 and Zero, due to their CPU architecture.

Usage

After installation, you should be able to run it with no arguments to start capturing audio from the default microphone source, with the results output to the terminal:

spchcat

After you've run the command, start speaking, and you should see the words you're saying appear. The speech recognition is still a work in progress, and the accuracy will depend a lot on the noise levels, your accent, and the complexity of the words, but hopefully you should see something close enough to be useful for simple note taking or other purposes.

System Audio

If you don't have a microphone attached, or want to transcribe audio coming from another program, you can set the --source argument to 'system'. This will attempt to listen to the audio that your machine is playing, including any videos or songs, and transcribe any speech found.

spchcat --source=system

WAV Files

One of the most common audio file formats is WAV. If you don't have any to test with, you can download Coqui's test set to try this option out. If you need to convert files from another format like '.mp3', I recommend using FFmpeg. As with the other source options, spchcat will attempt to find any speech in the files and convert it into a transcript. You don't have to set the --source argument explicitly: whenever file names are present on the command line, file input is the default.

spchcat audio/8455-210777-0068.wav 

If you're using the audio file from the test set, you should see output like the following:

TensorFlow: v2.3.0-14-g4bdd3955115
 Coqui STT: v1.1.0-0-gf3605e23
your power is sufficient i said 

You can also specify a folder instead of a single filename, and all .wav files within that directory will be transcribed.
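
If your recordings are in another format, the FFmpeg conversion mentioned above can be scripted. This is a minimal Python sketch, not part of spchcat. The helper names are made up for illustration, and the choice of 16 kHz mono output is my assumption about what the speech models expect, so check it against the model you're using.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_command(src, dst=None, sample_rate=16000):
    """Build an ffmpeg command line that converts src to a mono WAV file.

    The 16 kHz mono target is an assumption, not something spchcat documents.
    """
    src = Path(src)
    # Default to the same file name with a .wav extension.
    dst = Path(dst) if dst else src.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(src),           # input file
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),
    ]


def convert_to_wav(src, dst=None):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_wav_command(src, dst), check=True)
```

Calling convert_to_wav("talk.mp3") should leave a talk.wav next to the original, which you can then pass to spchcat on the command line.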

Language Support

So far this documentation has assumed you're using American English, but the tool will default to looking for the language your system has been configured to use. It first looks for the one specified in the LANG environment variable. If no model for that language is found, it will default back to 'en_US'. You can override this by setting the --language argument on the command line, for example:

spchcat --language=de_DE

This works independently of --source and other options, so you can transcribe microphone, system audio, or files in any of the supported languages. Note that some languages have very small amounts of training data, so their quality may suffer. If you don't care about country-specific variants, you can also just specify the language part of the code, for example --language=en. This will pick any model that supports the language, regardless of country. The same thing happens if a particular language and country pair isn't found: the tool will log a warning and fall back to any country that supports the language. For example, if 'en_GB' is specified but only 'en_US' is present, 'en_US' will be used.
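
The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the rules as documented here, not the tool's actual code. The AVAILABLE list is a made-up subset of installed models, and stripping an encoding suffix like '.UTF-8' from LANG, along with falling back to 'en_US' when nothing matches an explicit request, are assumptions on my part.

```python
# Hypothetical subset of installed model codes, for illustration only.
AVAILABLE = ["de_DE", "en_US", "fr_FR"]


def resolve_language(requested, available, default="en_US"):
    """Pick a model code using the fallback order described in the text."""
    if requested:
        # Assumption: drop any encoding or modifier suffix, e.g. "de_DE.UTF-8".
        requested = requested.split(".")[0].split("@")[0]
        # 1. An exact language and country match wins.
        if requested in available:
            return requested
        # 2. Otherwise accept any country for the same language part.
        language = requested.split("_")[0]
        for code in available:
            if code.split("_")[0] == language:
                return code
    # 3. Nothing matched, so use the default model.
    return default
```

With the sample list, resolve_language("en_GB", AVAILABLE) picks 'en_US', and a bare language code such as "fr" picks 'fr_FR'.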

Code Language Name
am_ET Amharic
bn_IN Bengali
br_FR Breton
ca_ES Catalan
cnh_MM Hakha-Chin
cs_CZ Czech
cv_RU Chuvash
cy_GB Welsh
de_DE German
dv_MV Dhivehi
el_GR Greek
en_US English
et_EE Estonian
eu_ES Basque
fi_FI Finnish
fr_FR French
fy_NL Frisian
ga_IE Irish
hu_HU Hungarian
id_ID Indonesian
it_IT Italian
ka_GE Georgian
ky_KG Kyrgyz
lg_UG Luganda
lt_LT Lithuanian
lv_LV Latvian
mn_MN Mongolian
mt_MT Maltese
nl_NL Dutch
or_IN Odia
pt_PT Portuguese
rm_CH Romansh-Sursilvan
ro_RO Romanian
ru_RU Russian
rw_RW Kinyarwanda
sah_RU Sakha
sb_DE Upper-Sorbian
sl_SI Slovenian
sw_KE Swahili-Congo
ta_IN Tamil
th_TH Thai
tr_TR Turkish
tt_RU Tatar
uk_UK Ukrainian
wo_SN Wolof
yo_NG Yoruba

All of these models have been collected by Coqui, and contributed by organizations like Inclusive Technology for Marginalized Languages or individuals. All are using the conventions for Coqui's STT library, so custom models could potentially be used, but training and deployment of those is outside the scope of this document. The models themselves are provided under a variety of open source licenses, which can be inspected in their source folders (typically inside /etc/spchcat/models/).

Saving Output

By default spchcat writes any recognized text to the terminal, but it's designed to behave like a normal Unix command-line tool, so the output can also be written to a file using redirection like this:

spchcat audio/8455-210777-0068.wav > /tmp/transcript.txt

If you then run cat /tmp/transcript.txt (or open it in an editor) you should see 'your power is sufficient i said'. You can also pipe the output to another command. Unfortunately you can't currently pipe audio into the tool from another executable.
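
Because the transcript goes to standard output, you can also capture it from a script instead of a shell redirect. Here's a minimal Python sketch. The transcribe helper and its command parameter are hypothetical names, and it assumes spchcat is on your PATH and exits with a non-zero code on failure.

```python
import subprocess


def transcribe(wav_path, command=("spchcat",)):
    """Run the tool on one file and return whatever it wrote to stdout."""
    result = subprocess.run(
        [*command, str(wav_path)],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the captured bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

For example, transcribe("audio/8455-210777-0068.wav") should return the same text that the redirect above writes to /tmp/transcript.txt. The command parameter is only there so a different executable can be swapped in for testing.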

There is one subtle difference between writing to a file and to the terminal. The transcription itself can take some time to settle into a final form, especially when waiting for long words to finish, so when it's being run live in a terminal you'll often see the last couple of words change. This isn't useful when writing to a file, so instead the output is finalized before it's written. This can introduce a small delay when writing live microphone or system audio input.
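
To make the terminal versus file distinction concrete, here's a rough Python sketch of the general technique. It is not spchcat's own code, and the emit function is a hypothetical name. It checks whether the output stream is a terminal, redraws the line for partial results if so, and otherwise writes only finalized text.

```python
import io
import sys


def emit(text, final, stream=sys.stdout):
    """Write a transcript update; return True if anything was written."""
    if stream.isatty():
        # Interactive: return to the start of the line so the revised
        # text is drawn over the previous partial result.
        stream.write("\r" + text + ("\n" if final else ""))
        stream.flush()
        return True
    if final:
        # Redirected to a file or pipe: only finalized text is written.
        stream.write(text + "\n")
        return True
    return False


buffer = io.StringIO()  # stands in for a redirected output file
emit("your power is", final=False, stream=buffer)
emit("your power is sufficient i said", final=True, stream=buffer)
print(repr(buffer.getvalue()))
```

In the example, the partial update is dropped and only the finalized line reaches the buffer, which is where the small delay for redirected output comes from.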

Build from Source

Tool

It's possible to build all dependencies from source, but I recommend downloading binary versions of Coqui's STT, TensorFlow Lite, and KenLM libraries from github.com/coqui-ai/STT/releases/download/v1.1.0/native_client.tflite.Linux.tar.xz. Extract the archive to a folder, and then, from inside a folder containing this repo, run the following to build the spchcat tool itself:

make spchcat LINK_PATH_STT=-L../STT_download

You should replace ../STT_download with the path to the Coqui library folder. After this you should see a spchcat executable binary in the repo folder. Because it relies on shared libraries, you'll need to specify a path to these too using LD_LIBRARY_PATH unless you have copies in system folders.

LD_LIBRARY_PATH=../STT_download ./spchcat

Models

The previous step only built the executable binary itself, but for the complete tool you also need data files for each language. If you have the gh GitHub command line tool you can run the download_models.py script to fetch Coqui's releases into the build/models folder in your local repo. You can then run your locally-built tool against these models using the --languages_dir option:

LD_LIBRARY_PATH=../STT_download ./spchcat --languages_dir=build/models/

Installer

After you have the tool built and the model data downloaded, create_deb_package.sh will attempt to package them into a Debian installer archive. It will take several minutes to run, and the result ends up in spchcat_0.0-2_amd64.deb.

Release Process

There's a notebook at notebooks/build.ipynb that runs through all the build steps needed to download dependencies and data, build the executable, and create the final package. These steps are run inside an Ubuntu 18.04 Docker image to create the binaries that are released.

sudo docker run -it -v`pwd`:/spchcat ubuntu:bionic bash

Contributors

Tool code written by Pete Warden, heavily based on Coqui's STT example. It's a pretty thin wrapper on top of Coqui's speech to text library, so the Coqui team should get credit for their amazing work. It also relies on TensorFlow, KenLM, data from Mozilla's Common Voice project, and all the contributors to Coqui's model zoo.

License

Tool code is licensed under the Mozilla Public License Version 2.0; see LICENSE in this folder.

All other libraries and model data are released under their own licenses; see the relevant folders for more details.
