• Stars
    star
    184
  • Rank 209,187 (Top 5 %)
  • Language
    C++
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated over 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

FastCGI support for Kaldi ASR

About

FastCGI support for Kaldi. It allows Kaldi based speech recognition to be used though Apache or Nginx (or any other that support FastCGI) HTTP servers. It also contains simple HTML-based client, that allows testing Kaldi speech recognitionfrom a web page.

Licence

Apache 2.0

Installation guide

Summary

This guide will help you to download and build your own simple ASR web-service based on Kaldi ASR code.

Preparing prerequisites

Creating a working dir

Let's create a directory where all data will be downloaded and built.

mkdir ~/apiai
cd ~/apiai

You are free to choose any other name and path you wish to, but will have to keep in mind that your name differs from the name given in the guide.

Due to server code is based on Kaldi almost all prerequisites matches to Kaldi ones. Besides that a FastCGI library is required to communicate with HTTP server.

Getting Kaldi

As a first step you have to clone Kaldi source tree available at https://github.com/kaldi-asr/kaldi:

git clone https://github.com/kaldi-asr/kaldi

This command will clone source tree to kaldi directory. To configure and build Kaldi please refer to kaldi/INSTALL file. For detailed information please look for Kaldi official instruction: http://kaldi-asr.org/doc/install.html

Installing libraries

There are some extra libraries required. You may install them using system packet manager.

In openSuSE you may run:

$ sudo zypper install FastCGI-devel

It you have Debian or Ubuntu:

$ sudo apt-get install libfcgi-dev

Getting the code

Return to your working directory where you put Kaldi sources

$ cd ~/apiai

and then clone server source code

$ git clone https://github.com/api-ai/asr-server asr-server

It is recommended to checkout code to the same directory where kaldi-apiai is located to allow configure tool to detect Kaldi location automatically.

Building the app

$ cd asr-server

Before running a make process you have to configure build scripts by running a special utility:

$ ./configure

It will check that all required libraries installed to your system and also will look for Kaldi libraries in ../kaldi folder. If you have Kaldi installed somewhere else you may explicitly pass the path via --kaldi-root option:

$ ./configure --kaldi-root=<path_to_kaldi>

If configuration process has finished successfully you may begin the building process by running make script:

$ make

Getting a recognition model

When application build complete you need to download language specific data.

Return to your working directory where you put Kaldi sources

$ cd ~/apiai

Builded ASR application uses a Kaldi nnet3 models, which you can get by training a neural network with your personal data set or use a pretrained network provided by us. Currently it is only English model available at https://github.com/api-ai/api-ai-english-asr-model/releases/download/1.0/api.ai-kaldi-asr-model.zip.

$ wget https://github.com/api-ai/api-ai-english-asr-model/releases/download/1.0/api.ai-kaldi-asr-model.zip

Unzip the archive to asr-server directory.

$ unzip api.ai-kaldi-asr-model.zip

Running the app

Set the model directory as a working dir:

$ cd api.ai-kaldi-asr-model

There are several ways available to run application. The first one is to run it as a standalone app listening on socket defined with --fcgi-socket option:

$ ../asr-server/fcgi-nnet3-decoder --fcgi-socket=:8000

This command runs application listening on any IP address and port 8000. You are also free to define a path Unix socket, or explicit IP address (in a A.B.C.D:PORT form).

As an alternative way you may use special spawn-fcgi utility:

$ spawn-fcgi -n -p 8000 -- ../asr-server/fcgi-nnet3-decoder

Configuring HTTP service

You may use any web-server which have FastCGI support: Apache, Nginx, Lighttpd etc.

Installing Apache2

openSuSE:

$ sudo zypper in apache2

Debian and Ubuntu:

$ sudo apt-get install apache2

Configuring Apache2

Enable FastCGI proxy module with a2enmod:

$ sudo a2enmod proxy_fcgi

Then you have to add to Apache2 configuration file following line:

ProxyPass "/asr" "fcgi://localhost:8000/"

If your Apache configured to include all .conf files from /etc/apache2/conf.d folder you may create separate asr_proxy.conf file with following content:

ProxyPass "/asr" "fcgi://localhost:8000/"
Alias /asr-html/ "/home/username/apiai/asr-server/asr-html/"
<Directory "/home/username/apiai/asr-server/asr-html">
	Options Indexes MultiViews
	AllowOverride None
	Require all granted
</Directory>

Now restart Apache:

$ sudo /etc/init.d/apache2 restart

Installing Nginx

You can download latest sources from official website http://nginx.org/ and build Nginx with yourself or use your system package manager.

openSuSE:

$ sudo zypper install nginx

Debian and Ubuntu:

$ sudo apt-get install nginx

Configuring Nginx

Open nginx.conf and write down the following code:

http {
	server {
		location /asr {
			fastcgi_pass 127.0.0.1:8000;
			# Disabling this option invokes immediate sending replies to client
			fastcgi_buffering off;
			# Disabling this option invokes immediate decoding incoming audio data
			fastcgi_request_buffering off;
			include      fastcgi_params;
		}

		location /asr-html {
			root /home/username/apiai/asr-server/;
			index index.html;
		}
	}
}

This will setup Nginx to pass all requests coming to url /asr directly to ASR service listening 8000 port via FastCGI gate. For detailed information please please refer to nginx documentation (e.g. https://www.nginx.com/resources/wiki/start/topics/examples/fastcgiexample/)

Speech Recognition

Server accepts raw mono 16-bits 16 KHz PCM data. You can convert your audio using any popular encoding utilities, for instance, you can use ffmpeg:

$ ffmpeg -i audio.wav -f s16le -ar 16000 -ac 1 audio.raw

Recognition using web browser

There is a simple JS implementation that allows you to recognize speech using system mic. Open in your browser:

http://localhost/asr-html/

and follow the instructions on the page.

Recognition from command line using curl

Now, let’s recognize audio.raw by calling web-service with curl utility:

$ curl -H "Content-Type: application/octet-stream" --data-binary @audio.raw http://localhost/asr

On successfull recognition the command will return something like this:

{
	"status":"ok",
	"data":[{"confidence":0.900359,"text":"HELLO WORLD"}]
}

On error the return value will be like this:

{"status":"error","data":[{"text":"Failed to decode"}]}

Recognition request parameters

There are several parameters to tune up recognition process. All parameters are expected to be passed via query string as web-form fields enumeration (e.g. ?name1=value1&name2=value2).

Parameter Description Acceptable values Default value
nbest Set the number of possible returned values
{
	"status":"ok",
	"data":[
		{"confidence":0.900359,"text":"HELLO WORLD"},
		{"confidence":0.89012,"text":"HELLO WORD"}
	]
}
1-10 1
endofspeech Enable or disable end-of-speech points during recognition. If endpoint detected all then current result have returned and the rest data would be skipped. Also in case of interrupted recognition 2 fields would be added to response: "interrupted" with value "endofspeech", and "time" with time point showing the number of milliseconds have been processed.
{
	"status":"ok",
	"data":[{"confidence":0.900359,"text":"HELLO WORLD"}],
	"interrupted":"endofspeech",
	"time":3800
}
true or false true
intermediate Set time interval in milliseconds between intermediate results while recognition being in progress.
		The result returned as an simple sequence of JSON documents.
		Each intermediate document have "status" field set to "intermediate",
		last one will have "status" set to "ok".

{"status":"intermediate","data":[
	{"confidence":0.908981,"text":"HELLO"}
]}
{"status":"intermediate","data":[
	{"confidence":0.903025,"text":"HELLO WORLD"}
]}
{"status":"ok","data":[
	{"confidence":0.903025,"text":"HELLO WORLD"}
]}
>500 0
multipart If enabled the result would be returned as an HTTP multipart response with "content-type" set to "multipart/x-mixed-replace" and each response part has "Content-Disposition" header value equal to "form-data". Intermediate parts named as "partial" and a final part is named as "result".

--ResponseBoundary
Content-Disposition: form-data; name="partial"
Content-type: application/json

{"status":"intermediate","data":[ {"confidence":0.908981,"text":"HELLO"} ]}

--ResponseBoundary Content-Disposition: form-data; name="partial" Content-type: application/json

{"status":"intermediate","data":[ {"confidence":0.903025,"text":"HELLO WORLD"} ]}

--ResponseBoundary Content-Disposition: form-data; name="result" Content-type: application/json

{"status":"ok","data":[ {"confidence":0.903025,"text":"HELLO WORLD"} ]}

--ResponseBoundary--

true or false false

More Repositories

1

dialogflow-nodejs-client

Node.js SDK for Dialogflow
JavaScript
658
star
2

dialogflow-fulfillment-nodejs

Dialogflow agent fulfillment library supporting v1&v2, 8 platforms, and text, card, image, suggestion, custom responses
JavaScript
599
star
3

dialogflow-android-client

Android SDK for Dialogflow
Java
575
star
4

dialogflow-python-client

Python library for Dialogflow
Python
556
star
5

dialogflow-javascript-client

JavaScript Web SDK for Dialogflow
TypeScript
412
star
6

dialogflow-apple-client

iOS SDK for Dialogflow
Objective-C
244
star
7

agent-human-handoff-nodejs

A simple Dialogflow agent, a server, and a web interface that shows an approach for handling text-based conversations between a Dialogflow agent and a human operator
JavaScript
211
star
8

fulfillment-webhook-json

Dialogflow's Fulfillment: Webhook JSON (Requests & Responses)
190
star
9

dialogflow-ruby-client

Ruby SDK for Dialogflow
Ruby
141
star
10

selfservicekiosk-audio-streaming

A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud STT by using websockets.
JavaScript
140
star
11

dialogflow-java-client

Java client library for Dialogflow
Java
133
star
12

resources

Links to all Dialogflow libraries and samples
119
star
13

fulfillment-weather-nodejs

Integrating an API with Dialogflow's Fulfillment
JavaScript
81
star
14

fulfillment-firestore-nodejs

Integrating Firebase's Firestore database with Dialogflow
JavaScript
81
star
15

fulfillment-bike-shop-nodejs

Integrating Google Calendar API with Dialogflow's Fulfillment & Knowledge Connectors
JavaScript
79
star
16

dialogflow-dotnet-client

.NET framework for Dialogflow
C#
70
star
17

dialogflow-botkit-client

Botkit library for Dialogflow
JavaScript
62
star
18

dialogflow-java-client-v2

Java client for Dialogflow: Design and integrate a conversational user interface into your applications and devices.
59
star
19

dialogflow-unity-client

Unity library for Dialogflow
C#
54
star
20

fulfillment-translate-python

Python
45
star
21

dialogflow-cordova-client

Cordova library for Dialogflow
Objective-C
42
star
22

api-ai-english-asr-model

Api.ai English Speech Recognition (ASR) Model for Kaldi
36
star
23

fulfillment-actions-library-nodejs

Integrating Actions on Google Client Library with Dialogflow's Fulfillment Library
JavaScript
29
star
24

dialogflow-cpp-client

C++ library for Dialogflow
C++
28
star
25

api-ai-cocoa-swift

Cocoa Swift library
Swift
28
star
26

fulfillment-telephony-nodejs

Sample integrating Telephony, Google Sheets, and Slot Filling with Dialogflow
JavaScript
24
star
27

fulfillment-importer-nodejs

Dialogflow's Importer for Alexa Skills to import a Alexa Skill to Dialogflow
JavaScript
22
star
28

fulfillment-slot-filling-nodejs

Slot Filling with Dialogflow Fulfillment
JavaScript
19
star
29

fulfillment-temperature-converter-nodejs

Sample demonstrating how to make a Dialogflow agent compatible with 9 platforms
JavaScript
19
star
30

fulfillment-faq-nodejs

Integrating Dialogflow's Knowledge Connectors, Phone Gateway, and Actions on Google
JavaScript
17
star
31

dialogflow-xamarin-client

Xamarin SDK for Dialogflow
C#
10
star
32

fulfillment-multi-locale-nodejs

Sample showing how to use multiple languages and locales (e.g. English and French)
JavaScript
10
star
33

city-streets-trivia-nodejs

This sample demonstrates how to create and update developer entities using the Dialogflow Node.js Client and the Dialogflow Fulfillment Library. It also demonstrates how to create session entities from your fulfillment code.
JavaScript
10
star
34

fulfillment-regex-nodejs

Validate Entities with Regular Expressions in Dialogflow's Fulfillment
JavaScript
8
star