• Stars
    star
    603
  • Rank 74,294 (Top 2 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 6 years ago
  • Updated 12 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

speech to text benchmark framework

Speech-to-Text Benchmark

Made in Vancouver, Canada by Picovoice

This repo is a minimalist and extensible framework for benchmarking different speech-to-text engines.

Table of Contents

Data

Metrics

Word Error Rate

Word error rate (WER) is the ratio of edit distance between words in a reference transcript and the words in the output of the speech-to-text engine to the number of words in the reference transcript.

Real Time Factor

Real-time factor (RTF) is the ratio of CPU (processing) time to the length of the input speech file. A speech-to-text engine with lower RTF is more computationally efficient. We omit this metric for cloud-based engines.

Model Size

The aggregate size of models (acoustic and language), in MB. We omit this metric for cloud-based engines.

Engines

Usage

This benchmark has been developed and tested on Ubuntu 20.04.

  • Install FFmpeg
  • Download datasets.
  • Install the requirements:
pip3 install -r requirements.txt

Amazon Transcribe Instructions

Replace ${DATASET} with one of the supported datasets, ${DATASET_FOLDER} with path to dataset, and ${AWS_PROFILE} with the name of AWS profile you wish to use.

python3 benchmark.py \
--dataset ${DATASET} \
--dataset-folder ${DATASET_FOLDER} \
--engine AMAZON_TRANSCRIBE \
--aws-profile ${AWS_PROFILE}

Azure Speech-to-Text Instructions

Replace ${DATASET} with one of the supported datasets, ${DATASET_FOLDER} with path to dataset, ${AZURE_SPEECH_KEY} and ${AZURE_SPEECH_LOCATION} information from your Azure account.

python3 benchmark.py \
--dataset ${DATASET} \
--dataset-folder ${DATASET_FOLDER} \
--engine AZURE_SPEECH_TO_TEXT \
--azure-speech-key ${AZURE_SPEECH_KEY}
--azure-speech-location ${AZURE_SPEECH_LOCATION}

Google Speech-to-Text Instructions

Replace ${DATASET} with one of the supported datasets, ${DATASET_FOLDER} with path to dataset, and ${GOOGLE_APPLICATION_CREDENTIALS} with credentials download from Google Cloud Platform.

python3 benchmark.py \
--dataset ${DATASET} \
--dataset-folder ${DATASET_FOLDER} \
--engine GOOGLE_SPEECH_TO_TEXT \
--google-application-credentials ${GOOGLE_APPLICATION_CREDENTIALS}

IBM Watson Speech-to-Text Instructions

Replace ${DATASET} with one of the supported datasets, ${DATASET_FOLDER} with path to dataset, and ${WATSON_SPEECH_TO_TEXT_API_KEY}/${${WATSON_SPEECH_TO_TEXT_URL}} with credentials from your IBM account.

python3 benchmark.py \
--dataset ${DATASET} \
--dataset-folder ${DATASET_FOLDER} \
--engine IBM_WATSON_SPEECH_TO_TEXT \
--watson-speech-to-text-api-key ${WATSON_SPEECH_TO_TEXT_API_KEY}
--watson-speech-to-text-url ${WATSON_SPEECH_TO_TEXT_URL}

Mozilla DeepSpeech Instructions

Replace ${DATASET} with one of the supported datasets, ${DATASET_FOLDER} with path to dataset, ${DEEP_SPEECH_MODEL} with path to DeepSpeech model file (.pbmm), and ${DEEP_SPEECH_SCORER} with path to DeepSpeech scorer file (.scorer).

python3 benchmark.py \
--engine MOZILLA_DEEP_SPEECH \
--dataset ${DATASET} \
--dataset-folder ${DATASET_FOLDER} \
--deepspeech-pbmm ${DEEP_SPEECH_MODEL} \
--deepspeech-scorer ${DEEP_SPEECH_SCORER}

Picovoice Cheetah Instructions

Replace ${DATASET} with one of the supported datasets, ${DATASET_FOLDER} with path to dataset, and ${PICOVOICE_ACCESS_KEY} with AccessKey obtained from Picovoice Console.

python3 benchmark.py \
--engine PICOVOICE_CHEETAH \
--dataset ${DATASET} \
--dataset-folder ${DATASET_FOLDER} \
--picovoice-access-key ${PICOVOICE_ACCESS_KEY}

Picovoice Leopard Instructions

Replace ${DATASET} with one of the supported datasets, ${DATASET_FOLDER} with path to dataset, and ${PICOVOICE_ACCESS_KEY} with AccessKey obtained from Picovoice Console.

python3 benchmark.py \
--engine PICOVOICE_LEOPARD \
--dataset ${DATASET} \
--dataset-folder ${DATASET_FOLDER} \
--picovoice-access-key ${PICOVOICE_ACCESS_KEY}

Results

Word Error Rate (WER)

Engine LibriSpeech test-clean LibriSpeech test-other TED-LIUM CommonVoice Average
Amazon Transcribe 5.20% 9.58% 4.25% 15.94% 8.74%
Azure Speech-to-Text 4.96% 9.66% 4.99% 12.09% 7.93%
Google Speech-to-Text 11.23% 24.94% 15.00% 30.68% 20.46%
Google Speech-to-Text (Enhanced) 6.62% 13.59% 6.68% 18.39% 11.32%
IBM Watson Speech-to-Text 11.08% 26.38% 11.89% 38.81% 22.04%
Mozilla DeepSpeech 7.27% 21.45% 18.90% 43.82% 22.86%
Picovoice Cheetah 7.08% 16.28% 10.89% 23.10% 14.34%
Picovoice Leopard 5.39% 12.45% 9.04% 17.13% 11.00%

RTF

Measurement is carried on an Ubuntu 20.04 machine with Intel CPU (Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz), 64 GB of RAM, and NVMe storage.

Engine RTF Model Size
Mozilla DeepSpeech 0.46 1142 MB
Picovoice Cheetah 0.07 19 MB
Picovoice Leopard 0.05 19 MB

More Repositories

1

porcupine

On-device wake word detection powered by deep learning
Python
3,685
star
2

rhino

On-device Speech-to-Intent engine powered by deep learning
Python
616
star
3

cheetah

On-device streaming speech-to-text engine powered by deep learning
Python
582
star
4

picovoice

On-device voice assistant platform powered by deep learning
Python
564
star
5

leopard

On-device speech-to-text engine powered by deep learning
Python
427
star
6

web-voice-processor

A library for real-time voice processing in web browsers
TypeScript
195
star
7

cobra

On-device voice activity detection (VAD) powered by deep learning
Python
165
star
8

picollm

On-device LLM Inference Powered by X-Bit Quantization
Python
158
star
9

wake-word-benchmark

wake word engine benchmark framework
Python
131
star
10

pvrecorder

Cross-platform audio recorder designed for real-time speech audio processing
C
78
star
11

pico-cookbook

Recipes for on-device voice AI and local LLM
JavaScript
62
star
12

koala

On-device noise suppression powered by deep learning
Python
59
star
13

orca

On-device streaming text-to-speech engine powered by deep learning
TypeScript
43
star
14

octopus

On-device Speech-to-Index engine powered by deep learning
Python
34
star
15

flutter-voice-processor

Flutter audio recording plugin designed for real-time speech audio processing
Dart
29
star
16

eagle

On-device speaker recognition engine powered by deep learning
Python
23
star
17

falcon

On-device speaker diarization powered by deep learning
Python
22
star
18

react-native-voice-processor

React Native audio recording package designed for real-time speech audio processing
TypeScript
21
star
19

speech-to-intent-benchmark

benchmark for Speech-to-Intent engines
Python
15
star
20

llm-compression-benchmark

LLM Compression Benchmark
Python
15
star
21

browser-extension

Picovoice Browser Extension
JavaScript
14
star
22

voice-activity-benchmark

Voice activity engine benchmark framework
Python
12
star
23

speaker-diarization-benchmark

Speaker diarization benchmark framework
Python
10
star
24

serverless-picollm

LLM Inference on AWS Lambda
Python
9
star
25

picovoice-arduino-en

Picovoice SDK for Arduino boards - English language
C
7
star
26

ios-voice-processor

Asynchronous iOS audio recording library designed for real-time speech audio processing
Swift
5
star
27

unity-voice-processor

Unity audio recording package designed for real-time speech audio processing
C#
5
star
28

android-voice-processor

Asynchronous Android audio recording library designed for real-time speech audio processing
Java
4
star
29

tts-latency-benchmark

Text-to-Speech Latency Benchmark
Python
3
star
30

speaker-recognition-benchmark

Speaker recongnition benchmark framework
Python
3
star
31

picovoice-arduino-fa

Picovoice SDK for Arduino boards - Persian language
C
2
star
32

porcupine-arduino-en

Porcupine SDK for Arduino boards - English language
C
2
star
33

serverless-leopard

Python
2
star
34

noise-suppression-benchmark

Benchmark for noise suppression engines
Python
1
star
35

picovoice-arduino-es

Picovoice SDK for Arduino boards - Spanish language
C
1
star
36

speech-to-index-benchmark

Speech-to-Index (voice search) benchmark framework
Python
1
star