  • Language: Python
  • License: Apache License 2.0

On-device streaming speech-to-text engine powered by deep learning

Cheetah

Made in Vancouver, Canada by Picovoice

Cheetah is an on-device streaming speech-to-text engine. Cheetah is:

  • Private: all voice processing runs locally.
  • Accurate
  • Compact and Computationally-Efficient
  • Cross-Platform:
    • Linux (x86_64), macOS (x86_64, arm64), and Windows (x86_64)
    • Android and iOS
    • Chrome, Safari, Firefox, and Edge
    • Raspberry Pi (4, 3) and NVIDIA Jetson Nano

AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including Cheetah. Anyone using Picovoice needs a valid AccessKey, and you must keep it secret. Internet connectivity is required to validate your AccessKey with the Picovoice license servers, even though voice recognition itself runs 100% offline.

AccessKey also verifies that your usage is within the limits of your account. Everyone who signs up for Picovoice Console receives Free Tier usage rights. If you wish to increase your limits, you can purchase a subscription plan.
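
Because the AccessKey must stay secret, one common approach is to keep it out of source code and read it from the environment at startup. The following is a minimal sketch of that idea; the variable name PICOVOICE_ACCESS_KEY and the helper load_access_key are assumptions made for illustration, not part of any Picovoice SDK.

```python
import os


def load_access_key(env=None, name="PICOVOICE_ACCESS_KEY"):
    """Return the AccessKey found in the environment, or raise if it is missing.

    The variable name is an assumption for this sketch, not an SDK convention.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable to your AccessKey")
    return key
```

The returned string can then be passed wherever the examples in this document use ${ACCESS_KEY}.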

Demos

Python Demos

Install the demo package:

pip3 install pvcheetahdemo
cheetah_demo_mic --access_key ${ACCESS_KEY}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

C Demos

If using SSH, clone the repository with:

git clone --recurse-submodules git@github.com:Picovoice/cheetah.git

If using HTTPS, clone the repository with:

git clone --recurse-submodules https://github.com/Picovoice/cheetah.git

Build the demo:

cmake -S demo/c/ -B demo/c/build && cmake --build demo/c/build

Run the demo:

./demo/c/build/cheetah_demo_mic -a ${ACCESS_KEY} -m ${MODEL_PATH} -l ${LIBRARY_PATH}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console, ${LIBRARY_PATH} with the path to the appropriate library under lib, and ${MODEL_PATH} with the path to the default model file (or your custom one).

iOS Demos

To run the demo, go to demo/ios/CheetahDemo and run:

pod install

Replace let accessKey = "${YOUR_ACCESS_KEY_HERE}" in the file ViewModel.swift with your AccessKey.

Then, using Xcode, open the generated CheetahDemo.xcworkspace and run the application.

Android Demo

Using Android Studio, open demo/android/CheetahDemo as an Android project.

Replace "${YOUR_ACCESS_KEY_HERE}" in the file MainActivity.java with your AccessKey, then run the application.

Flutter Demo

To run the Cheetah demo on Android or iOS with Flutter, you must have the Flutter SDK installed on your system. Once installed, you can run flutter doctor to determine any other missing requirements for your relevant platform. Once your environment has been set up, launch a simulator or connect an Android/iOS device.

Before launching the app, use the copy_assets.sh script to copy the Cheetah demo model file into the demo project. (NOTE: on Windows, Git Bash or another bash shell is required, or you will have to manually copy the model file into the project.)

Replace "${YOUR_ACCESS_KEY_HERE}" in the file main.dart with your AccessKey.

Run the following command from demo/flutter to build and deploy the demo to your device:

flutter run

Go Demo

The demo requires cgo, which on Windows may mean that you need to install a gcc compiler like MinGW to build it properly.

From demo/go, run the following command in the terminal to build and run the mic demo:

go run micdemo/cheetah_mic_demo.go -access_key "${ACCESS_KEY}"

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

For more information about Go demos go to demo/go.

React Native Demo

To run the React Native Cheetah demo app, you will first need to set up your React Native environment. For this, please refer to React Native's documentation. Once your environment has been set up, navigate to demo/react-native and run the following commands:

For Android:

yarn android-install    # sets up environment
yarn android-run        # builds and deploys to Android

For iOS:

yarn ios-install        # sets up environment
yarn ios-run

Node.js Demo

Install the demo package:

yarn global add @picovoice/cheetah-node-demo

With a working microphone connected to your device, run the following in the terminal:

cheetah-mic-demo --access_key ${ACCESS_KEY}

For more information about Node.js demos go to demo/nodejs.

Java Demos

The Cheetah Java demo is a command-line application that lets you choose between running Cheetah on an audio file or on real-time microphone input.

To try the real-time demo, make sure there is a working microphone connected to your device. Then invoke the following commands from the terminal:

cd demo/java
./gradlew build
cd build/libs
java -jar cheetah-mic-demo.jar -a ${ACCESS_KEY}

For more information about Java demos go to demo/java.

.NET Demo

Cheetah .NET demo is a command-line application that lets you choose between running Cheetah on an audio file or on real-time microphone input.

Make sure there is a working microphone connected to your device. From demo/dotnet/CheetahDemo run the following in the terminal:

dotnet run -c MicDemo.Release -- --access_key ${ACCESS_KEY}

Replace ${ACCESS_KEY} with your Picovoice AccessKey.

For more information about .NET demos, go to demo/dotnet.

Rust Demo

Cheetah Rust demo is a command-line application that lets you choose between running Cheetah on an audio file or on real-time microphone input.

Make sure there is a working microphone connected to your device. From demo/rust/micdemo run the following in the terminal:

cargo run --release -- --access_key ${ACCESS_KEY}

Replace ${ACCESS_KEY} with your Picovoice AccessKey.

For more information about Rust demos, go to demo/rust.

Web Demo

From demo/web run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:5000 in your browser to try the demo.

SDKs

Python

Install the Python SDK:

pip3 install pvcheetah

Create an instance of the engine and transcribe audio in real-time:

import pvcheetah

handle = pvcheetah.create(access_key='${ACCESS_KEY}')

def get_next_audio_frame():
    pass

while True:
    partial_transcript, is_endpoint = handle.process(get_next_audio_frame())
    if is_endpoint:
        final_transcript = handle.flush()

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.
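
The loop above discards its results. As a rough sketch of how partial and final transcripts could be stitched together, the following groups the text into one string per utterance and always releases the engine at the end. FakeEngine is a hypothetical stand-in written for this example so the code runs without a microphone or an AccessKey; it only mimics the shape of process() (returning a partial transcript and an endpoint flag) and flush() shown above, plus a delete() method for cleanup. The helper name transcribe_stream is also an assumption.

```python
class FakeEngine:
    """Stand-in for a Cheetah handle; it mimics only process(), flush() and delete()."""

    def __init__(self):
        self.pending = []
        self.deleted = False

    def process(self, frame):
        # Treat each non-empty frame as a word; an empty frame marks an endpoint.
        if frame:
            self.pending.append(frame)
            return "", False
        return "", True

    def flush(self):
        text, self.pending = " ".join(self.pending), []
        return text

    def delete(self):
        self.deleted = True


def transcribe_stream(engine, frames):
    """Feed frames to the engine and return one finalized string per utterance."""
    utterances, current = [], ""
    try:
        for frame in frames:
            partial, is_endpoint = engine.process(frame)
            current += partial
            if is_endpoint:
                current += engine.flush()
                if current:
                    utterances.append(current)
                current = ""
        # Flush whatever is still buffered when the stream ends without an endpoint.
        current += engine.flush()
        if current:
            utterances.append(current)
    finally:
        engine.delete()  # release resources even if processing fails
    return utterances
```

With a real engine, frames would come from an audio source that yields frames of the length the engine expects.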

C

Create an instance of the engine and transcribe audio in real-time:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "pv_cheetah.h"

pv_cheetah_t *handle = NULL;
const pv_status_t status = pv_cheetah_init("${ACCESS_KEY}", "${MODEL_PATH}", 0.f, false, &handle);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

extern const int16_t *get_next_audio_frame(void);

while (true) {
    char *partial_transcript = NULL;
    bool is_endpoint = false;
    const pv_status_t status = pv_cheetah_process(
            handle,
            get_next_audio_frame(),
            &partial_transcript,
            &is_endpoint);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }
    // do something with transcript
    free(partial_transcript);
    if (is_endpoint) {
        char *final_transcript = NULL;
        const pv_status_t status = pv_cheetah_flush(handle, &final_transcript);
        if (status != PV_STATUS_SUCCESS) {
            // error handling logic
        }
        // do something with transcript
        free(final_transcript);
    }
}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console and ${MODEL_PATH} with the path to the default model file (or your custom one). Finally, when done, be sure to release the acquired resources using pv_cheetah_delete(handle).

iOS

The Cheetah iOS binding is available via CocoaPods. To import it into your iOS project, add the following line to your Podfile and run pod install:

pod 'Cheetah-iOS'

Create an instance of the engine and transcribe audio in real-time:

import Cheetah

let modelPath = Bundle(for: type(of: self)).path(
        forResource: "${MODEL_FILE}", // Name of the model file name for Cheetah
        ofType: "pv")!

let cheetah = Cheetah(accessKey: "${ACCESS_KEY}", modelPath: modelPath)

func getNextAudioFrame() -> [Int16] {
  // .. get audioFrame
  return audioFrame;
}

while true {
  do {
    let (partialTranscript, isEndpoint) = try cheetah.process(getNextAudioFrame())
    if isEndpoint {
      let finalTranscript = try cheetah.flush()
    }
  } catch let error as CheetahError {
      // handle error
  } catch { }
}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console and ${MODEL_FILE} with a custom trained model from Picovoice Console or the default model.

Android

To include the package in your Android project, ensure you have included mavenCentral() in your top-level build.gradle file and then add the following to your app's build.gradle:

dependencies {
    implementation 'ai.picovoice:cheetah-android:${LATEST_VERSION}'
}

Create an instance of the engine and transcribe audio in real-time:

import ai.picovoice.cheetah.*;

final String accessKey = "${ACCESS_KEY}"; // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)
final String modelPath = "${MODEL_FILE}";

short[] getNextAudioFrame() {
    // .. get audioFrame
    return audioFrame;
}

try {
    Cheetah cheetah = new Cheetah.Builder().setAccessKey(accessKey).setModelPath(modelPath).build(appContext);

    String transcript = "";

    while (true) {
        CheetahTranscript transcriptObj = cheetah.process(getNextAudioFrame());
        transcript += transcriptObj.getTranscript();

        if (transcriptObj.getIsEndpoint()) {
            CheetahTranscript finalTranscriptObj = cheetah.flush();
            transcript += finalTranscriptObj.getTranscript();
        }
    }

} catch (CheetahException ex) { }

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console and ${MODEL_FILE} with the default or custom trained model from console.

Flutter

Add the Cheetah Flutter plugin to your pubspec.yaml.

dependencies:
  cheetah_flutter: ^<version>

Create an instance of the engine and transcribe audio in real-time:

import 'package:cheetah_flutter/cheetah.dart';

const accessKey = "{ACCESS_KEY}";  // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)

List<int> buffer = getAudioFrame();

try {
    Cheetah _cheetah = await Cheetah.create(accessKey, '{CHEETAH_MODEL_PATH}');

    String transcript = "";

    while (true) {
        CheetahTranscript partialResult = await _cheetah.process(getAudioFrame());
        transcript += partialResult.transcript;

        if (partialResult.isEndpoint) {
            CheetahTranscript finalResult = await _cheetah.flush();
            transcript += finalResult.transcript;
        }
    }

    _cheetah.delete();

} on CheetahException catch (err) { }

Replace {ACCESS_KEY} with your AccessKey obtained from Picovoice Console and {CHEETAH_MODEL_PATH} with the path to a custom trained model from Picovoice Console or the default model.

Go

Install the Go binding:

go get github.com/Picovoice/cheetah/binding/go

Create an instance of the engine and transcribe audio in real-time:

import . "github.com/Picovoice/cheetah/binding/go"

cheetah := NewCheetah("${ACCESS_KEY}")
err := cheetah.Init()
if err != nil {
    // handle err init
}
defer cheetah.Delete()

func getNextFrameAudio() []int16{
    // get audio frame
}

for {
    partialTranscript, isEndpoint, err := cheetah.Process(getNextFrameAudio())
    if err != nil {
        // handle process error
    }
    if isEndpoint {
        finalTranscript, err := cheetah.Flush()
        if err != nil {
            // handle flush error
        }
    }
}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console. When done be sure to explicitly release the resources using cheetah.Delete().

React Native

The Cheetah React Native binding is available via NPM. Add it via the following command:

yarn add @picovoice/cheetah-react-native

Create an instance of the engine and transcribe audio in real-time:

import {Cheetah, CheetahErrors} from '@picovoice/cheetah-react-native';

const getAudioFrame = () => {
  // get audio frames
}

try {
  // create the engine once, outside the processing loop
  const cheetah = await Cheetah.create("${ACCESS_KEY}", "${MODEL_FILE}")
  while (1) {
    const {transcript, isEndpoint} = await cheetah.process(getAudioFrame())
    if (isEndpoint) {
      const {transcript} = await cheetah.flush()
    }
  }
} catch (err: any) {
  if (err instanceof CheetahErrors) {
    // handle error
  }
}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console and ${MODEL_FILE} with the default or custom trained model from console. When done be sure to explicitly release the resources using cheetah.delete().

Node.js

Install the Node.js SDK:

yarn add @picovoice/cheetah-node

Create instances of the Cheetah class:

const Cheetah = require("@picovoice/cheetah-node");

const accessKey = "${ACCESS_KEY}"; // Obtained from the Picovoice Console (https://console.picovoice.ai/)
const endpointDurationSec = 0.2;
const handle = new Cheetah(accessKey);

function getNextAudioFrame() {
  // ...
  return audioFrame;
}

while (true) {
  const audioFrame = getNextAudioFrame();
  const [partialTranscript, isEndpoint] = handle.process(audioFrame);
  if (isEndpoint) {
    const finalTranscript = handle.flush();
  }
}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

When done, be sure to release resources using release():

handle.release();
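
The release-when-done rule recurs across the SDKs in this document (pv_cheetah_delete in C, Delete in Go, delete in React Native, release in Node.js and Web). As a language-neutral illustration, sketched here in Python, a small wrapper can guarantee the cleanup call runs even when processing raises. The helper name releasing and the DummyHandle class are hypothetical and exist only for this example.

```python
from contextlib import contextmanager


@contextmanager
def releasing(handle, method="release"):
    """Yield handle, then call its cleanup method on exit, even after an exception."""
    try:
        yield handle
    finally:
        getattr(handle, method)()


class DummyHandle:
    """Hypothetical stand-in for an engine handle with a release() method."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True
```

A caller would write `with releasing(handle) as engine:` and do all processing inside the block; passing method="delete" covers handles whose cleanup method has that name.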

Java

Create an instance of the engine with the Cheetah Builder class and transcribe audio in real-time:

import ai.picovoice.cheetah.*;

final String accessKey = "..."; // AccessKey provided by Picovoice Console (https://console.picovoice.ai/)

short[] getNextAudioFrame() {
    // .. get audioFrame
    return audioFrame;
}

String transcript = "";

try {
    Cheetah cheetah = new Cheetah.Builder().setAccessKey(accessKey).build();

    while (true) {
        CheetahTranscript transcriptObj = cheetah.process(getNextAudioFrame());
        transcript += transcriptObj.getTranscript();

        if (transcriptObj.getIsEndpoint()) {
            CheetahTranscript finalTranscriptObj = cheetah.flush();
            transcript += finalTranscriptObj.getTranscript();
        }
    }

    cheetah.delete();

} catch (CheetahException ex) { }

.NET

Install the .NET SDK using NuGet or the dotnet CLI:

dotnet add package Cheetah

The SDK exposes a factory method to create instances of the engine as below:

using Pv;

const string accessKey = "${ACCESS_KEY}";

Cheetah handle = Cheetah.Create(accessKey);

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

When initialized, the valid sample rate is given by handle.SampleRate. Expected frame length (number of audio samples in an input array) is handle.FrameLength. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

short[] GetNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

string transcript = "";

while (true)
{
    CheetahTranscript transcriptObj = handle.Process(GetNextAudioFrame());
    transcript += transcriptObj.Transcript;

    if (transcriptObj.IsEndpoint)
    {
        CheetahTranscript finalTranscriptObj = handle.Flush();
        transcript += finalTranscriptObj.Transcript;
    }
}

Cheetah will have its resources freed by the garbage collector, but to have resources freed immediately after use, wrap it in a using statement:

using(Cheetah handle = Cheetah.Create(accessKey))
{
    // .. Cheetah usage here
}
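
The audio format described above (16-bit linearly-encoded PCM, single channel, a fixed number of samples per frame) means raw audio has to be cut into equal-sized frames before it is handed to the engine. The following Python sketch shows one way to do that for a byte buffer. It assumes little-endian samples, as found in typical WAV data; the function name pcm_bytes_to_frames is hypothetical, and frame_length stands for whatever value the engine reports.

```python
import struct


def pcm_bytes_to_frames(data, frame_length):
    """Split little-endian 16-bit mono PCM bytes into frames of frame_length samples.

    Returns (frames, leftover_bytes); the leftover does not fill a whole frame yet
    and can be prepended to the next chunk of audio.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    frame_bytes = frame_length * 2  # 2 bytes per 16-bit sample
    usable = len(data) - len(data) % frame_bytes
    frames = [
        list(struct.unpack(f"<{frame_length}h", data[i:i + frame_bytes]))
        for i in range(0, usable, frame_bytes)
    ]
    return frames, data[usable:]
```

Returning the leftover bytes instead of padding them keeps partial frames from being processed as if they were complete.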

Rust

First you will need Rust and Cargo installed on your system.

To add the cheetah library into your app, add pv_cheetah to your app's Cargo.toml manifest:

[dependencies]
pv_cheetah = "*"

Create an instance of the engine using a CheetahBuilder instance and transcribe audio in real-time:

use cheetah::CheetahBuilder;

fn next_audio_frame() -> Vec<i16> {
  // get audio frame
}

let access_key = "${ACCESS_KEY}"; // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)
let cheetah = CheetahBuilder::new().access_key(access_key).init().expect("Unable to create Cheetah");

if let Ok(cheetah_transcript) = cheetah.process(&next_audio_frame()) {
  println!("{}", cheetah_transcript.transcript);
  if cheetah_transcript.is_endpoint {
    if let Ok(cheetah_transcript) = cheetah.flush() {
      println!("{}", cheetah_transcript.transcript);
    }
  }
}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

Web

Install the web SDK using yarn:

yarn add @picovoice/cheetah-web

or using npm:

npm install --save @picovoice/cheetah-web

Create an instance of the engine using CheetahWorker and transcribe audio in real-time:

import { CheetahWorker } from "@picovoice/cheetah-web";
import cheetahParams from "${PATH_TO_BASE64_CHEETAH_PARAMS}";

let transcript = "";

function transcriptCallback(cheetahTranscript: CheetahTranscript) {
  transcript += cheetahTranscript.transcript;
  if (cheetahTranscript.isEndpoint) {
    transcript += "\n";
  }
}

function getAudioData(): Int16Array {
  // ... function to get audio data
  return new Int16Array();
}

const cheetah = await CheetahWorker.create(
  "${ACCESS_KEY}",
  transcriptCallback,
  { base64: cheetahParams }
);

for (;;) {
  cheetah.process(getAudioData());
  // break on some condition
}
cheetah.flush(); // runs transcriptCallback on remaining data.

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console. Finally, when done release the resources using cheetah.release().

Releases

v1.1.0 — August 11th, 2022

  • added true-casing by default for transcription results
  • added option to enable automatic punctuation insertion
  • Cheetah Web SDK release

v1.0.0 — January 25th, 2022

  • Initial release.
