Code for Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Abstract

Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance.

In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most 2D convolution-based KWS approaches, which require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. On the Google Speech Commands Dataset, we achieve more than a 385x speedup on Google Pixel 1 and surpass the accuracy of the state-of-the-art model. In addition, we release the implementation of the proposed and baseline models, including an end-to-end pipeline for training models and evaluating them on mobile devices.
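To make the idea of a temporal convolution more concrete, here is a minimal pure-Python sketch; it is not the repository's implementation. It assumes that the feature values of each frame (for example, MFCC coefficients) are treated as input channels, so the kernel slides along the time axis only. The sizes borrow the --height 49 and --width 40 values from the freeze command in this README, under the assumption that height indexes time frames and width indexes features. The function name temporal_conv, the kernel length, and the number of output channels are illustrative.

```python
def temporal_conv(frames, kernel, stride=1):
    """Valid (unpadded) 1D convolution along the time axis.

    frames: list of T frames, each a list of C_in feature values.
    kernel: nested list indexed as kernel[dt][c_in][c_out].
    Returns a list of output frames, each holding C_out values.
    """
    k = len(kernel)
    c_out = len(kernel[0][0])
    outputs = []
    for start in range(0, len(frames) - k + 1, stride):
        acc = [0.0] * c_out
        for dt in range(k):
            frame = frames[start + dt]
            for c_in, x in enumerate(frame):
                weights = kernel[dt][c_in]
                for c in range(c_out):
                    acc[c] += x * weights[c]
        outputs.append(acc)
    return outputs


# Illustrative sizes (assumptions): 49 time frames, 40 feature channels,
# a kernel spanning 3 frames, and 16 output channels.
T, C_IN, K, C_OUT = 49, 40, 3, 16
frames = [[1.0] * C_IN for _ in range(T)]
kernel = [[[1.0] * C_OUT for _ in range(C_IN)] for _ in range(K)]
features = temporal_conv(frames, kernel)
print(len(features), len(features[0]))  # 47 16
```

Because every output value already mixes all feature channels of the frames it covers, a single layer sees the full frequency range, which is one way to read the abstract's contrast with 2D convolutions.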

Requirements

  • Python 3.6+
  • TensorFlow 1.13.1

Installation

git clone https://github.com/hyperconnect/TC-ResNet.git
pip3 install -r requirements/py36-[gpu|cpu].txt

Dataset

To evaluate the proposed and baseline models, we use the Google Speech Commands Dataset.

Google Speech Commands Dataset

Follow the instructions in speech_commands_dataset/.

How to run

Scripts to reproduce the training and evaluation procedures discussed in the paper are located in scripts/commands. After training a model, you can generate a .tflite file by following the instructions below.

To train the TCResNet8Model-1.0 model, run:

./scripts/commands/TCResNet8Model-1.0_mfcc_40_3010_0.001_mom_l1.sh

To freeze the trained model checkpoint into a .pb file, run:

python freeze.py --checkpoint_path work/v1/TCResNet8Model-1.0/mfcc_40_3010_0.001_mom_l1/TCResNet8Model-XXX --output_name output/softmax --output_type softmax --preprocess_method no_preprocessing --height 49 --width 40 --channels 1 --num_classes 12 TCResNet8Model --width_multiplier 1.0

To convert the .pb file into a .tflite file, run:

tflite_convert --graph_def_file=work/v1/TCResNet8Model-1.0/mfcc_40_3010_0.001_mom_l1/TCResNet8Model-XXX.pb --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --output_file=work/v1/TCResNet8Model-1.0/mfcc_40_3010_0.001_mom_l1/TCResNet8Model-XXX.tflite --inference_type=FLOAT --inference_input_type=FLOAT --input_arrays=input --output_arrays=output/softmax --allow_custom_ops

As the commands above show, you need to set the height, width, model, and model-specific arguments (e.g. width_multiplier) to match your model. For more information, please refer to scripts/commands/.
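Since the freeze and convert commands share the same checkpoint prefix and differ only in a few model-specific values, it can help to generate them from one place. The following is a small sketch of that idea; it uses only the flags that appear in the commands above. The helper names build_freeze_command and build_tflite_convert_command are hypothetical, and the sketch assumes the frozen .pb file sits next to the checkpoint with the same prefix, as in the example paths.

```python
def build_freeze_command(checkpoint_path, model, height, width,
                         num_classes=12, channels=1, model_args=None):
    """Return the freeze.py invocation as an argument list."""
    cmd = [
        "python", "freeze.py",
        "--checkpoint_path", checkpoint_path,
        "--output_name", "output/softmax",
        "--output_type", "softmax",
        "--preprocess_method", "no_preprocessing",
        "--height", str(height),
        "--width", str(width),
        "--channels", str(channels),
        "--num_classes", str(num_classes),
        model,
    ]
    # Model-specific arguments (e.g. width_multiplier) follow the model name.
    for name, value in (model_args or {}).items():
        cmd += ["--" + name, str(value)]
    return cmd


def build_tflite_convert_command(checkpoint_path):
    """Return the tflite_convert invocation for a frozen checkpoint.

    Assumes <checkpoint_path>.pb exists and writes <checkpoint_path>.tflite.
    """
    return [
        "tflite_convert",
        "--graph_def_file=" + checkpoint_path + ".pb",
        "--input_format=TENSORFLOW_GRAPHDEF",
        "--output_format=TFLITE",
        "--output_file=" + checkpoint_path + ".tflite",
        "--inference_type=FLOAT",
        "--inference_input_type=FLOAT",
        "--input_arrays=input",
        "--output_arrays=output/softmax",
        "--allow_custom_ops",
    ]


# XXX is the checkpoint step placeholder used in this README.
ckpt = "work/v1/TCResNet8Model-1.0/mfcc_40_3010_0.001_mom_l1/TCResNet8Model-XXX"
freeze_cmd = build_freeze_command(ckpt, "TCResNet8Model", height=49, width=40,
                                  model_args={"width_multiplier": 1.0})
convert_cmd = build_tflite_convert_command(ckpt)
print(" ".join(freeze_cmd))
print(" ".join(convert_cmd))
```

The argument lists can be passed to subprocess.check_call once XXX is replaced with a real checkpoint step.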

Benchmark tool

Android Debug Bridge (adb) is required to run the Android benchmark tool (model/tflite_tools/run_benchmark.sh). adb is part of the Android SDK Platform Tools; download the Platform Tools and follow their installation instructions.

1. Connect the Android device to your computer

2. Check that the connection is established

Run the following command.

adb devices

You should see output similar to the one below. The device ID will, of course, differ.

List of devices attached
FA77M0304573	device
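If you want to check the connection from a script before launching the benchmark, the output of adb devices can be parsed as follows. This is a sketch based on the output format shown above; the function name parse_adb_devices is hypothetical.

```python
def parse_adb_devices(output):
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon start-up notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices(output):
    """Return the serials of devices whose state is exactly 'device'."""
    return [serial for serial, state in parse_adb_devices(output)
            if state == "device"]


sample = "List of devices attached\nFA77M0304573\tdevice\n"
print(ready_devices(sample))  # ['FA77M0304573']
```

To use it against a real device, pass in the text returned by subprocess.check_output(["adb", "devices"], universal_newlines=True). A state other than device, such as unauthorized or offline, means the benchmark cannot run on that device yet.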

3. Run benchmark

Go to model/tflite_tools, place the TF Lite model you want to benchmark there (e.g. mobilenet_v1_1.0_224.tflite), and execute the following command. You can pass the optional parameter cpu_mask to set the CPU affinity.

./run_benchmark.sh TCResNet_14Model-1.5.tflite [cpu_mask]
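This README does not spell out the format of cpu_mask. Assuming it follows the usual Linux affinity convention, in which bit i of the mask selects CPU core i, the sketch below converts between a list of core indices and a mask. The helper names are hypothetical, and whether the script expects the mask in decimal or hexadecimal should be checked in run_benchmark.sh.

```python
def cores_to_mask(cores):
    """Build an affinity bitmask in which bit i is set for each core i."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def mask_to_cores(mask):
    """List the core indices selected by an affinity bitmask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(cores_to_mask([0, 1]))  # 3
print(mask_to_cores(3))       # [0, 1]
```

Under this assumption, a mask of 3 pins the benchmark to cores 0 and 1.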

If everything goes well, you should see output similar to the one below. The key measurement in this benchmark is the avg=5701.96 part, which is the average inference latency in microseconds.

./run_benchmark.sh TCResNet_14Model-1.5_mfcc_40_3010_0.001_mom_l1.tflite 3
benchmark_model_r1.13_official: 1 file pushed. 22.1 MB/s (1265528 bytes in 0.055s)
TCResNet_14Model-1.5_mfcc_40_3010_0.001_mom_l1.tflite: 1 file pushed. 25.0 MB/s (1217136 bytes in 0.046s)
>>> run_benchmark_summary TCResNet_14Model-1.5_mfcc_40_3010_0.001_mom_l1.tflite 3
TCResNet_14Model-1.5_mfcc_40_3010_0.001_mom_l1.tflite > count=50 first=5734 curr=5801 min=4847 max=6516 avg=5701.96 std=210
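When comparing several models, it can be convenient to extract the numbers from the summary line automatically. The following sketch parses a line in the format shown above and converts the average latency from microseconds to milliseconds; the function name parse_benchmark_summary is hypothetical.

```python
import re


def parse_benchmark_summary(line):
    """Split a summary line into the model name and a dict of statistics."""
    name, _, stats = line.partition(" > ")
    fields = re.findall(r"(\w+)=([0-9.]+)", stats)
    return name.strip(), {key: float(value) for key, value in fields}


line = ("TCResNet_14Model-1.5_mfcc_40_3010_0.001_mom_l1.tflite > "
        "count=50 first=5734 curr=5801 min=4847 max=6516 avg=5701.96 std=210")
name, stats = parse_benchmark_summary(line)
avg_ms = stats["avg"] / 1000.0  # microseconds to milliseconds
print("%s: %.2f ms" % (name, avg_ms))
# prints "TCResNet_14Model-1.5_mfcc_40_3010_0.001_mom_l1.tflite: 5.70 ms"
```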

License

Apache License 2.0
