• Stars: 180
• Rank: 206,320 (Top 5%)
• Language: Jupyter Notebook
• License: GNU General Public License
• Created: over 5 years ago
• Updated: over 4 years ago

Repository Details

Voice Activity Detection in Noisy Environments

Voice Activity Detection (VAD) using deep learning. Supervised by Retune DSP.

Abstract

Automatic speech recognition (ASR) systems often require an always-on, low-complexity Voice Activity Detection (VAD) module that identifies voice before forwarding it for further processing, in order to reduce power consumption. In most real-life scenarios, recorded audio is noisy, and deep neural networks have proven more robust to noise than the traditionally used statistical methods.

This study investigates the performance of three distinct low-complexity architectures: Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs), Gated Recurrent Unit (GRU) RNNs, and an implementation of DenseNet. Furthermore, the impact of the Focal Loss (FL) criterion over Cross-Entropy (CE) during training is explored, and findings are compared to recent VAD research.
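
Focal Loss rescales the Cross-Entropy of each example by (1 − p_t)^γ, so well-classified frames contribute less to the gradient. Below is a minimal binary sketch following Lin et al. (2017), not necessarily the exact formulation used in the study:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma.

    Down-weights easy examples so training focuses on hard frames.
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # model probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(8)                      # per-frame scores
targets = torch.randint(0, 2, (8,)).float()  # per-frame speech labels
print(focal_loss(logits, targets, gamma=2.0))
```

With γ=0 this reduces exactly to standard Cross-Entropy, which is what makes the two criteria directly comparable.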

Using a 72-hour dataset built from open sources with varied noise levels, and 12 Mel-frequency Cepstral Coefficients (MFCCs) together with their derivatives in a temporal context of 900 ms, a GRU-RNN with 30,000 parameters achieves an Area Under Curve (AUC) of 0.991 and a False Acceptance Rate (FAR) of 3.61% at a False Rejection Rate (FRR) fixed at 1%. Focal Loss is found to improve performance slightly when using focusing parameter γ=2, and performance improvements are observed for all three architectures when their number of parameters is increased, suggesting that network size and performance can be viewed as a trade-off.
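
As a rough illustration of such a feature pipeline, here is a sketch using librosa; the window and hop sizes are assumptions, since the exact settings live in the notebook:

```python
import librosa
import numpy as np

def extract_features(wav_path, sr=16000, n_mfcc=12):
    """Return per-frame features: 12 MFCCs stacked with their deltas."""
    y, _ = librosa.load(wav_path, sr=sr)
    # 25 ms windows with a 10 ms hop are common defaults; the notebook may differ.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
    delta = librosa.feature.delta(mfcc)  # first-order derivatives
    # At a 10 ms hop, a 900 ms temporal context corresponds to ~90 frames.
    return np.vstack([mfcc, delta]).T    # shape: (frames, 24)
```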

It is observed that Convolutional Neural Networks (CNNs) struggle in a high-noise environment compared to pure RNNs: a 10,000-parameter LSTM-RNN achieves a FAR of 48.13% at a fixed FRR of 1%, compared to 58.14% for a DenseNet of comparable size.
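
To give a sense of scale, a low-complexity GRU-RNN detector can be sketched in PyTorch as follows. The layer sizes are hypothetical, chosen only to land near the 30,000-parameter budget quoted above; this is not the study's exact architecture:

```python
import torch
import torch.nn as nn

class GRUVAD(nn.Module):
    """Frame-level speech/non-speech classifier over MFCC(+delta) features."""

    def __init__(self, n_features=24, hidden_size=88):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # speech vs. non-speech logits

    def forward(self, x):    # x: (batch, frames, n_features)
        h, _ = self.gru(x)
        return self.head(h)  # (batch, frames, 2)

model = GRUVAD()
print(sum(p.numel() for p in model.parameters()))  # ~30k with these sizes
```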

Results

All results shown here are for samples generated with a signal-to-noise ratio (SNR) of -3 dB, which -- for the unfamiliar reader -- is a substantial amount of noise: at -3 dB, the noise carries roughly twice the power of the speech signal.
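
To make that concrete, mixing at a target SNR amounts to a simple power-based gain. A minimal sketch (function name and signals are illustrative, not taken from the notebook):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested speech-to-noise ratio."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR(dB) = 10 * log10(p_speech / p_noise_scaled); solve for the noise gain.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
speech, noise = rng.normal(size=16000), rng.normal(size=16000)
mixed = mix_at_snr(speech, noise, snr_db=-3.0)  # noise power ~2x speech power
```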

ROC Curve

[Figure: ROC curve]

Example of a label

[Figure: sample]

Associated NN prediction

[Figure: prediction]
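
For reference, the AUC and FAR-at-fixed-FRR figures reported above can be computed from frame-level scores roughly as follows. This is a sketch assuming scikit-learn is available; the labels and scores here are dummy data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Dummy frame-level labels and scores, purely for demonstration.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=5000)
scores = np.clip(labels + rng.normal(0.0, 0.5, size=5000), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(labels, scores)
auc = roc_auc_score(labels, scores)

# FRR = 1 - TPR (speech wrongly rejected); FAR = FPR (noise wrongly accepted).
frr = 1.0 - tpr
far_at_1pct_frr = fpr[np.argmin(np.abs(frr - 0.01))]
print(f"AUC = {auc:.3f}, FAR at 1% FRR = {far_at_1pct_frr:.2%}")
```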

How to run?

You will need to download the two datasets used in our study as well as the notebook itself. Instructions for the datasets are given in the following section. The notebook will automatically install any missing dependencies using pip. You may need to adjust the global parameters slightly before running -- see the notebook for more details.
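
The auto-install step usually follows the common notebook idiom sketched below; this is not the notebook's exact code, and the package list is illustrative:

```python
import importlib
import subprocess
import sys

# Illustrative dependency list; the notebook defines its own.
for package in ("numpy", "torch", "librosa"):
    try:
        importlib.import_module(package)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
```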

Datasets

Our provided notebook can be run in two different modes: either you download and pre-process all data from scratch (which takes roughly 16 hours on a personal computer), or you download already processed data (27 GB) and execute with that.

In either case, you will want to create a local directory that contains all necessary data and outputs, as sketched below. If you want to start from scratch, these are the two datasets you need to collect: the LibriSpeech ASR corpus and QUT-NOISE.
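
For example, the local layout could be prepared along these lines (the directory names are hypothetical; match them to the notebook's global parameters):

```python
from pathlib import Path

# Hypothetical layout: raw downloads, processed features, and model outputs.
data_root = Path("data")
for sub in ("librispeech", "qut-noise", "processed", "output"):
    (data_root / sub).mkdir(parents=True, exist_ok=True)
```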

More Repositories

1. tdmpc (Python, 273 stars): Code for "Temporal Difference Learning for Model Predictive Control"
2. rnn_lstm_from_scratch (Jupyter Notebook, 234 stars): How to build RNNs and LSTMs from scratch with NumPy.
3. tdmpc2 (Python, 179 stars): Code for "TD-MPC2: Scalable, Robust World Models for Continuous Control"
4. dmcontrol-generalization-benchmark (Python, 155 stars): DMControl Generalization Benchmark
5. policy-adaptation-during-deployment (Python, 109 stars): Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.
6. neural-net-optimization (Python, 61 stars): PyTorch implementations of recent optimization algorithms for deep learning.
7. minimal-nas (Python, 36 stars): Minimal implementation of a Neural Architecture Search system.
8. svea-vit (Python, 15 stars): Code for the paper "Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation"
9. adaptive-learning-rate-schedule (Python, 10 stars): PyTorch implementation of the "Learning an Adaptive Learning Rate Schedule" paper found here: https://arxiv.org/abs/1909.09712.
10. nicklashansen.github.io (HTML, 7 stars): Repository for my personal site https://nicklashansen.github.io/, built with plain HTML.
11. a3c (Python, 6 stars): Asynchronous Advantage Actor-Critic using Generalized Advantage Estimation (PyTorch)
12. docker-from-conda (Dockerfile, 3 stars): Builds a Docker image from a conda environment.yml file.
13. smallrl (Python, 2 stars): Personal repository for quick RL prototyping. Work in progress!
14. music-genre-classification (Jupyter Notebook, 1 star): Exam project on Audio Features for Music Genre Classification for course 02452 Audio Information Processing Systems at the Technical University of Denmark (DTU).
15. bachelor-thesis (Python, 1 star): Repository for bachelor thesis on Automatic Multi-Modal Detection of Autonomic Arousals in Sleep. The thesis itself and all related data are confidential and thus not publicly available, but access to the thesis can be granted by sending a request to [email protected].
16. reinforcement-learning-sutton-barto (Jupyter Notebook, 1 star): Personal repository for a course on reinforcement learning. Includes implementations of various problems from the book Reinforcement Learning: An Introduction by R. Sutton and A. Barto.
17. nautilus-launcher (Python, 1 star): Minimal launcher for Nautilus