Recurrent neural network for audio noise reduction
RNNoise is a noise suppression library based on a recurrent neural network.
A description of the algorithm is provided in the following paper:

J.-M. Valin, A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech
Enhancement, Proceedings of IEEE Multimedia Signal Processing (MMSP) Workshop,
arXiv:1709.08243, 2018.
https://arxiv.org/pdf/1709.08243.pdf

An interactive demo is available at: https://jmvalin.ca/demo/rnnoise/

To compile, just type:
% ./autogen.sh
% ./configure
% make

Optionally:
% make install

It is recommended to either set -march= in the CFLAGS to an architecture
with AVX2 support or to add --enable-x86-rtcd to the configure script
so that AVX2 (or SSE4.1) can at least be used as an option.
Note that the autogen.sh script will automatically download the model files
from the Xiph.Org servers, since those are too large to put in Git.

While it is meant to be used as a library, a simple command-line tool is
provided as an example. It operates on RAW 16-bit (machine endian) mono
PCM files sampled at 48 kHz. It can be used as:

% ./examples/rnnoise_demo <noisy speech> <output denoised>

The output is also a 16-bit raw PCM file.
NOTE AGAIN, THE INPUT and OUTPUT ARE IN RAW FORMAT, NOT WAV.
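
To make the raw format concrete, here is a minimal sketch of a loop that reads
headerless 16-bit mono samples, hands them to a per-frame callback as floats,
and writes 16-bit samples back. The frame size of 480 samples (10 ms at 48 kHz),
the frame_fn callback type, and the choice to keep float samples in the 16-bit
range rather than scaling them to +/-1 are assumptions made for illustration;
the callback is a stand-in for whatever denoising call your program uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define FRAME_SIZE 480 /* assumption: 10 ms at 48 kHz */

/* Hypothetical per-frame callback; a real program would call the library here. */
typedef void (*frame_fn)(float *frame, int n, void *user);

/* Convert a float sample back to 16 bits with rounding and saturation. */
static int16_t float_to_pcm16(float x)
{
    if (x >= 32767.0f) return 32767;
    if (x <= -32768.0f) return -32768;
    return (int16_t)(x < 0 ? x - 0.5f : x + 0.5f);
}

/* Reads raw 16-bit mono samples (machine endian, no header) from `in`,
   processes them in FRAME_SIZE blocks and writes raw 16-bit samples to `out`.
   Returns the number of complete frames written; a trailing partial frame
   is dropped. */
static long process_raw_pcm(FILE *in, FILE *out, frame_fn fn, void *user)
{
    int16_t pcm[FRAME_SIZE];
    float frame[FRAME_SIZE];
    long frames = 0;
    assert(in && out && fn);
    while (fread(pcm, sizeof(pcm[0]), FRAME_SIZE, in) == FRAME_SIZE) {
        for (int i = 0; i < FRAME_SIZE; i++) frame[i] = pcm[i];
        fn(frame, FRAME_SIZE, user);
        for (int i = 0; i < FRAME_SIZE; i++) pcm[i] = float_to_pcm16(frame[i]);
        if (fwrite(pcm, sizeof(pcm[0]), FRAME_SIZE, out) != FRAME_SIZE) break;
        frames++;
    }
    return frames;
}

/* Example callback: halves the amplitude (a stand-in for a denoiser). */
static void halve(float *frame, int n, void *user)
{
    (void)user;
    for (int i = 0; i < n; i++) frame[i] *= 0.5f;
}
```

Because no header is parsed, feeding a WAV file to a loop like this would treat
the header bytes as audio samples, which is why the input has to be raw.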

The latest version of the source is available from
https://gitlab.xiph.org/xiph/rnnoise .  The GitHub repository
is a convenience copy.

== Training ==

The models distributed with RNNoise are now trained using only the publicly
available datasets listed below and using the training procedure described
here. Exact results will still depend on the exact mix of data used,
on how long the training is performed and on the various random seeds involved.

To train an RNNoise model, you need both clean speech data, and noise data.
Both need to be sampled at 48 kHz, in 16-bit PCM format (machine endian).
Clean speech data can be obtained from the datasets listed in the datasets.txt
file, or by downloading the already-concatenated version of those files from
https://media.xiph.org/rnnoise/data/tts_speech_48k.sw
For noise data, we suggest concatenating the 48 kHz noise data from DEMAND at
https://zenodo.org/records/1227121
with contrib_noise.sw and synthetic_noise.sw noise files from
https://media.xiph.org/rnnoise/data/
To balance out the data, we recommend using multiple (e.g. 5) copies of the
contrib_noise.sw and synthetic_noise.sw noise files.
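
Since the noise files are headerless PCM, balancing the data amounts to plain
byte-level concatenation with repetition. The sketch below shows one way to do
that; append_copies is a hypothetical helper written for this example, not
part of RNNoise, and it assumes the source stream is seekable.

```c
#include <assert.h>
#include <stdio.h>

/* Appends `copies` copies of the seekable stream `src` to `dst`.
   Returns the total number of bytes written, or -1 on I/O error. */
static long append_copies(FILE *dst, FILE *src, int copies)
{
    char buf[65536];
    long total = 0;
    assert(dst && src && copies >= 0);
    for (int c = 0; c < copies; c++) {
        size_t n;
        rewind(src); /* start each copy from the beginning of the source */
        while ((n = fread(buf, 1, sizeof(buf), src)) > 0) {
            if (fwrite(buf, 1, n, dst) != n) return -1;
            total += (long)n;
        }
        if (ferror(src)) return -1;
    }
    return total;
}
```

A caller would open the output file in binary mode and then call
append_copies once with 1 copy for the DEMAND noise and once with 5 copies
each for contrib_noise.sw and synthetic_noise.sw.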

The first step is to take the speech and noise, and mix them in a variety of ways
to simulate real life conditions (including pauses, filtering and more).
Assuming the files are called speech.pcm and noise.pcm, start by generating
the training feature data with:

% ./dump_features speech.pcm noise.pcm features.f32 <count>
where <count> is the number of sequences to process. The number of sequences
should be at least 10000, but the more the better (200000 or more is recommended).

Optionally, training can also simulate reverberation, in which case room impulse
responses (RIR) are also needed. Limited RIR data is available at:
https://media.xiph.org/rnnoise/data/measured_rirs-v2.tar.gz
The format for those is raw 32-bit floating-point (files are little endian).
Assuming a list of all the RIR files is contained in a rir_list.txt file,
the training feature data can be generated with:

% ./dump_features -rir_list rir_list.txt speech.pcm noise.pcm features.f32 <count>
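
If you need to inspect the RIR files yourself, or produce them on a host whose
byte order you are unsure of, the little-endian 32-bit float layout mentioned
above can be decoded portably as sketched below. The function names are
hypothetical, and the code assumes that the host's float type is IEEE 754
binary32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes one 32-bit float stored in little-endian byte order.
   Assumption: the host `float` is IEEE 754 binary32. */
static float le_bytes_to_float(const unsigned char b[4])
{
    uint32_t u = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    float f;
    memcpy(&f, &u, sizeof(f)); /* reinterpret the bits without aliasing issues */
    return f;
}

/* Decodes n floats from a raw little-endian byte buffer into out. */
static void decode_rir(const unsigned char *bytes, size_t n, float *out)
{
    assert(bytes && out);
    for (size_t i = 0; i < n; i++) out[i] = le_bytes_to_float(bytes + 4 * i);
}
```

Assembling the value from individual bytes gives the same result on big-endian
and little-endian hosts, which a direct fread() into a float array would not.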

To make the feature generation faster, you can use the script provided in
script/dump_features_parallel.sh (you will need to modify the script if you
want to add RIR augmentation).

To use it:
% script/dump_features_parallel.sh ./dump_features speech.pcm noise.pcm features.f32 <count> <nb_processes>
which will run nb_processes processes, each for count sequences, and
concatenate the output to a single file.

Once the feature file is computed, you can start the training with:
% python3 train_rnnoise.py features.f32 output_directory

Choose a number of epochs (using --epochs) that leads to about 75000 weight
updates. The training will produce .pth files, e.g. rnnoise_50.pth .
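
As a rough aid for picking --epochs, the arithmetic can be sketched as follows.
This assumes one weight update per batch, that each dumped sequence is one
training sequence, and that a final partial batch still counts as an update;
the batch size is a parameter here, and the value 128 used below is only
illustrative. Check the training script for the values that actually apply.

```c
#include <assert.h>

/* Smallest number of epochs that yields at least `target_updates` weight
   updates, assuming one update per batch and a counted partial last batch. */
static long epochs_for_updates(long target_updates, long num_sequences,
                               long batch_size)
{
    long updates_per_epoch, epochs;
    assert(target_updates > 0 && num_sequences > 0 && batch_size > 0);
    /* Ceiling division: batches needed to cover all sequences once. */
    updates_per_epoch = (num_sequences + batch_size - 1) / batch_size;
    /* Ceiling division: epochs needed to reach the target update count. */
    epochs = (target_updates + updates_per_epoch - 1) / updates_per_epoch;
    return epochs;
}
```

Under these assumptions, 200000 sequences with a batch size of 128 give 1563
updates per epoch, so epochs_for_updates(75000, 200000, 128) returns 48.
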
The next step is to convert the model to C files using:

% python3 dump_rnnoise_weights.py --quantize rnnoise_50.pth rnnoise_c

which will produce the rnnoise_data.c and rnnoise_data.h files in the
rnnoise_c directory.

Copy these files to src/ and then build RNNoise using the instructions above.

For slightly better results, a trained model can be used to remove any noise
from the "clean" training speech, before restarting the training process
again (no need to do that more than once).

== Loadable Models ==

The model format has changed since v0.1.1. Models now use a binary
"machine endian" format. To output a model in that format, build RNNoise
with that model and use the dump_weights_blob executable to output a
weights_blob.bin binary file. That file can then be used with the
rnnoise_model_from_file() API call. Note that the model object MUST NOT
be deleted while the RNNoise state is active and the file MUST NOT
be closed.

To avoid including the default model in the build (e.g. to reduce download
size) and rely only on model loading, add -DUSE_WEIGHTS_FILE to the CFLAGS.
To be able to load different models, the model size (and header file) needs
to match the size used during the build. Otherwise the model will not load.
We provide a smaller "little" model as an alternative. To use the smaller
model, rename rnnoise_data_little.c to rnnoise_data.c. It is possible
to build both the regular and little binary weights and load any of them
at run time since the little model has the same size as the regular one
(except for the increased sparsity).
