Looking to Listen
This is an implementation of "Looking to Listen at the Cocktail Party" using Python 3 and Chainer. This deep learning technique can be applied to noise reduction, background music removal, and speech separation.
The original paper is available at arxiv.org/abs/1804.03619. Note that this implementation is inspired by crystal-method (MIT).
Quick Start Demonstration (Audio-only Noise Reduction)
This section demonstrates noise reduction with a pretrained model.
- First, build the Docker container.
$ docker-compose build
- Put the noisy audio file(s) in ./data/noise.
- Run the following command.
- GPU
$ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise
- CPU (comment out _set_gpu() in network/src/env.py)
Intel CPU (Fast)
$ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise -ideep
Other CPU (Slow)
$ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise
- The denoised audio is written to ./data/results.
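The GPU and "Other CPU" commands above are identical; only the network/src/env.py edit differs, and the Intel CPU variant adds the -ideep flag. As a minimal sketch, a hypothetical helper `quick_start_command` (not part of this repository) could assemble those arguments and run them with subprocess. It assumes docker-compose is on your PATH and, for CPU runs, that _set_gpu() has already been commented out.

```python
import subprocess
from typing import List

# Container-side paths copied from the quick-start commands above.
MODEL = "/data/model/0f_1sclean_noise.npz"
NOISY_DIR = "/data/noise"


def quick_start_command(device: str) -> List[str]:
    """Hypothetical helper: build the quick-start command for a device.

    device is one of "gpu", "intel-cpu", or "cpu".
    """
    cmd = ["docker-compose", "run", "network", "python3",
           "quick_start_audio_only.py", MODEL, NOISY_DIR]
    if device == "intel-cpu":
        cmd.append("-ideep")  # Intel CPU (fast) variant
    elif device not in ("gpu", "cpu"):
        raise ValueError(f"unknown device: {device!r}")
    return cmd


def run_quick_start(device: str) -> None:
    # check=True raises CalledProcessError if the container run fails.
    subprocess.run(quick_start_command(device), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is executed here.
    print(" ".join(quick_start_command("intel-cpu")))
```

Keeping the arguments as a list (rather than a single shell string) avoids quoting problems if the model or data paths ever contain spaces.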
Usage
Please refer to the following section for additional information such as speech separation and audio-visual processing.
Open in bash
$ docker-compose run preprocess bash
$ docker-compose run dataset bash
$ docker-compose run network bash
Differences from the original paper
The original paper uses a large fully connected (FC) layer, but there is not enough GPU memory to hold a network of that size. This implementation reduces the size of the FC layer so that the network fits on a single GPU.
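Why an FC layer is so memory-hungry comes down to arithmetic: a layer mapping n_in inputs to n_out outputs stores n_in * n_out weights plus n_out biases, so halving both dimensions cuts its parameters by roughly 4x. The sketch below computes that count and an approximate float32 footprint. The function names and the layer sizes in the example are hypothetical illustrations, not the actual dimensions used in the paper or in this repository.

```python
def fc_params(n_in: int, n_out: int) -> int:
    """Parameters in a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def fc_bytes(n_in: int, n_out: int, bytes_per_param: int = 4, copies: int = 1) -> int:
    """Approximate memory for the layer's parameters.

    bytes_per_param=4 assumes float32. copies accounts for extra
    per-parameter buffers kept during training (e.g. gradients and
    optimizer state), which multiply the footprint.
    """
    return fc_params(n_in, n_out) * bytes_per_param * copies


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the scaling.
    for n in (2048, 1024):
        mib = fc_bytes(n, n, copies=4) / 2**20
        print(f"{n} -> {n}: {fc_params(n, n):,} params, ~{mib:.1f} MiB")
```

Because the weight count is a product of the two dimensions, shrinking the FC layer is one of the most direct ways to reduce the model's parameter memory.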
External Libraries
We use the following external libraries, located in preprocess/src/libs.
- Facenet (MIT)