Language: Python
License: MIT License
Created almost 5 years ago; updated almost 2 years ago


Executable code based on Google's paper

Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation


This project reproduces the audio-visual model described in the paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation.

Ephrat A, Mosseri I, Lang O, et al. Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation[J]. arXiv preprint arXiv:1804.03619, 2018.


Requirements

To install requirements:

pip install -r requirements.txt

You can install ffmpeg and sox with Homebrew:

brew install ffmpeg
brew install sox
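Because the preprocessing steps rely on ffmpeg and sox, a quick check that both are on PATH can catch a missing install before a long run. This is a hypothetical helper, not part of the repository:

```python
import shutil

# Command-line tools the preprocessing steps are expected to call.
REQUIRED_TOOLS = ("ffmpeg", "sox")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("ffmpeg and sox found")
```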

Preprocessing

Video Data

  1. Download the dataset from here and place the files in data/csv.
  2. Run the following command to download the YouTube videos and use ffmpeg to extract each 3-second clip as 75 frames (25 fps):
python3 video_download.py
  3. Then use MTCNN to detect the face bounding boxes in each frame, and use the x, y coordinates from the CSV to locate the center of the target face:
pip install mtcnn
python3 face_detected.py
python3 check_vaild_face.py
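The frame extraction and face selection steps above can be sketched as follows. This assumes the CSV x, y values are normalized (0 to 1) face-center coordinates, as in the AVSpeech CSV files released with the paper, and that boxes come from the mtcnn package's `detect_faces`, which returns dicts whose `box` entry is `[x, y, width, height]` in pixels. The function names are hypothetical and are not taken from video_download.py or face_detected.py.

```python
import math

FPS = 25           # 75 frames / 3 s
CLIP_SECONDS = 3


def ffmpeg_frame_cmd(video_path, start, out_pattern):
    """Build an ffmpeg command that saves a 3 s clip as 75 still frames."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start),                    # seek to the clip start (seconds)
        "-i", video_path,
        "-t", str(CLIP_SECONDS),              # keep 3 seconds
        "-vf", f"fps={FPS}",                  # resample to 25 fps
        "-frames:v", str(FPS * CLIP_SECONDS), # cap the output at 75 frames
        out_pattern,                          # e.g. "frames/%02d.jpg"
    ]


def select_face(boxes, center_x, center_y, width, height):
    """Pick the MTCNN box that contains the CSV face center.

    boxes: list of [x, y, w, h] in pixels.
    center_x, center_y: normalized (0-1) coordinates (assumed format).
    width, height: frame size in pixels.
    Returns the containing box whose center is closest, or None.
    """
    px, py = center_x * width, center_y * height
    best, best_dist = None, math.inf
    for x, y, w, h in boxes:
        inside = x <= px <= x + w and y <= py <= y + h
        dist = math.hypot(x + w / 2 - px, y + h / 2 - py)
        if inside and dist < best_dist:
            best, best_dist = [x, y, w, h], dist
    return best
```

The command list can be passed to `subprocess.run(cmd, check=True)`; the boxes would be collected as `[d["box"] for d in detector.detect_faces(frame)]`.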

Audio Data

  1. Download the audio with a YouTube download tool, resample it to 16 kHz with the librosa library, and then normalize it:
python3 audio_downloads.py
python3 audio_norm.py # normalize audio_data
  2. Preprocess the audio data: STFT, power-law compression, mixing, complex mask generation, etc.:
python3 audio_data.py
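The audio pipeline above can be sketched with NumPy alone. The STFT settings (25 ms Hann window, 10 ms hop, 512-point FFT) and the power-law exponent p = 0.3 are assumptions believed to follow the paper; peak normalization is only one possible normalization, and computing the complex ratio mask on the compressed spectrograms is one plausible ordering. Check audio_norm.py and audio_data.py for the values actually used.

```python
import numpy as np

SR = 16000    # 16 kHz sample rate
WIN = 400     # 25 ms window at 16 kHz (assumed)
HOP = 160     # 10 ms hop at 16 kHz (assumed)
N_FFT = 512   # gives 257 frequency bins
POWER = 0.3   # power-law exponent (assumed)


def peak_normalize(x, eps=1e-8):
    """Scale a waveform so its maximum absolute value is (almost) 1."""
    return x / (np.max(np.abs(x)) + eps)


def stft(x):
    """Hann-windowed STFT; returns a (frames, N_FFT // 2 + 1) complex array."""
    window = np.hanning(WIN)
    n_frames = 1 + (len(x) - WIN) // HOP
    frames = np.stack([x[i * HOP:i * HOP + WIN] * window for i in range(n_frames)])
    return np.fft.rfft(frames, n=N_FFT, axis=1)


def power_law(spec, p=POWER):
    """Compress real and imaginary parts separately: sign(v) * |v| ** p.
    Calling it again with p = 1 / POWER undoes the compression."""
    real = np.sign(spec.real) * np.abs(spec.real) ** p
    imag = np.sign(spec.imag) * np.abs(spec.imag) ** p
    return real + 1j * imag


def complex_ratio_mask(clean, mix, eps=1e-8):
    """Complex mask M with M * mix ~= clean, i.e. the complex division clean / mix."""
    denom = mix.real ** 2 + mix.imag ** 2 + eps
    real = (mix.real * clean.real + mix.imag * clean.imag) / denom
    imag = (mix.real * clean.imag - mix.imag * clean.real) / denom
    return real + 1j * imag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random noise stands in for two 3 s speech tracks.
    s1 = peak_normalize(rng.standard_normal(3 * SR))
    s2 = peak_normalize(rng.standard_normal(3 * SR))
    Y = power_law(stft(s1 + s2))   # mixture of the two speakers
    S1 = power_law(stft(s1))
    M1 = complex_ratio_mask(S1, Y)
    print(Y.shape)                 # (298, 257)
```

One mask is produced per speaker, so with people_num = 2 the training target holds two such complex masks per mixture.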

Face Embedding Features

  • We use Google's FaceNet method to map face images into a high-dimensional Euclidean space. Specifically, this project uses David Sandberg's open-source pretrained FaceNet model "20180402-114759", which is then converted to Keras with the Tensorflow_to_Keras.py script in this project (Model/face_embedding/

Schroff F, Kalenichenko D, Philbin J. Facenet: A unified embedding for face recognition and clustering[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 815-823.

Set tf_model_dir in Tensorflow_to_Keras.py to the path of the downloaded TensorFlow model, then run:

python3 Tensorflow_to_Keras.py
python3 face_emb.py
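The per-clip embedding step can be sketched as below. The `prewhiten` function mirrors the per-image standardization used in David Sandberg's facenet code, and the final L2 normalization reflects FaceNet's unit-hypersphere embeddings. The 160×160 crop size, the name `clip_embeddings`, and `embed_fn` (a stand-in for the converted Keras model) are assumptions, not the actual interface of face_emb.py.

```python
import numpy as np


def prewhiten(img):
    """Zero-mean, unit-variance standardization of one face crop.
    The std is floored at 1/sqrt(num_values) so flat images do not divide by ~0."""
    mean = np.mean(img)
    std_adj = np.maximum(np.std(img), 1.0 / np.sqrt(img.size))
    return (img - mean) / std_adj


def l2_normalize(v, axis=-1, eps=1e-10):
    """Scale each embedding to unit Euclidean length."""
    return v / np.sqrt(np.maximum(np.sum(v ** 2, axis=axis, keepdims=True), eps))


def clip_embeddings(face_crops, embed_fn):
    """Embed every face crop of one clip.

    face_crops: array of shape (75, 160, 160, 3) (assumed crop size).
    embed_fn: hypothetical callable wrapping the converted Keras model,
              mapping a (N, 160, 160, 3) batch to an (N, D) array.
    Returns a (75, D) array of unit-length embeddings.
    """
    batch = np.stack([prewhiten(f.astype(np.float32)) for f in face_crops])
    return l2_normalize(embed_fn(batch))
```

With the converted model loaded, `embed_fn` could simply be `model.predict`.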

  • Create AVdataset_train.txt and AVdataset_val.txt:
python3 AV_data_log.py

Training

  • Supports resuming training after an interruption.
  • Supports multi-GPU, multi-process training.
  • Set the following parameters as described in the paper:
people_num = 2  # number of speakers to separate
epochs = 100
initial_epoch = 0
batch_size = 1  # a batch size of 2 or 4 requires a GPU
gamma_loss = 0.1
beta_loss = gamma_loss * 2
  • Then run train.py to start training.
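Resuming after an interruption amounts to reloading the newest checkpoint and passing its epoch as `initial_epoch` to Keras `Model.fit`. The sketch below assumes checkpoints are written by a Keras `ModelCheckpoint` callback with a hypothetical filename pattern `AVmodel-{epoch:03d}.h5`; the names train.py actually uses may differ.

```python
import os
import re

# Hypothetical checkpoint names, e.g. "AVmodel-012.h5" saved after epoch 12.
CKPT_RE = re.compile(r"^AVmodel-(\d+)\.h5$")


def find_resume_point(ckpt_dir):
    """Return (checkpoint_path, initial_epoch) for the newest checkpoint,
    or (None, 0) when there is nothing to resume from."""
    if not os.path.isdir(ckpt_dir):
        return None, 0
    best_epoch, best_path = 0, None
    for name in os.listdir(ckpt_dir):
        m = CKPT_RE.match(name)
        if m and int(m.group(1)) > best_epoch:
            best_epoch = int(m.group(1))
            best_path = os.path.join(ckpt_dir, name)
    # ModelCheckpoint fills {epoch} with a 1-based count, so a file numbered N
    # means N epochs are done and fit() should continue with initial_epoch=N.
    return best_path, best_epoch
```

With the result, one would load the weights (for example with `keras.models.load_model`, passing any custom loss through `custom_objects`) and call `model.fit(..., epochs=epochs, initial_epoch=initial_epoch)`, so the epoch counter continues where it stopped instead of restarting at 0.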

Planned features

  • Reimplementation in PyTorch
  • Release a trained model
  • Clean up the code style
  • ......

Part of the code is based on https://github.com/bill9800/speech_separation
