• Stars
    star
    275
  • Rank 149,796 (Top 3 %)
  • Language
    Python
  • License
    BSD 3-Clause "New...
  • Created over 4 years ago
  • Updated 12 months ago

Welcome

This is a set of Python/PyTorch scripts and tools for various speech-processing projects.

It has been maintained by Xin Wang since 2021.

XW is a PyTorch newbie. Please feel free to give suggestions and feedback.

Notes

  • The repo is relatively large. Please use the --depth 1 option for fast cloning.
git clone --depth 1 https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts.git
  • Latest updates:
    1. Neural vocoders pretrained on VoxCeleb2 dev and other datasets are available in the tutorial notebook chapter_a3.ipynb.

    2. Code, databases, and resources for the paper below were added. Please check project/09-asvspoof-vocoded-trn/ for more details.

      Xin Wang, and Junichi Yamagishi. Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders. Proc. ICASSP 2023, accepted. https://arxiv.org/abs/2210.10570

    3. Code for the paper below was added. Please check project/08-asvspoof-activelearn for more details.

      Xin Wang, and Junichi Yamagishi. Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure. In Proc. SLT, accepted. 2023.

    4. Pointers to tutorials on neural vocoders were moved to ./tutorials/b1_neural_vocoder.

    5. All pre-trained models were moved to Zenodo.

Contents

This repository contains a few projects and tutorials.

Project

Folder                           Project
project/01-nsf                   Neural source-filter waveform models
project/05-nn-vocoders           Other neural waveform models, including WaveNet, WaveGlow, and iLPCNet
project/03-asvspoof-mega         Speech spoofing countermeasures: a comparison of some popular countermeasures
project/06-asvspoof-ood          Speech spoofing countermeasures with confidence estimation
project/07-asvspoof-ssl          Speech spoofing countermeasures with a pre-trained self-supervised learning (SSL) speech feature extractor
project/08-asvspoof-activelearn  Speech spoofing countermeasures in an active learning framework
project/09-asvspoof-vocoded-trn  Speech spoofing countermeasures using vocoded speech as spoofed data

See project/README.md for an overview.

Tutorials

Folder             Status                   Contents
b1_neural_vocoder  readable and executable  tutorials on selected neural vocoders
b2_anti_spoofing   partially finished       tutorials on speech audio anti-spoofing
b3_voice_privacy   readable and executable  tutorials on voice privacy challenge baselines

See tutorials/README.md for an overview.

Python environment

The projects above use one of two environments.

For most of the projects, installing env.yml is sufficient:

# create environment
conda env create -f env.yml

# load environment (whose name is pytorch-1.7)
conda activate pytorch-1.7

For projects using SSL models, use ./env-fs-install.sh to install the dependencies.

# make sure other conda envs are not loaded
bash env-fs-install.sh

# load
conda activate fairseq-pip2
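
Before running a project, it can help to confirm that the intended environment is actually active. The following is a minimal sketch, not part of the repository: it assumes conda has set the CONDA_DEFAULT_ENV variable on activation, and the helper names active_conda_env and check_env are hypothetical. Only the two environment names come from the instructions above.

```python
import os

# Environment names taken from the instructions above. Which project needs
# which environment is described in each project's own README.
KNOWN_ENVS = ("pytorch-1.7", "fairseq-pip2")


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated
    return environ.get("CONDA_DEFAULT_ENV") or None


def check_env(expected, environ=None):
    """Return (ok, message) describing whether `expected` is active."""
    if expected not in KNOWN_ENVS:
        raise ValueError(f"unknown environment name: {expected!r}")
    current = active_conda_env(environ)
    if current == expected:
        return True, f"OK: '{expected}' is active"
    return False, (
        f"expected '{expected}', found {current!r}; "
        f"run: conda activate {expected}"
    )


if __name__ == "__main__":
    ok, message = check_env("pytorch-1.7")
    print(message)
```

Passing a plain dictionary as environ makes the check easy to try without touching the real shell environment.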

How to use

Most of the projects include a simple demonstration script. Take project/01-nsf/cyc-noise-nsf-4 as an example:

# cd into one project
cd project/01-nsf/cyc-noise-nsf-4

# add PYTHONPATH and activate conda environment
source ../../../env.sh 

# run the script
bash 00_demo.sh

The printed messages will show what is happening.

Detailed instructions are in the README of each project.
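
If a demo script fails with an import error, one thing to check is whether PYTHONPATH was set when env.sh was sourced. The sketch below is an illustration under stated assumptions: it assumes that env.sh puts the repository root on PYTHONPATH (the README only says it "adds PYTHONPATH"), and that a project folder sits three levels below the root, as the ../../../env.sh path suggests. The function names are hypothetical.

```python
import os
from pathlib import Path


def repo_root_from_project(project_dir):
    """Return the assumed repository root for a project folder.

    Assumed layout: <root>/project/<group>/<model>, so the root is
    three levels above the model folder.
    """
    return Path(project_dir).resolve().parents[2]


def on_pythonpath(root, environ=None):
    """Return True if `root` is one of the entries in PYTHONPATH."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    # Split on the platform separator and ignore empty entries
    entries = [entry for entry in raw.split(os.pathsep) if entry]
    root = Path(root).resolve()
    return any(Path(entry).resolve() == root for entry in entries)
```

For example, calling on_pythonpath(repo_root_from_project("project/01-nsf/cyc-noise-nsf-4")) from the repository root would report whether the assumed root is visible to Python imports.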

Folder structure

Name                     Function
./core_scripts           scripts (NumPy or PyTorch code) to manage the training process, data I/O, etc.
./core_modules           finalized PyTorch modules
./sandbox                new functions and modules to be tested
./project                project directories; each folder corresponds to one model for one dataset
./project/*/*/main.py    script to load data and run training and inference
./project/*/*/model.py   model definition based on PyTorch APIs
./project/*/*/config.py  configurations for training/val/test set data
./project/*/*/*.sh       scripts to wrap the Python code

See misc/DESIGN.md for more instructions on the design and conventions of this repository.
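
The layout above can be checked mechanically. The following is a minimal sketch, not a tool shipped with the repository: it assumes every directory matching ./project/*/* is a model folder, which may not hold for folders that only store shared data or resources. The names REQUIRED_FILES, find_model_dirs, and missing_files are hypothetical.

```python
from pathlib import Path

# File names taken from the folder-structure table above
REQUIRED_FILES = ("main.py", "model.py", "config.py")


def find_model_dirs(repo_root):
    """Return the directories matching ./project/*/* under repo_root, sorted."""
    project = Path(repo_root) / "project"
    return sorted(path for path in project.glob("*/*") if path.is_dir())


def missing_files(model_dir):
    """List the expected files that are absent from one model folder."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


def report(repo_root):
    """Map each model folder (relative to repo_root) to its missing files."""
    root = Path(repo_root)
    return {
        path.relative_to(root).as_posix(): missing_files(path)
        for path in find_model_dirs(root)
    }
```

Calling report(".") from the repository root returns a dictionary in which an empty list means the folder has all three expected files.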

Resources & links


By Xin Wang
