
Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)

🥑 Avocodo: Generative Adversarial Network for Artifact-Free Vocoder

Accepted for publication at the 37th AAAI Conference on Artificial Intelligence (AAAI 2023).

arXiv: 2211.04610 | Sample page: Avocodo | NC SpeechAI publications

In our paper, we propose Avocodo, and this repository provides our open-source implementation.

Abstract: Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we discovered that the multi-scale analysis that focuses on the low-frequency bands causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the synthesized speech waveform quality. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based vocoders and propose a GAN-based vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate speech waveforms from various perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also utilize a pseudo quadrature mirror filter bank to obtain downsampled multi-band speech waveforms while avoiding aliasing. According to the experimental results, Avocodo outperforms baseline GAN-based vocoders, both objectively and subjectively, while reproducing speech with fewer artifacts.
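The abstract relies on a pseudo quadrature mirror filter (PQMF) bank to obtain downsampled multi-band waveforms without aliasing. The NumPy sketch below illustrates the general technique and makes no claim to reproduce this repository's module. A Kaiser-windowed sinc prototype low-pass filter is cosine-modulated into M analysis and synthesis filters. Analysis filters each band and keeps every M-th sample. Synthesis zero-inserts, filters, and sums the bands. All function names are hypothetical. The parameter values (4 sub-bands, 62 taps, cutoff ratio 0.142, Kaiser beta 9.0) are assumptions, similar to defaults seen in Parallel-WaveGAN-style PQMF code, and are not taken from Avocodo's configs.

```python
import numpy as np


def design_prototype_filter(taps=62, cutoff_ratio=0.142, beta=9.0):
    """Kaiser-windowed sinc low-pass prototype with taps + 1 coefficients (assumed defaults)."""
    n = np.arange(taps + 1) - 0.5 * taps  # time index centred on the middle tap
    omega_c = np.pi * cutoff_ratio
    with np.errstate(invalid="ignore", divide="ignore"):
        h = np.sin(omega_c * n) / (np.pi * n)
    h[taps // 2] = cutoff_ratio  # limit of sin(omega_c * n) / (pi * n) as n -> 0
    return h * np.kaiser(taps + 1, beta)


def pqmf_filters(subbands=4, taps=62, cutoff_ratio=0.142, beta=9.0):
    """Cosine-modulate the prototype into analysis and synthesis filter banks."""
    proto = design_prototype_filter(taps, cutoff_ratio, beta)
    n = np.arange(taps + 1) - 0.5 * taps
    analysis = np.empty((subbands, taps + 1))
    synthesis = np.empty((subbands, taps + 1))
    for k in range(subbands):
        phase = (2 * k + 1) * (np.pi / (2 * subbands)) * n
        offset = (-1) ** k * np.pi / 4  # phase term chosen so adjacent-band aliasing cancels
        analysis[k] = 2 * proto * np.cos(phase + offset)
        synthesis[k] = 2 * proto * np.cos(phase - offset)
    return analysis, synthesis


def pqmf_analysis(x, analysis):
    """Filter x with each analysis filter, then keep every M-th sample."""
    m = analysis.shape[0]
    # mode="same" centres each odd-length filter, so no separate delay compensation is needed
    return np.stack([np.convolve(x, h, mode="same")[::m] for h in analysis])


def pqmf_synthesis(bands, synthesis):
    """Zero-insert upsample each band by M, filter it, and sum the bands."""
    m, frames = bands.shape
    out = np.zeros(m * frames)
    for y, f in zip(bands, synthesis):
        up = np.zeros(m * frames)
        up[::m] = m * y  # gain of M compensates for the zero insertion
        out += np.convolve(up, f, mode="same")
    return out


# Usage: split white noise into 4 bands, then reconstruct it.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
analysis, synthesis = pqmf_filters()
bands = pqmf_analysis(x, analysis)  # shape (4, 1024)
x_hat = pqmf_synthesis(bands, synthesis)
```

PQMF reconstruction is near-perfect rather than exact. Away from the signal edges, where zero padding affects the convolutions, `x_hat` should closely track `x` without matching it bit for bit.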

Pre-requisites

  1. Install pyenv.
  2. Clone this repository.
  3. Set up a virtual environment and install the Python requirements (see pyproject.toml).
pyenv install 3.8.11
pyenv virtualenv 3.8.11 avocodo
pyenv local avocodo

pip install wheel
pip install poetry

poetry install
  4. Download and extract the LJ Speech dataset.
  • Move all wav files to LJSpeech-1.1/wavs.
  • Split the dataset into a training set and a validation set.
cat LJSpeech-1.1/metadata.csv | tail -n 13000 > training.txt
cat LJSpeech-1.1/metadata.csv | head -n 100 > validation.txt
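Where `head` and `tail` are unavailable (for example, on Windows), the same split can be reproduced in Python. This sketch mirrors the two commands above. The last 13,000 lines of metadata.csv become the training list and the first 100 lines become the validation list. LJ Speech 1.1 has 13,100 entries, so the two lists should not overlap. The function checks this by comparing the clip IDs in the first `|`-separated field. The helper name `split_ljspeech` and its parameters are hypothetical and not part of the repository.

```python
from pathlib import Path


def split_ljspeech(metadata_path, out_dir=".", n_train=13000, n_valid=100):
    """Hypothetical helper mirroring `tail -n 13000` / `head -n 100` on metadata.csv."""
    lines = Path(metadata_path).read_text(encoding="utf-8").splitlines()
    training = lines[max(len(lines) - n_train, 0):]  # like `tail -n 13000`
    validation = lines[:n_valid]  # like `head -n 100`

    # Each metadata line starts with a clip ID, e.g. "LJ001-0001|text|normalized text".
    train_ids = {line.split("|", 1)[0] for line in training}
    valid_ids = {line.split("|", 1)[0] for line in validation}
    overlap = train_ids & valid_ids
    if overlap:
        raise ValueError(f"{len(overlap)} clips appear in both splits")

    out = Path(out_dir)
    (out / "training.txt").write_text("\n".join(training) + "\n", encoding="utf-8")
    (out / "validation.txt").write_text("\n".join(validation) + "\n", encoding="utf-8")
    return training, validation


# Usage (path as in the commands above):
# split_ljspeech("LJSpeech-1.1/metadata.csv")
```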

Training

python avocodo/train.py --config avocodo/configs/avocodo_v1.json

Inference

python avocodo/inference.py --version ${version} --checkpoint_file_id ${checkpoint_file_id}

Reference

We referred to the repositories below when building this project:

HiFi-GAN

Parallel-WaveGAN
