• Stars: 201
• Rank: 194,491 (Top 4 %)
• Language: Python
• Created about 1 year ago
• Updated 5 months ago

Repository Details

Audio Codec Speech processing Universal PERformance Benchmark

Codec-SUPERB: Sound Codec Speech Processing Universal Performance Benchmark

Overview

Codec-SUPERB is a comprehensive benchmark designed to evaluate audio codec models across a variety of speech tasks. Our goal is to facilitate community collaboration and accelerate advancements in the field of speech processing by preserving and enhancing speech information quality.

Introduction

Codec-SUPERB provides a rigorous and transparent framework for assessing sound codec models across a range of speech processing tasks. Our goal is to foster innovation and set new standards in audio quality and processing efficiency.

Key Features

Out-of-the-Box Codec Interface

Codec-SUPERB offers an intuitive, out-of-the-box codec interface that allows for easy integration and testing of various codec models, facilitating quick iterations and experiments.

Multi-Perspective Leaderboard

Codec-SUPERB's unique blend of multi-perspective evaluation and an online leaderboard drives innovation in sound codec research by providing a comprehensive assessment and fostering competitive transparency among developers.

Standardized Environment

We ensure a standardized testing environment to guarantee fair and consistent comparison across all models. This uniformity brings reliability to benchmark results, making them universally interpretable.

Unified Datasets

We provide a collection of unified datasets, curated to test a wide range of speech processing scenarios. This ensures that models are evaluated under diverse conditions, reflecting real-world applications.

Installation

git clone https://github.com/voidful/Codec-SUPERB.git
cd Codec-SUPERB
pip install -r requirements.txt

Usage

Out-of-the-Box Codec Interface

from SoundCodec import codec
import torchaudio

# list all available codecs
print(codec.list_codec())
# load a codec by name, using EnCodec as an example
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')

# load audio (replace 'sample audio' with the path to an audio file)
waveform, sample_rate = torchaudio.load('sample audio')
# keep the last channel as a 1-D NumPy array; note that no resampling happens here
resampled_waveform = waveform.numpy()[-1]
data_item = {'audio': {'array': resampled_waveform,
                       'sampling_rate': sample_rate}}

# extract the discrete units
sound_unit = encodec_24k_6bps.extract_unit(data_item).unit

# synthesize audio from the units
decoded_waveform = encodec_24k_6bps.synth(sound_unit, local_save=False)['audio']['array']
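
After synthesis, one simple way to sanity-check a round trip is to compare the decoded waveform with the input. The sketch below is not part of Codec-SUPERB: `make_data_item` and `snr_db` are hypothetical helper names, and a synthetic sine wave with a scaled copy stands in for real audio and real codec output, so it runs without SoundCodec installed. It assumes both signals share a sampling rate and are time-aligned, and it truncates to the shorter signal because a decoded waveform may differ slightly in length.

```python
import math


def make_data_item(samples, sampling_rate):
    """Wrap samples in the dict layout used by the usage example above."""
    return {'audio': {'array': samples, 'sampling_rate': sampling_rate}}


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB (hypothetical helper, not a Codec-SUPERB metric).

    Both inputs are sequences of floats; they are truncated to the shorter length.
    """
    n = min(len(reference), len(decoded))
    if n == 0:
        raise ValueError("both signals must be non-empty")
    signal_power = sum(x * x for x in reference[:n])
    noise_power = sum((x - y) ** 2 for x, y in zip(reference[:n], decoded[:n]))
    if noise_power == 0:
        return math.inf   # identical signals
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent error
    return 10.0 * math.log10(signal_power / noise_power)


# One second of a 440 Hz sine at 24 kHz stands in for loaded audio.
sample_rate = 24000
reference = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
data_item = make_data_item(reference, sample_rate)

# Stand-in for codec output: a copy attenuated by 10 %, i.e. an error of 0.1 * x.
decoded = [0.9 * x for x in reference]

print(round(snr_db(reference, decoded), 2))  # about 20 dB
```

With a real codec, `decoded` would be the `decoded_waveform` array from the usage example, converted to a one-dimensional sequence at the same sampling rate as the reference. SNR is only a coarse waveform-level check and says nothing about how well a codec preserves information for downstream speech tasks.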

Citation

If you use this code or these results in your paper, please cite our work as:

@misc{wu2024codecsuperb,
      title={Codec-SUPERB: An In-Depth Analysis of Sound Codec Models}, 
      author={Haibin Wu and Ho-Lam Chung and Yi-Cheng Lin and Yuan-Kuei Wu and Xuanjun Chen and Yu-Chi Pai and Hsiu-Hsuan Wang and Kai-Wei Chang and Alexander H. Liu and Hung-yi Lee},
      year={2024},
      eprint={2402.13071},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Contribution

Contributions are highly encouraged, whether it's through adding new codec models, expanding the dataset collection, or enhancing the benchmarking framework. Please see CONTRIBUTING.md for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Reference Sound Codec Repositories:
