  • Stars: 690
  • Rank 65,522 (Top 2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created about 3 years ago
  • Updated about 2 months ago

Repository Details

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

WeSpeaker

Roadmap | Paper | Runtime (x86_gpu) | Python binding | Pretrained Models | Huggingface Demo

WeSpeaker focuses mainly on speaker embedding learning, applied to the speaker verification task. It supports both online feature extraction and loading pre-extracted features in Kaldi format.
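
As an illustration of how speaker embeddings are typically used for verification, the sketch below scores two embeddings with cosine similarity and compares the score to a threshold. This is a generic, minimal example and not WeSpeaker's own scoring code: the function names and the threshold value of 0.5 are assumptions made for illustration, and a real threshold would be tuned on held-out trials.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two embeddings given as sequences of floats."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the trial if the score reaches the (hypothetical) threshold."""
    return cosine_score(emb_a, emb_b) >= threshold
```

A higher score means the two utterances are more likely to come from the same speaker; moving the threshold trades false acceptances against false rejections.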

Installation

  • Clone this repo:
      git clone https://github.com/wenet-e2e/wespeaker.git
  • Create a conda environment (PyTorch version >= 1.10.0 is required):
      conda create -n wespeaker python=3.9
      conda activate wespeaker
      conda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge
      pip install -r requirements.txt
  • If you only want to use the pretrained models, try the Python binding:
      pip3 install wespeakerruntime
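
After installation, a quick check that the environment meets the stated PyTorch >= 1.10.0 requirement can save debugging time. The snippet below is a minimal sketch: the helper names `parse_version` and `torch_is_new_enough` are hypothetical and not part of WeSpeaker, and the only PyTorch attribute it relies on is `torch.__version__`.

```python
import re

MIN_TORCH = (1, 10, 0)  # minimum version stated in the installation notes


def parse_version(text):
    """Turn a string such as '1.12.1+cu113' into (1, 12, 1), ignoring suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", str(text))
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_is_new_enough(version_text, minimum=MIN_TORCH):
    """Return True if the given version string is at least the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        status = "OK" if torch_is_new_enough(torch.__version__) else "too old"
        print(f"PyTorch {torch.__version__}: {status}")
```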

🔥 News

  • 2023.04.27: Support the CAM++ model, which offers better performance and a better single-thread inference real-time factor (RTF) than the ResNet34 model; see #153.

  • 2023.02.27: Update onnxruntime (C++), see onnxruntime, #135.

  • 2023.02.15: Update the code for multi-node training. To set up multi-node training, see #131.
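
The real-time factor mentioned in the CAM++ news item is conventionally the processing time divided by the duration of the audio processed, so lower is better and values below 1 mean faster than real time. The sketch below shows that arithmetic; `measure_rtf` and its `process` callable are hypothetical names for illustration and do not correspond to any WeSpeaker benchmark script.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process, audio_seconds):
    """Time a zero-argument callable (hypothetical inference call) and return its RTF."""
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

For example, spending 2 seconds to process a 10-second utterance gives an RTF of 0.2.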

Recipes

  • VoxCeleb: Speaker Verification recipe on the VoxCeleb dataset
    • 🔥 UPDATE 2022.10.31: We support deep r-vector up to the 293-layer version, achieving 0.447%/0.043 EER/minDCF on the vox1-O-clean test set
    • 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe and obtain state-of-the-art performance among open-source systems
      • EER/minDCF on the vox1-O-clean test set are 0.723%/0.069 (ResNet34) and 0.728%/0.099 (ECAPA_TDNN_GLOB_c1024), after large-margin (LM) fine-tuning and AS-Norm
  • CNCeleb: Speaker Verification recipe on the CNCeleb dataset
    • 🔥 UPDATE 2022.10.31: 221-layer ResNet achieves 5.655%/0.330 EER/minDCF
    • 🔥 UPDATE 2022.07.12: We migrate the winning system of CNSRC 2022 (report, slides)
      • EER/minDCF reduced from 8.426%/0.487 to 6.492%/0.354 after large-margin fine-tuning and AS-Norm
  • VoxConverse: Diarization recipe on the VoxConverse dataset
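
The recipes above report the equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below is a simple, unoptimized way to estimate it from trial scores and is not the scoring code used by the recipes; the function name and the choice to average the two rates at the closest threshold are assumptions made for illustration.

```python
def compute_eer(target_scores, nontarget_scores):
    """Estimate the EER (as a fraction) from same-speaker and different-speaker scores.

    A trial is accepted when its score is >= the threshold. Every observed
    score is tried as a threshold, and the one where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest is used.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for threshold in thresholds:
        # Same-speaker trials rejected because they fall below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # Different-speaker trials accepted because they reach the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Multiply the result by 100 to compare with the percentages quoted in the recipes. This loop is quadratic in the number of trials, so a sort-based implementation would be preferable for full evaluation sets.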

Support List:

Discussion

Chinese users can scan the QR code on the left to follow the official account of the WeNet Community. We also created a WeChat group for better discussion and quicker responses; scan the QR code on the right to join the chat group.

Citations

If you find wespeaker useful, please cite it as

@article{wang2022wespeaker,
  title={Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  journal={arXiv preprint arXiv:2210.17016},
  year={2022}
}

Looking for contributors

If you are interested in contributing, feel free to contact @wsstriving or @robin1001.

More Repositories

  1. wenet (Python, 4,073 stars): Production First and Production Ready End-to-End Speech Recognition Toolkit
  2. speech-synthesis-paper (989 stars): List of speech synthesis papers.
  3. WenetSpeech (Shell, 488 stars): A 10000+ hours dataset for Chinese speech recognition
  4. wekws (Python, 444 stars): Production First and Production Ready End-to-End Keyword Spotting Toolkit
  5. WeTextProcessing (Python, 443 stars): Text Normalization & Inverse Text Normalization
  6. wetts (Python, 367 stars): Production First and Production Ready End-to-End Text-to-Speech Toolkit
  7. speech-recognition-papers (325 stars): Towards hot directions in industrial end-to-end speech recognition
  8. opencpop (207 stars): Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis
  9. wenet-kws (Python, 142 stars): Production First and Production Ready End-to-End Keyword Spotting Toolkit
  10. west (Python, 109 stars): We Speech Transcript based on LLM, in 300 lines of code.
  11. wesep (Python, 80 stars): Target Speaker Extraction Toolkit
  12. wesignal (63 stars): Production first, NN-based on-device signal processing toolkit.
  13. WeTextProcessing.deprecated (C++, 61 stars)
  14. wesubtitle (Python, 54 stars): Extract hard-coded video subtitles with OCR
  15. llm-papers (51 stars): List of Large Language Model Papers
  16. wecut (25 stars): Video cut powered by AI
  17. WeSpeech-AI (18 stars): Open Source Speech/Text Data on AI
  18. nn-singal-processing-papers (17 stars): List of NN-based signal processing papers
  19. wenet_in_action_homework (Python, 16 stars): Homework for the WeNet in Action course
  20. wenet-e2e.github.io (CSS, 1 star): WeNet Community
  21. wenet-contributors (1 star): Contributors of WeNet, including individuals and companies.