cvqluu/simple_diarizer

Stars: 125
Rank: 286,335 (Top 6 %)
Language: Python
License: GNU General Publi...
Created over 3 years ago
Updated 7 months ago

Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

simple_diarizer

Simplified diarization pipeline using some pretrained models.

Made to be as simple as possible to go from an input audio file to diarized segments.

import soundfile as sf
import matplotlib.pyplot as plt

from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot

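# Placeholder inputs (assumed values for illustration; substitute your own)
WAV_FILE = "audio.wav"  # path to the audio file to diarize
NUM_SPEAKERS = 2        # number of speakers expected in the recording

diar = Diarizer(
    embed_model='xvec',  # 'xvec' and 'ecapa' supported
    cluster_method='sc'  # 'ahc' and 'sc' supported
)

# Run the pipeline on the audio file
segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)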

signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()
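
As a rough illustration of working with the result, the sketch below totals speaking time per speaker. It assumes each segment is a dict with `start` and `end` times in seconds and a speaker `label`. That layout, the helper name `talk_time_per_speaker`, and the sample data are assumptions made for illustration, so check the actual return value of `diar.diarize` before relying on it.

```python
from collections import defaultdict


def talk_time_per_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label.

    Assumes each segment is a dict with 'start', 'end' and 'label' keys
    (a hypothetical layout; verify against the real diarize() output).
    """
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)


# Made-up segments standing in for the output of diar.diarize(...)
example_segments = [
    {"start": 0.0, "end": 2.5, "label": 0},
    {"start": 2.5, "end": 4.0, "label": 1},
    {"start": 4.0, "end": 7.0, "label": 0},
]

for speaker, seconds in sorted(talk_time_per_speaker(example_segments).items()):
    print(f"speaker {speaker}: {seconds:.1f} s")
```

The same loop shape could be reused to export segments to a subtitle or annotation format once the real field names are confirmed.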

Install

simple_diarizer is available on PyPI:

pip install simple-diarizer

Source Video

"Some Quick Advice from Barack Obama!"

Pre-trained Models

The following pretrained models are used:

Voice Activity Detection (VAD)
- Silero VAD
Deep speaker embedding extraction
- SpeechBrain
  - X-Vector
  - ECAPA-TDNN
(Optional/Experimental) Speech-to-text
- ESPnet Model Zoo
  - English ASR model
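
To give a feel for the step that follows embedding extraction, here is a toy sketch of agglomerative hierarchical clustering (the idea behind the `'ahc'` option) using average-linkage cosine similarity. It is not the library's implementation: the function names `cosine` and `cluster_embeddings` are hypothetical, and the 2-D vectors are made up to stand in for the much higher-dimensional X-Vector or ECAPA-TDNN embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, num_speakers):
    """Toy average-linkage agglomerative clustering on cosine similarity.

    Starts with one cluster per embedding and repeatedly merges the most
    similar pair until num_speakers clusters remain. Returns one integer
    label per embedding. Illustrative only, not the library's code.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > num_speakers:
        best = None  # (score, i, j) of the most similar cluster pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [
                    cosine(embeddings[a], embeddings[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Made-up 2-D "embeddings": three point mostly along x, two mostly along y
toy = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (1.0, 0.0)]
print(cluster_embeddings(toy, num_speakers=2))
```

In a real pipeline each embedding would correspond to a short window of speech found by the VAD, and the resulting labels would be mapped back to time ranges to form segments.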

Demo

The demo, available at the link above, attempts to diarize the audio from any input YouTube URL.

Other References

Spectral clustering methods lifted from https://github.com/wq2012/SpectralCluster

Planned Features
