diffsptk
diffsptk is a differentiable version of SPTK based on the PyTorch framework.
Requirements
- Python 3.8+
- PyTorch 1.10.0+
Documentation
See this page for a reference manual.
Installation
The latest stable release can be installed through PyPI by running
pip install diffsptk
The development release can be installed from the master branch:
pip install git+https://github.com/sp-nitech/diffsptk.git@master
Examples
Mel-cepstral analysis and synthesis
import diffsptk
# Set analysis condition.
fl = 400
fp = 80
n_fft = 512
M = 24
# Read waveform.
x, sr = diffsptk.read("assets/data.wav")
# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)
# Estimate mel-cepstrum of x.
alpha = diffsptk.get_alpha(sr)
mcep = diffsptk.MelCepstralAnalysis(cep_order=M, fft_length=n_fft, alpha=alpha, n_iter=10)
mc = mcep(X)
# Reconstruct x.
mlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=30)
x_hat = mlsa(mlsa(x, -mc), mc)
# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)
# Compute error.
error = (x_hat - x).abs().sum()
print(error)
# Extract pitch of x.
pitch = diffsptk.Pitch(frame_period=fp, sample_rate=sr, f_min=80, f_max=180)
p = pitch(x)
# Generate excitation signal.
excite = diffsptk.ExcitationGeneration(frame_period=fp)
e = excite(p)
n = diffsptk.nrand(x.size(0) - 1)
# Synthesize waveform.
x_voiced = mlsa(e, mc)
x_unvoiced = mlsa(n, mc)
# Output analysis-synthesis result.
diffsptk.write("voiced.wav", x_voiced, sr)
diffsptk.write("unvoiced.wav", x_unvoiced, sr)
Mel-spectrogram and MFCC extraction
import diffsptk
# Set analysis condition.
fl = 400
fp = 80
n_fft = 512
n_channel = 80
M = 12
# Read waveform.
x, sr = diffsptk.read("assets/data.wav")
# Compute STFT amplitude of x.
stft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)
X = stft(x)
# Extract mel-spectrogram.
fbank = diffsptk.MelFilterBankAnalysis(
n_channel=n_channel,
fft_length=n_fft,
sample_rate=sr,
)
Y = fbank(X)
print(Y.shape)
# Extract MFCC.
mfcc = diffsptk.MFCC(
mfcc_order=M,
n_channel=n_channel,
fft_length=n_fft,
sample_rate=sr,
)
Y = mfcc(X)
print(Y.shape)
Subband decomposition
import diffsptk
K = 4 # Number of subbands.
M = 40 # Order of filter.
# Read waveform.
x, sr = diffsptk.read("assets/data.wav")
# Decompose x.
pqmf = diffsptk.PQMF(K, M)
decimate = diffsptk.Decimation(K)
y = decimate(pqmf(x), dim=-1)
# Reconstruct x.
interpolate = diffsptk.Interpolation(K)
ipqmf = diffsptk.IPQMF(K, M)
x_hat = ipqmf(interpolate(K * y, dim=-1)).reshape(-1)
# Write reconstructed waveform.
diffsptk.write("reconst.wav", x_hat, sr)
# Compute error.
error = (x_hat - x).abs().sum()
print(error)
Vector quantization
import diffsptk
K = 2 # Codebook size.
M = 4 # Order of vector.
# Prepare input.
x = diffsptk.nrand(M)
# Quantize x.
vq = diffsptk.VectorQuantization(M, K)
x_hat, indices, commitment_loss = vq(x)
# Compute error.
error = (x_hat - x).abs().sum()
print(error)
License
This software is released under the Apache License 2.0.