keonlee9420/Soft-DTW-Loss

Stars
113
Rank 299,045 (Top 7 %)
Language
Python
License
MIT License
Created almost 3 years ago
Updated over 2 years ago

keonlee9420/Soft-DTW-Loss

keonlee9420

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

PyTorch implementation of Soft-DTW: a Differentiable Loss Function for Time-Series in CUDA

Soft DTW Loss Function for PyTorch in CUDA

This is a Pytorch Implementation of Soft-DTW: a Differentiable Loss Function for Time-Series which is batch supported computation, CUDA-friendly, and feasible to use as a final loss. I can confirm that you can train a (sequential) model with this as a final loss! The following image shows training logs of a TTS model using the Soft-DTW Loss Function.

There are some previous implementations:

But they are either not supported by CUDA-friendly batch computation or not considering the jacobean w.r.t input matrix, which is necessary to be used as a final loss in recent deep learning frameworks. In the current implementation, all conditions are satisfied.

Usage

Same as Maghoumi's pytorch-softdtw-cuda:

from sdtw_cuda_loss import SoftDTW

# Create the sequences
batch_size, len_x, len_y, dims = 8, 15, 12, 5
x = torch.rand((batch_size, len_x, dims), requires_grad=True)
y = torch.rand((batch_size, len_y, dims))

# Create the "criterion" object
sdtw = SoftDTW(use_cuda=True, gamma=0.1)

# Compute the loss value
loss = sdtw(x, y)  # Just like any torch.nn.xyzLoss()

# Aggregate and call backward()
loss.mean().backward()

But the backward will compute the gradient w.r.t input target sequence x (which is not considered in the previous work).

Note

In the current implementation, only use_cuda=True is supported. But you can easily implement the CPU version as in Maghoumi's pytorch-softdtw-cuda.

Citation

@misc{lee2021soft_dtw_loss,
  author = {Lee, Keon},
  title = {Soft-DTW-Loss},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/Soft-DTW-Loss}}
}

PortaSpeech

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Comprehensive-Transformer-TTS

A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS

DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Expressive-FastSpeech2

PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

DiffSinger

PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech)

Parallel-Tacotron2

PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

StyleSpeech

PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

DailyTalk

Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023 (Oral)

Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

STYLER

Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021

Comprehensive-E2E-TTS

A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate E2E-TTS

FastPitchFormant

PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

VAENAR-TTS

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Daft-Exprt

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Comprehensive-Tacotron2

PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.

Robust_Fine_Grained_Prosody_Control

PyTorch Implementation of Robust and fine-grained prosody control of end-to-end speech synthesis

Stepwise_Monotonic_Multihead_Attention

PyTorch Implementation of Stepwise Monotonic Multihead Attention similar to Enhancing Monotonicity for Robust Autoregressive Transformer TTS

Deep-Learning-TTS-Template

This is a template for the Non-autoregressive Deep Learning-Based TTS model (in PyTorch).

tacotron2_MMI

Another PyTorch implementation of Tacotron2 MMI (with waveglow) which supports n_frames_per_step>1 mode(reduction windows) and diagonal guided attention for robust alignments.

Jupyter Notebook

Fully_Hierarchical_Fine_Grained_TTS

Pytorch Implementation of Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis (Unofficial)

cs231n

cs231n 2020 Spring assignments implementation

Jupyter Notebook

pintos

KAIST CS330 OS pintos Project