• Stars: 701
  • Rank: 61,906 (Top 2%)
  • Language: Python
  • License: Apache License 2.0
  • Created: over 3 years ago
  • Updated: 11 months ago

Repository Details

PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. Conformer combines convolutional neural networks and Transformers to model both the local and global dependencies of an audio sequence in a parameter-efficient way. Conformer significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracy.
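
For reference, the sketch below shows the structure of a single Conformer block in plain PyTorch: a half-step feed-forward module, multi-head self-attention for global context, a depthwise-convolution module for local context, and a second half-step feed-forward module. The class and parameter names are illustrative only and are not this repository's API; the actual implementation additionally uses relative positional multi-head attention, dropout, and input subsampling.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvModule(nn.Module):
    """Convolution module: pointwise conv + GLU, depthwise conv, pointwise conv."""
    def __init__(self, dim: int, kernel_size: int = 31):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.pointwise1 = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.depthwise = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.batch_norm = nn.BatchNorm1d(dim)
        self.pointwise2 = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):                          # x: (batch, time, dim)
        y = self.norm(x).transpose(1, 2)           # (batch, dim, time) for Conv1d
        y = F.glu(self.pointwise1(y), dim=1)       # gated linear unit over channels
        y = F.silu(self.batch_norm(self.depthwise(y)))
        y = self.pointwise2(y).transpose(1, 2)     # back to (batch, time, dim)
        return x + y                               # residual connection


class ConformerBlockSketch(nn.Module):
    """Macaron-style block: 1/2 FFN -> self-attention -> conv module -> 1/2 FFN."""
    def __init__(self, dim: int = 144, num_heads: int = 4):
        super().__init__()
        self.ffn1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.conv = ConvModule(dim)
        self.ffn2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.out_norm = nn.LayerNorm(dim)

    def forward(self, x):                          # x: (batch, time, dim)
        x = x + 0.5 * self.ffn1(x)                 # half-step feed-forward
        y = self.attn_norm(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]   # global interactions
        x = self.conv(x)                           # local interactions
        x = x + 0.5 * self.ffn2(x)                 # second half-step feed-forward
        return self.out_norm(x)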

This repository contains only the model code, but you can train with Conformer using the openspeech project.

Installation

Python 3.7 or higher is recommended for this project. We also recommend creating a new virtual environment (using virtualenv or conda).

Prerequisites

  • NumPy: pip install numpy (refer here if you have problems installing NumPy).
  • PyTorch: refer to the PyTorch website to install the version appropriate for your environment.
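
After installing the prerequisites, a quick check such as the following (an optional snippet, not part of the project) confirms that both packages import correctly and whether CUDA is visible:

import numpy
import torch

print("numpy:", numpy.__version__)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())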

Install from source

Currently, we only support installation from source using setuptools. Check out the source code and run the following command:

pip install -e .

Usage

import torch
import torch.nn as nn
from conformer import Conformer

batch_size, sequence_length, dim = 3, 12345, 80

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

criterion = nn.CTCLoss().to(device)

# Random acoustic features: (batch, time, feature_dim) plus the true length of each utterance
inputs = torch.rand(batch_size, sequence_length, dim).to(device)
input_lengths = torch.LongTensor([12345, 12300, 12000])

# Target label sequences padded with 0 (nn.CTCLoss treats index 0 as the blank by default),
# together with their true lengths
targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
                            [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                            [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
target_lengths = torch.LongTensor([9, 8, 7])

model = Conformer(num_classes=10, 
                  input_dim=dim, 
                  encoder_dim=32, 
                  num_encoder_layers=3).to(device)

# Forward propagate
outputs, output_lengths = model(inputs, input_lengths)

# Calculate CTC Loss
loss = criterion(outputs.transpose(0, 1), targets, output_lengths, target_lengths)
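
The model returns per-frame log probabilities together with the corresponding output lengths. As an illustration only (this is not a method provided by the repository), greedy CTC decoding takes the most likely class at each frame, collapses repeated labels, and removes the blank token (index 0 by default in nn.CTCLoss):

# Greedy CTC decoding (illustrative sketch, not part of the repository API)
predictions = outputs.argmax(dim=-1)                  # (batch, time): best class per frame
for pred, length in zip(predictions, output_lengths):
    tokens = torch.unique_consecutive(pred[:length])  # collapse repeated labels
    tokens = tokens[tokens != 0]                      # drop CTC blanks
    print(tokens.tolist())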

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub or
contact [email protected].

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues like bug fixes and documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.

Code Style

I follow PEP 8 for code style. In particular, docstring style matters because the documentation is generated from the docstrings.
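
As an illustration only (the Args/Returns layout below is an assumption about the convention, not something this README specifies), a documented function might look like:

def build_model(num_classes: int, input_dim: int = 80):
    """
    Create a Conformer model for CTC training.

    Note: hypothetical helper shown only to illustrate the docstring layout;
    it is not part of this repository.

    Args:
        num_classes (int): number of output classes, including the CTC blank
        input_dim (int): dimension of the input feature vectors

    Returns:
        Conformer: a configured model instance
    """
    return Conformer(num_classes=num_classes, input_dim=input_dim, encoder_dim=32, num_encoder_layers=3)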

Reference

Anmol Gulati et al., "Conformer: Convolution-augmented Transformer for Speech Recognition", INTERSPEECH 2020.

Author

Soohwan Kim (sooftware)

More Repositories

  • kospeech (Python, 574 stars): Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
  • attentions (Python, 396 stars): PyTorch implementation of some attentions for Deep Learning Researchers.
  • k-startups (206 stars): List of tech startups in South Korea (Republic of Korea).
  • Korean-PLM (157 stars): List of Korean pre-trained language models.
  • ksponspeech (Python, 76 stars): Pre-processing for the KsponSpeech corpus (Korean speech dataset) provided by AI Hub.
  • pytorch-lr-scheduler (Python, 67 stars): PyTorch implementation of some learning rate schedulers for deep learning researchers.
  • Speech-Recognition-Tutorial (60 stars): Korean speech recognition tutorial.
  • nlp-tasks (Python, 59 stars): Natural Language Processing tasks and examples.
  • speech-transformer (Python, 56 stars): Transformer implementation specialized for speech recognition tasks, using PyTorch.
  • RNN-Transducer (Python, 51 stars): PyTorch implementation of RNN-Transducer (RNN-T).
  • lightning-asr (Python, 42 stars): Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.
  • End-to-End-Speech-Recognition-Models (Python, 41 stars): PyTorch implementation of automatic speech recognition models.
  • transformer (Python, 37 stars): A PyTorch implementation of "Attention Is All You Need".
  • luna-transformer (Python, 35 stars): A PyTorch implementation of Luna: Linear Unified Nested Attention.
  • jasper (Python, 29 stars): PyTorch implementation of "Jasper: An End-to-End Convolutional Neural Acoustic Model" (INTERSPEECH 2019).
  • Naver-AI-Hackathon-Speech (Python, 25 stars): 2019 Clova AI Hackathon: Speech - Rank 12 / Team Kai.Lib.
  • deepspeech2 (Python, 19 stars): PyTorch implementation of "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin" (ICML 2016).
  • seq2seq (Python, 19 stars): PyTorch implementation of the RNN-based sequence-to-sequence architecture.
  • tacotron2 (Python, 17 stars): PyTorch implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions" (ICASSP 2018).
  • speech-paper-review (15 stars): Reviews of papers I have read.
  • speech-recognition-papers (15 stars): Awesome Automatic Speech Recognition (ASR) paper collection.
  • Fairseq-Listen-Attend-Spell (Python, 14 stars): A Fairseq implementation of Listen, Attend and Spell (LAS), an end-to-end ASR framework.
  • char-rnnlm (Python, 11 stars): Character-level Recurrent Neural Network Language Model (RNNLM) implemented in PyTorch.
  • accelerate-asr (Python, 10 stars): Modular and extensible speech recognition library leveraging accelerate and hydra.
  • sooftware (10 stars)
  • sooftware.io (TypeScript, 9 stars): My personal blog powered by React (Gatsby).
  • Speech-Note (C, 7 stars): 🎧 Speech study records repository.
  • Audio-Signal-Processing (Python, 7 stars): Audio signal processing: pcm2wav, wav2pcm, feature extraction, augmentation, silence removal, etc.
  • TIL (Python, 6 stars): Today I Learned.
  • generate-sec-dataset (Python, 6 stars): Generates a space error correction dataset.
  • sooftware.github.io (SCSS, 6 stars)
  • KoSpeech-Flask (Python, 3 stars): KoSpeech Flask web application.