  • Stars: 119
  • Rank: 297,930 (Top 6%)
  • Language: Python
  • License: Apache License 2.0
  • Created: over 2 years ago
  • Updated: over 2 years ago

Repository Details

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

3M-ASR for End-to-End Speech Recognition

This project builds an End-to-End Speech Recognition system based on a Mixture-of-Experts (MoE) model. MoE is an efficient way to train large-scale models, and we have demonstrated its effectiveness on a public dataset. More details about the algorithm can be found in "3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition".
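As a rough illustration of the general MoE idea, the sketch below routes each input frame to a single expert chosen by a learned gate, so only a fraction of the parameters is used per frame. It is a toy written in pure Python, not the 3M-ASR or FastMoE implementation: the class name `ToyMoELayer`, the random weights, and the top-1 gating rule are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyMoELayer:
    """Hypothetical top-1 MoE layer: one linear expert is applied per frame."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        # Router: one score per expert, computed from the input frame.
        self.router = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Experts: independent dim x dim linear maps.
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, frame):
        probs = softmax(matvec(self.router, frame))
        # Top-1 routing: only the highest-scoring expert runs.
        expert_id = max(range(len(probs)), key=probs.__getitem__)
        hidden = matvec(self.experts[expert_id], frame)
        # Scale the expert output by its gate probability.
        return [probs[expert_id] * h for h in hidden], expert_id


layer = ToyMoELayer(dim=4, num_experts=3)
output, chosen = layer.forward([1.0, 0.5, -0.5, 2.0])
print("expert", chosen, "output", output)
```

Adding experts in such a layer increases the parameter count while the per-frame compute stays roughly constant, which is the usual motivation for MoE models.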

Installation

  • Clone this repo
git clone https://github.com/tencent-ailab/3m-asr.git
cd 3m-asr
  • Create a conda environment and install the dependencies
conda create -n moe python=3.8
conda activate moe
pip install -r requirements.txt
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
  • Follow the instructions in the fastmoe directory to install FastMoE

Performance Benchmark

We evaluate our system on the public WenetSpeech dataset and provide the recipe for Conformer-MoE (trained on 24 V100 GPUs). Character error rates (CER, %) are listed below; the first three rows are the results reported by WenetSpeech.

Toolkit             Dev    Test_Net  Test_Meeting  AIShell-1
Kaldi               9.07   12.83     24.72         5.41
ESPnet              9.70   8.90      15.90         3.90
WeNet               8.88   9.70      15.59         4.61
Conformer-MoE(32e)  7.49   7.99      13.69         4.03
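For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between hypothesis and reference, summed over the test set and divided by the total number of reference characters. The sketch below shows that standard definition in pure Python; the function names `edit_distance` and `cer` are illustrative, and this is not the scoring script used to produce the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level character error rate in percent."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * errors / total_chars


# One substituted character out of six reference characters.
print(cer(["今天天气很好"], ["今天天汽很好"]))
```

Because errors are pooled over the whole corpus before dividing, long utterances weigh more than short ones, which differs from averaging per-utterance CERs.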

Acknowledgements

  • We use FastMoE to support Mixture-of-Experts model training in PyTorch
  • We borrowed a lot of code from WeNet for the Conformer implementation and data processing

Reference

[1] SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts (Interspeech 2021)

[2] 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition (submitted to Interspeech 2022)

Citation

@inproceedings{you21_interspeech,
  author={Zhao You and Shulin Feng and Dan Su and Dong Yu},
  title={{SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2077--2081},
  doi={10.21437/Interspeech.2021-478}
}

@article{you20223m,
  title={3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition},
  author={You, Zhao and Feng, Shulin and Su, Dan and Yu, Dong},
  journal={arXiv preprint arXiv:2204.03178},
  year={2022}
}

Contact

If you have any questions about this project, please feel free to contact [email protected] or [email protected]

Disclaimer

This is not an officially supported Tencent product.
