Yuan Gong (@YuanGongND)

Top repositories

1

ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Jupyter Notebook
1,077
star
2

ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
Python
358
star
3

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Python
340
star
4

whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers".
Python
303
star
5

cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Python
217
star
6

gopt

Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".
Python
139
star
7

psla

Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
Python
131
star
8

vocalsound

Dataset and baseline code for the VocalSound dataset (ICASSP 2022).
Jupyter Notebook
95
star
9

uavm

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Python
54
star
10

python-compute-eer

Simple Python script to compute equal error rate (EER) for machine learning model evaluation.
Python
37
star
11

ReMASC

ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems.
Python
36
star
12

realtime-adversarial-attack

Code for the IJCAI 2019 paper "Real-time Adversarial Attack".
Python
20
star
13

llm_speech_emotion_challenge

Jupyter Notebook
11
star
14

multichannel-antispoof

Code for the SPL paper "Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method".
Python
5
star
15

efficient-voice-antispoof

Jupyter Notebook
4
star
16

kaldi-abbr

Notes on Kaldi naming conventions.
2
star
17

yuangongnd.github.io

HTML
2
star
18

Speech_DB_Engine

Python
1
star