• Stars
    star
    1,088
  • Rank 42,566 (Top 0.9 %)
  • Language
    Python
  • License
    MIT License
  • Created about 1 year ago
  • Updated 6 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Versatile audio super resolution (any -> 48kHz) with AudioSR.

AudioSR: Versatile Audio Super-resolution at Scale

arXiv githubio Replicate

Pass your audio in, AudioSR will make it high fidelity!

Work on all types of audio (e.g., music, speech, dog, raining, ...) & all sampling rates.

Share your thoughts/samples/issues in our discord channel: https://discord.gg/HWeBsJryaf

Image Description

Change Log

  • 2023-09-24: Add replicate demo (@nateraw); Fix error on windows, librosa warning etc (@ORI-Muchim).
  • 2023-09-16: Fix DC shift issue. Fix duration padding bug. Update default DDIM steps to 50.

Commandline Usage

Installation

# Optional
conda create -n audiosr python=3.9; conda activate audiosr
# Install AudioLDM
pip3 install audiosr==0.0.6

Usage

Process a list of files. The result will be saved at ./output by default.

audiosr -il batch.lst

Process a single audio file.

audiosr -i example/music.wav

Full usage instruction

> audiosr -h

> usage: audiosr [-h] -i INPUT_AUDIO_FILE [-il INPUT_FILE_LIST] [-s SAVE_PATH] [--model_name {basic,speech}] [-d DEVICE] [--ddim_steps DDIM_STEPS] [-gs GUIDANCE_SCALE] [--seed SEED]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_AUDIO_FILE, --input_audio_file INPUT_AUDIO_FILE
                        Input audio file for audio super resolution
  -il INPUT_FILE_LIST, --input_file_list INPUT_FILE_LIST
                        A file that contains all audio files that need to perform audio super resolution
  -s SAVE_PATH, --save_path SAVE_PATH
                        The path to save model output
  --model_name {basic,speech}
                        The checkpoint you gonna use
  -d DEVICE, --device DEVICE
                        The device for computation. If not specified, the script will automatically choose the device based on your environment.
  --ddim_steps DDIM_STEPS
                        The sampling step for DDIM
  -gs GUIDANCE_SCALE, --guidance_scale GUIDANCE_SCALE
                        Guidance scale (Large => better quality and relavancy to text; Small => better diversity)
  --seed SEED           Change this value (any integer number) will lead to a different generation result.
  --suffix SUFFIX       Suffix for the output file

TODO

"Buy Me A Coffee"

  • Add gradio demo.
  • Optimize the inference speed.

Cite our work

If you find this repo useful, please consider citing:

@article{liu2023audiosr,
  title={{AudioSR}: Versatile Audio Super-resolution at Scale},
  author={Liu, Haohe and Chen, Ke and Tian, Qiao and Wang, Wenwu and Plumbley, Mark D},
  journal={arXiv preprint arXiv:2309.07314},
  year={2023}
}

More Repositories

1

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.
Python
2,389
star
2

AudioLDM2

Text-to-Audio/Music Generation
Python
2,235
star
3

voicefixer

General Speech Restoration
Python
1,014
star
4

audioldm_eval

This toolbox aims to unify audio generation model evaluation for easier comparison.
Python
287
star
5

voicefixer_main

General Speech Restoration
Python
274
star
6

AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.
Python
191
star
7

ssr_eval

Evaluation and Benchmarking of Speech Super-resolution Methods
Python
133
star
8

SemantiCodec-inference

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space.
Python
117
star
9

2021-ISMIR-MSS-Challenge-CWS-PResUNet

Music Source Separation; Train & Eval & Inference piplines and pretrained models we used for 2021 ISMIR MDX Challenge.
Python
113
star
10

Subband-Music-Separation

Pytorch: Channel-wise subband (CWS) input for better voice and accompaniment separation
Python
93
star
11

torchsubband

Pytorch implementation of subband decomposition
HTML
88
star
12

SemantiCodec

HTML
37
star
13

diffres-python

Learning differentiable temporal resolution on time-series data.
Python
33
star
14

DCASE_2022_Task_5

System that ranks 2nd in DCASE 2022 Challenge Task 5: Few-shot Bioacoustic Event Detection
Python
27
star
15

ontology-aware-audio-tagging

Python
13
star
16

courseProject_Compiler

java implementation of NWPU Compiler course project-西工大编译原理-试点班
Java
13
star
17

Key-word-spotting-DNN-GRU-DSCNN

key word spotting GRU/DNN/DSCNN
Python
8
star
18

DM_courseProject

KNN Bayes 西北工业大学 NWPU 数据挖掘与分析
Python
6
star
19

netease_downloader

网易云音乐上以歌单为单位进行下载
Python
3
star
20

Channel-wise-Subband-Input

The demos of paper: Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music
Jupyter Notebook
2
star
21

haoheliu.github.io

SCSS
1
star
22

demopage-NVSR

HTML
1
star
23

deepDecagon

Python
1
star
24

visa-monitor

实时监控可预约签证的时间,有更早的就邮件通知
Python
1
star
25

colab_collection

Jupyter Notebook
1
star
26

SatProj

西北工业大学应用综合实验
Python
1
star
27

demopage-voicefixer

Voicefixer is a speech restoration model that handles noise, reverberation, low resolution (2kHz~44.1kHz), and clipping (0.1-1.0 threshold) distortion simultaneously.
HTML
1
star
28

mushra_test_2024_April

1
star