
RAVE: Realtime Audio Variational autoEncoder

Official implementation of RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (article link) by Antoine Caillon and Philippe Esling.

If you use RAVE as a part of a music performance or installation, be sure to cite either this repository or the article!

If you want to share, discuss, or ask questions about RAVE, you can do so in our Discord server!

Previous versions

The original implementation of the RAVE model can be restored using

git checkout v1

Installation

Install RAVE using

pip install acids-rave

You will need ffmpeg on your computer. You can install it locally inside your virtual environment using

conda install ffmpeg
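Before preprocessing, it can be worth checking that ffmpeg is actually reachable from your environment. A minimal sketch (the helper name `has_ffmpeg` is ours, not part of RAVE):

```python
import shutil

def has_ffmpeg() -> bool:
    """Return True if an ffmpeg executable is found on the current PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    print("ffmpeg found:", has_ffmpeg())
```

If this prints `False`, install ffmpeg (e.g. via conda as above) before running the preprocessing step.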

Colab

A Colab notebook to train RAVEv2 is now available, thanks to hexorcismos!

Usage

Training a RAVE model usually involves three separate steps: dataset preparation, training, and export.

Dataset preparation

You can now prepare a dataset using one of two methods: regular and lazy. Lazy preprocessing allows RAVE to be trained directly on the raw files (e.g. mp3, ogg) without converting them first. Warning: lazy dataset loading will increase your CPU load by a large margin during training, especially on Windows. It can however be useful when training on a large audio corpus that would not fit on a hard drive when uncompressed. In either case, prepare your dataset using

rave preprocess --input_path /audio/folder --output_path /dataset/path (--lazy)

Training

RAVEv2 has many different configurations. The improved version of v1 is called v2, and can therefore be trained with

rave train --config v2 --db_path /dataset/path --name give_a_name

We also provide a discrete configuration, similar to SoundStream or EnCodec

rave train --config discrete ...

By default, RAVE is built with non-causal convolutions. If you want to make the model causal (hence lowering the overall latency of the model), you can use the causal mode

rave train --config discrete --config causal ...

Many other configuration files are available in rave/configs and can be combined. Here is a list of all the available configurations:

| Type | Name | Description |
| --- | --- | --- |
| Architecture | v1 | Original continuous model |
| | v2 | Improved continuous model (faster, higher quality) |
| | v3 | v2 with Snake activation, descript discriminator and Adaptive Instance Normalization for real style transfer |
| | discrete | Discrete model (similar to SoundStream or EnCodec) |
| | onnx | Noiseless v1 configuration for onnx usage |
| | raspberry | Lightweight configuration compatible with realtime RaspberryPi 4 inference |
| Regularization (v2 only) | default | Variational Auto Encoder objective (ELBO) |
| | wasserstein | Wasserstein Auto Encoder objective (MMD) |
| | spherical | Spherical Auto Encoder objective |
| Discriminator | spectral_discriminator | Use the MultiScale discriminator from EnCodec |
| Others | causal | Use causal convolutions |
| | noise | Enables noise synthesizer V2 |

Export

Once trained, export your model to a torchscript file using

rave export --run /path/to/your/run (--streaming)

Setting the --streaming flag will enable cached convolutions, making the model compatible with realtime processing. If you forget to use the streaming mode and try to load the model in Max, you will hear clicking artifacts.
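The reason the streaming flag matters can be illustrated with a minimal numpy sketch of a cached convolution (this is a toy illustration of the idea, not RAVE's actual torch implementation): each chunk is convolved together with the tail of the previous chunk, so chunked (streaming) processing produces exactly the same samples as offline processing, with no clicks at chunk boundaries.

```python
import numpy as np

class CachedConv1d:
    """Toy causal 1-D convolution that carries left context between chunks,
    so streaming (chunked) processing matches offline processing exactly."""

    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)
        # Cache holds the last (len(kernel) - 1) input samples of the previous chunk.
        self.cache = np.zeros(len(self.kernel) - 1)

    def process(self, chunk):
        # Prepend the cached context, then run a 'valid' convolution:
        # the output has exactly len(chunk) samples.
        padded = np.concatenate([self.cache, chunk])
        self.cache = padded[-(len(self.kernel) - 1):]
        return np.convolve(padded, self.kernel, mode="valid")

signal = np.random.randn(1024)
kernel = np.random.randn(5)

# Offline reference: causal conv == left-pad with zeros, then 'valid' convolution.
offline = np.convolve(np.concatenate([np.zeros(4), signal]), kernel, mode="valid")

# Streaming: process in 8 chunks, carrying the cache across boundaries.
conv = CachedConv1d(kernel)
streamed = np.concatenate([conv.process(c) for c in np.split(signal, 8)])

assert np.allclose(offline, streamed)  # identical output, no boundary artifacts
```

Without the cache (i.e. without `--streaming`), each chunk would be convolved against zero padding at its edges, and those per-chunk edge errors are what you hear as clicks.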

Realtime usage

This section presents how RAVE can be loaded inside nn~ in order to be used live with Max/MSP or PureData.

Reconstruction

A pretrained RAVE model named darbouka.gin available on your computer can be loaded inside nn~ with the default method set to forward (i.e. encode then decode). This does the same thing as explicitly chaining the encode and decode methods, but slightly faster.

High-level manipulation

Having explicit access to the latent representation produced by RAVE allows us to manipulate it using Max/MSP or PureData signal processing tools.

Style transfer

By default, RAVE can be used as a style transfer tool, based on the large compression ratio of the model. We recently added a technique inspired by StyleGAN that incorporates Adaptive Instance Normalization into the reconstruction process, effectively allowing source and target styles to be defined directly inside Max/MSP or PureData using the attribute system of nn~.
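For intuition, Adaptive Instance Normalization boils down to re-scaling one signal's features to carry another's statistics. A minimal numpy sketch of the operation on toy latent sequences (this illustrates the general AdaIN formula, not RAVE's internal code):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: normalize the content features
    per channel, then re-scale them to the style's per-channel mean/std."""
    c_mu = content.mean(axis=-1, keepdims=True)
    c_std = content.std(axis=-1, keepdims=True)
    s_mu = style.mean(axis=-1, keepdims=True)
    s_std = style.std(axis=-1, keepdims=True)
    return s_std * (content - c_mu) / (c_std + eps) + s_mu

# Toy latent sequences shaped (channels, time).
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, (8, 256))   # "source" latents
style = rng.normal(3.0, 2.0, (8, 256))     # "target" latents

stylized = adain(content, style)
# The stylized latents now carry the style's per-channel mean.
assert np.allclose(stylized.mean(axis=-1), style.mean(axis=-1))
```

The content's temporal structure is preserved while its channel-wise statistics are swapped for the target's, which is what makes source/target style attributes workable in realtime.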

Other attributes, such as enable or gpu, can enable or disable computation, or use the GPU to speed things up (still experimental).

Pretrained models

Several pretrained streaming models are available here. We'll keep the list updated with new models.

Where is the prior?

Here!

Discussion

If you have questions, want to share your experience with RAVE, or want to share musical pieces made with the model, you can use the Discussion tab!

Demonstration

RAVE x nn~

Demonstration of what you can do with RAVE and the nn~ external for Max/MSP!

embedded RAVE

Using nn~ for PureData, RAVE can be used in realtime on embedded platforms!

Funding

This work is led at IRCAM and has been funded by the following projects.
