

Third-party audio effects plugins as differentiable layers within deep neural networks.

DeepAFx: Deep Audio Effects

Audio signal processing effects (FX) are used to manipulate sound characteristics across a variety of media. Many FX, however, can be difficult or tedious to use, particularly for novice users. In our work, we aim to simplify how audio FX are used by training a machine to use FX directly and perform automatic audio production tasks. By using familiar and existing tools for processing and suggesting control parameters, we can create a unique paradigm that blends the power of AI with human creative control to empower creators. For a quick demonstration, please see our demo video:

Demo Video

To combine deep learning and audio plugins, we have developed a new method to incorporate third-party audio signal processing effects (FX) plugins as layers within deep neural networks. We then use a deep encoder to analyze sounds and learn to control audio FX that themselves perform signal manipulation. To train our network with non-differentiable FX layers, we compute FX layer gradients via a fast, parallel stochastic approximation scheme within a standard automatic differentiation graph, enabling efficient end-to-end backpropagation for deep learning training.
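As a rough illustration of how a gradient can be approximated for a non-differentiable black box, the sketch below uses simultaneous perturbation (SPSA), one common stochastic approximation scheme. It is a pure-Python toy under stated assumptions, not the code from this repository: `spsa_gradient` and `toy_effect_loss` are hypothetical names, and the "effect" is a simple quadratic standing in for a real plugin plus loss.

```python
import random


def spsa_gradient(black_box, params, step=1e-3, num_draws=1, rng=None):
    """Estimate d(black_box)/d(params) using only function evaluations.

    Each draw perturbs all parameters at once with random +/-1 signs, so
    a draw costs two evaluations regardless of the number of parameters.
    Draws are independent of each other, so they could run in parallel.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    n = len(params)
    grad = [0.0] * n
    for _ in range(num_draws):
        delta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        plus = black_box([p + step * d for p, d in zip(params, delta)])
        minus = black_box([p - step * d for p, d in zip(params, delta)])
        diff = (plus - minus) / (2.0 * step)
        for i in range(n):
            grad[i] += diff / delta[i]
    return [g / num_draws for g in grad]


def toy_effect_loss(params):
    """Hypothetical stand-in for 'run the plugin, then compute a loss'."""
    gain, threshold = params
    return (gain - 0.8) ** 2 + 3.0 * (threshold + 0.2) ** 2


# Averaging many draws approaches the true gradient [-0.6, 1.8] here.
estimate = spsa_gradient(toy_effect_loss, [0.5, 0.1], num_draws=4000)
```

A single draw is noisy when there are several parameters, so in practice the estimate is averaged over draws or over the samples in a training batch.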

Paper

For technical details of the work, please see:

"Differentiable Signal Processing with Black-Box Audio Effects." Marco A. Martínez Ramírez, Oliver Wang, Paris Smaragdis, and Nicholas J. Bryan. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.

@inproceedings{martinez2021deepafx,
title={Differentiable Signal Processing with Black-Box Audio Effects},
author={Mart\'{i}nez Ram\'{i}rez, Marco A. and Wang, Oliver and Smaragdis, Paris and Bryan, Nicholas J.},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
month={June},
year={2021},
publisher={IEEE}
}

Developer

DeepAFX is built using TensorFlow 2.2.0 for deep learning and Linux Audio Developer's Simple Plugin API v2 (LV2) audio effects plugins. LV2 is a royalty-free, open standard for audio plugins for synthesis, processing, and host applications, written in the C programming language. To bridge the language gap between the LV2 plugins and Python, we use the lilv LV2 library together with our own research Python pip package called deepafx. Our package provides a custom TF Keras layer that internally loads LV2 plugins and performs gradient approximation during backpropagation.

Docker Setup

Given that our work deeply combines Python and Linux binaries (i.e., pre-compiled audio plugins), we provide a Dockerfile to fully reproduce our development environment. Docker is a set of tools and an ecosystem for developing software in packages called containers, which act as lightweight virtual machines. You can directly use our Dockerfile to run a Linux-based containerized development environment on Windows, macOS, or Linux.

Our Dockerfile builds on the tensorflow/tensorflow:2.2.0-jupyter Docker image, adds the necessary LV2 dependencies, and installs our deepafx Python pip package for training and inference. In addition to the default JupyterLab IDE, we also install code-server into our development environment, which provides a variant of the popular VS Code IDE. If you don't like our Dockerfile, you can use it as a recipe to recreate our development environment elsewhere.

Build & Run Development Environment

Install Docker and docker-compose on your local system. To verify the installation, make sure you can see a Docker icon in your OS toolbar and/or run docker --version and docker-compose -v. Once confirmed, run the following:

# Clone this repository
git clone http://github.com/adobe-research/deepafx

# Move into git folder
cd <deepafx>

# Build and run the docker image -> container
docker-compose up --build -d

# Specify a shared directory between your local machine and docker to share data
# Note: You can alternatively update the docker-compose.yml to use a Named Volume by 
# changing $DEEPAFX_DATA/:/home/code-base/scratch_space to $data/:/home/code-base/scratch_space
export DEEPAFX_DATA=<path/for/shared/data>

# Run an existing image
docker-compose up 

# Open your IDE of choice
# For VS Code, open a web browser at http://0.0.0.0:8887 (password is dsp)
# For Jupyter, open a web browser at http://127.0.0.1:8888 (password is dsp)

# Within the IDE, open a terminal and navigate to the code within the container
cd /home/code-base/runtime

# Please change the passwords for any remote development.

For command-line shell access to the container when running locally, open a second terminal, find the running container ID, and enter the container:

docker container ls
docker exec -it <CONTAINER ID> bash

Once you open the web IDE or ssh into the container, everything is installed as needed, and you can start using DeepAFX as discussed below.
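If the setup steps above fail, a quick check of the prerequisites on the host machine can help narrow down the cause. The sketch below is a hypothetical helper (not part of this repository) that only looks for the docker and docker-compose executables and the DEEPAFX_DATA variable mentioned above.

```python
import os
import shutil


def check_prerequisites(which=shutil.which, environ=None):
    """Return a list of human-readable problems; an empty list means ready.

    `which` and `environ` are injectable so the check can be exercised
    without Docker being installed.
    """
    environ = os.environ if environ is None else environ
    problems = []
    for tool in ("docker", "docker-compose"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    data_dir = environ.get("DEEPAFX_DATA")
    if not data_dir:
        problems.append("DEEPAFX_DATA is not set")
    elif not os.path.isdir(data_dir):
        problems.append(f"DEEPAFX_DATA is not a directory: {data_dir}")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the real environment. It does not confirm that the Docker daemon is running, only that the command-line tools are reachable.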

Usage

This repository is intended for educational and research purposes (full license below). We give an overview of downloading our pre-trained models and datasets for our tasks, as well as training and evaluating the models. Further below, we also provide several examples of developing TF Keras layers with custom gradients.

Dataset Download

In this work, we developed three DeepAFX models: tube amplifier emulation (distortion), automatic non-speech sound removal (nonspeech) and automatic music mastering (mastering). For each of these tasks, we provide scripts to download the datasets needed to train each model.

To download all datasets (about 50GB), type:

# Within the docker container + deepafx code folder
cd /home/code-base/runtime/deepafx/deepafx
python download.py all

Alternatively, to download each dataset individually, type:

# Within the docker container + deepafx code folder
cd /home/code-base/runtime/deepafx/deepafx

# Download tube amplifier emulation/distortion dataset
python download.py distortion

# Download the nonspeech dataset
python download.py nonspeech

# Download the mastering dataset
python download.py mastering

# Note: the mastering dataset is built on-the-fly, and results may change depending on when you run the command.
# The reconstruction steps can also be run individually via:
python download.py mastering --mode download
python download.py mastering --mode align
python download.py mastering --mode resample
python download.py mastering --mode all

Training & Configurations

To train one or more of the models, run:

# Within the docker container + deepafx code folder
cd /home/code-base/runtime/deepafx/deepafx

# Train the tube amplifier emulation/distortion task
python train_distortion.py

# Train the nonspeech removal tasks
python train_nonspeech.py

# Train the music mastering task
python train_mastering.py

These scripts will train for the specific task and save the trained models, training history, config files and gradient logs (if enabled). The training scripts will also use the evaluate function from evaluate.py to test the trained model. The function computes the objective metrics and saves the input, target and output audio samples and the parameter automation. The notebook notebook_plots.ipynb plots and saves specific test audio samples and parameter automation curves for a given model.

You can edit the respective training configuration files before training each task, such as selecting the type of encoder, audio plugins, trainable parameters, values of non-trainable parameters, new range of parameters, etc. via editing the following files: config_distortion.py, config_nonspeech.py, and config_mastering.py.

Evaluation

The script evaluate.py allows evaluating a trained model with the test dataset(s). The config file saved from training (numpy dictionary .params.npy) and the weights (.h5) are required for this, along with the task name and output directory.

The script receives the following command line positional arguments:

  • task - string from 'distortion', 'nonspeech' or 'mastering'.

  • model_path - absolute filepath to the model .h5 file.

  • params_path - absolute filepath to the params.npy file.

  • output_dir - absolute path to the output folder.

  • dafx_wise (optional) - integer that indicates which audio plugins from the Fx chain are going to be used. For example, for the mastering task the Fx chain is Compressor, EQ, Limiter; if dafx_wise=2, the script will only use the Compressor and the EQ. Useful for testing audio Fx chains and progressive training. Default is 0, which means all audio plugins are used.
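The dafx_wise behaviour described above can be sketched as a simple prefix selection over the Fx chain. This is a hypothetical illustration of the described semantics, not the code in evaluate.py; `select_plugins` and `MASTERING_FX_CHAIN` are names made up for this example.

```python
# Fx chain for the mastering task, in processing order (from the text above).
MASTERING_FX_CHAIN = ["Compressor", "EQ", "Limiter"]


def select_plugins(fx_chain, dafx_wise=0):
    """Return the plugins used for a given dafx_wise value.

    0 keeps the whole chain; a positive value keeps only the first
    `dafx_wise` plugins.
    """
    if not 0 <= dafx_wise <= len(fx_chain):
        raise ValueError(f"dafx_wise must be between 0 and {len(fx_chain)}")
    if dafx_wise == 0:
        return list(fx_chain)
    return list(fx_chain[:dafx_wise])


print(select_plugins(MASTERING_FX_CHAIN, 2))  # ['Compressor', 'EQ']
print(select_plugins(MASTERING_FX_CHAIN))     # ['Compressor', 'EQ', 'Limiter']
```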

Examples:

# Within the docker container + deepafx code folder
cd /home/code-base/runtime/deepafx/deepafx

# Evaluate 
# Note: Pretrained models can be found in /home/code-base/runtime/deepafx/models
python evaluate.py nonspeech </path/to/model>.h5 </path/to/params>.npy /path/to/output/folder

# Evaluate distortion using only the first two LV2 plugins in the DSP chain
cd /home/code-base/runtime/deepafx/deepafx
python evaluate.py distortion </path/to/model>.h5 </path/to/params>.npy /path/to/output/folder --dafx 2

Inference

To run our models on any audio file, type:

# Within the docker container + deepafx code folder
cd /home/code-base/runtime/deepafx/deepafx

# <task> below can be one of: distortion, nonspeech, mastering
python inference.py <task> --input_file <path/to/input>.wav --output_file <path/to/output>.wav
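To process several files, the same command can be generated once per input. The sketch below only builds the argument lists using the flags shown above; `build_inference_commands` is a hypothetical helper, and it assumes each output keeps the input file name inside a separate output folder.

```python
from pathlib import Path

TASKS = ("distortion", "nonspeech", "mastering")


def build_inference_commands(task, input_files, output_dir):
    """Build one inference.py command (as an argument list) per input file."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}")
    output_dir = Path(output_dir)
    commands = []
    for wav in input_files:
        out_path = output_dir / Path(wav).name  # same file name, new folder
        commands.append([
            "python", "inference.py", task,
            "--input_file", str(wav),
            "--output_file", str(out_path),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from within the deepafx code folder inside the container.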

Notes

  • To save the gradient during training, enable the global variable kLogGradient from dafx_layer.py

  • The function set_param from the class Parallel_Batch doesn't work when using an audio fx chain (kFxChain=True) due to threading communication issues. This means that the values for the non-trainable parameters can only be set in the constructor (using values from config_*.py) and not after the dafx layer is created.

  • TODO FIX: The function layers.compute_time_shifting() has the argument samples_delay_max which corresponds to the max delay we considered as group delay. For the distortion and nonspeech tasks its value is 100 samples. For the mastering task it yields better results as 300 samples (due to the longer group delay of the Fx chain). This change is not happening automatically.

  • The evaluate function from evaluate.py and evaluate_trained_model.py renders up to 50 seconds from each audio file, but this constant can be changed manually.

  • For the distortion task, the models were trained with an FxChain of Compressor and Limiter, to see whether the Limiter would help to obtain a better match (it didn't). So when you test these models, add the --dafx 1 command line argument. This argument is not needed for the other ICASSP models.
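Because the samples_delay_max value mentioned in the notes above is not switched automatically, one option is to keep the per-task values in a single lookup. The sketch below does that and also shows, with a brute-force search, why the bound matters: a delay larger than the bound cannot be found. It is a hypothetical pure-Python illustration and does not reproduce layers.compute_time_shifting().

```python
# Per-task maximum group delay in samples (values from the notes above).
SAMPLES_DELAY_MAX = {"distortion": 100, "nonspeech": 100, "mastering": 300}


def estimate_delay(reference, delayed, max_delay):
    """Return the lag in [0, max_delay] that best aligns `delayed` to `reference`.

    The score for each lag is the dot product between the reference and
    the delayed signal advanced by that lag (a plain cross-correlation).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        n = min(len(reference), len(delayed) - lag)
        if n <= 0:
            break
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signal delayed by 7 samples.
reference = [0.0, 1.0, 0.5, -0.3, 0.2, 0.0, 0.0, 0.0]
delayed = [0.0] * 7 + reference
lag = estimate_delay(reference, delayed, SAMPLES_DELAY_MAX["distortion"])
```

With a bound smaller than the true delay (for example `max_delay=3` here), the search cannot reach the correct lag, which mirrors why the mastering chain needs the larger value.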

Custom TF Keras Operators and Gradients

At the core of this project is a custom TF Keras layer with custom gradients. To develop this, we started with a series of small examples that compared finite difference gradients with gradients from automatic differentiation via TF. We then built this up with more complex setups in the examples listed below.
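The kind of comparison described above can be sketched without TensorFlow. The example below checks a central finite difference against a tiny forward-mode automatic differentiation built on dual numbers. The `Dual` class and the `soft_clip` function are assumptions made for this illustration and are not taken from the repository's examples.

```python
import math


class Dual:
    """Number carrying a value and its derivative (forward-mode autodiff)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def tanh(x):
    """tanh that works on both floats and Dual numbers."""
    if isinstance(x, Dual):
        t = math.tanh(x.value)
        return Dual(t, (1.0 - t * t) * x.deriv)  # chain rule
    return math.tanh(x)


def soft_clip(sample, gain):
    """Toy distortion: one audio sample through a tanh with a gain control."""
    return tanh(gain * sample)


def autodiff_grad(f, x):
    return f(Dual(x, 1.0)).deriv


def finite_diff_grad(f, x, eps=1e-5):
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Gradient of the output with respect to the gain, at gain = 2.0.
f = lambda gain: soft_clip(0.5, gain)
exact = autodiff_grad(f, 2.0)
approx = finite_diff_grad(f, 2.0)
```

The two values should agree to several decimal places. A large mismatch in this kind of check usually points to a bug in the custom gradient rather than in the finite difference.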

Examples

Once we built this up enough, we moved over to building a custom keras layer that loads an LV2 plugin and approximates the gradient. You can see our custom LV2 TF Keras layer and a very basic toy example below

License

Copyright (c) Adobe Systems Incorporated. All rights reserved.

Licensed under ADOBE RESEARCH LICENSE.
