Convmelspec: Convertible Melspectrograms via 1D Convolutions

Convertible melspectrograms for ONNX and CoreML.

About

Many audio neural network models need a Mel-scaled short-time Fourier transform, or Melspectrogram, operator. However, this operator is typically not implemented in on-device machine learning frameworks such as CoreML (and previously ONNX), which significantly complicates cross-platform deployment of audio machine learning models. To mitigate this, we reuse standardized, interoperable neural network operators to build a convertible Melspectrogram, implementing the short-time Fourier transform (STFT) via 1D convolutions.
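
To make the idea concrete, here is a minimal pure-Python sketch (not this repository's implementation) of an STFT computed as strided 1D convolutions. Each frequency bin gets one cosine kernel and one sine kernel, and the hop size becomes the convolution stride. The function names are hypothetical, the window is rectangular for brevity, and no padding is applied; a real layer would typically fold a window such as Hann into the kernels.

```python
import cmath
import math


def dft_kernels(n_fft):
    """Return cosine and sine kernels, one pair per one-sided frequency bin."""
    n_bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    return cos_k, sin_k


def conv1d(signal, kernel, stride):
    """Strided 'valid' cross-correlation, as in neural network conv layers."""
    width = len(kernel)
    return [sum(signal[s + i] * kernel[i] for i in range(width))
            for s in range(0, len(signal) - width + 1, stride)]


def power_stft_conv(signal, n_fft, hop):
    """Power spectrogram [bin][frame] from two banks of 1D convolutions."""
    cos_k, sin_k = dft_kernels(n_fft)
    spec = []
    for ck, sk in zip(cos_k, sin_k):
        real = conv1d(signal, ck, hop)   # real part of each frame's DFT bin
        imag = conv1d(signal, sk, hop)   # imaginary part of the same bin
        spec.append([r * r + i * i for r, i in zip(real, imag)])
    return spec


def power_stft_direct(signal, n_fft, hop):
    """Reference: explicit DFT of each frame, for comparison only."""
    n_bins = n_fft // 2 + 1
    starts = range(0, len(signal) - n_fft + 1, hop)
    return [[abs(sum(signal[s + n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                     for n in range(n_fft))) ** 2
             for s in starts]
            for k in range(n_bins)]


# Test signal: a sine with exactly 2 cycles per frame (lands in bin 2)
n_fft, hop = 8, 4
signal = [math.sin(2 * math.pi * 2 * n / n_fft) for n in range(32)]
conv_spec = power_stft_conv(signal, n_fft, hop)
ref_spec = power_stft_direct(signal, n_fft, hop)
max_err = max(abs(a - b) for row_a, row_b in zip(conv_spec, ref_spec)
              for a, b in zip(row_a, row_b))
```

Both routes give the same power spectrogram up to floating-point error, which is why a convolution operator available in every framework can stand in for a missing STFT operator.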

Beyond this basic functionality (known to many), we offer the ability to trade off module storage size against inference speed. To do so, we provide three modes for computing the discrete Fourier transform (DFT) matrix needed for the STFT: store, input, and on-the-fly. Store mode precomputes the DFT matrix and stores it directly in your model file (fastest inference, larger model, easy); input mode assumes the DFT matrix is provided as an input parameter to your model (fast inference, small model, hard); and on-the-fly mode dynamically constructs the DFT matrix at inference time (slower inference, small model, easy). Our module can also be used as a pass-through to torchaudio for training and then switched to DFT mode for conversion, and it is set up to be compatible with the recent native ONNX STFT operator, which still requires a custom compilation setup. Further, we also show how to convert the native torchaudio melspectrogram layers directly via CoreML Model Intermediate Language (MIL) ops.
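
As a rough illustration of why the storage trade-off matters, the sketch below estimates how many bytes a stored DFT matrix would add to a model file. It assumes float32 values, a one-sided spectrum, and separate real (cosine) and imaginary (sine) kernels; these are assumptions made for the arithmetic, not a statement of how this library lays out its weights, and `dft_matrix_bytes` is a hypothetical helper.

```python
def dft_matrix_bytes(n_fft, bytes_per_value=4, one_sided=True):
    """Estimate the storage needed for real + imaginary DFT kernels."""
    n_bins = n_fft // 2 + 1 if one_sided else n_fft
    # Two kernel banks (cosine and sine), each of shape n_bins x n_fft
    return 2 * n_bins * n_fft * bytes_per_value


for n_fft in (512, 1024, 2048):
    size = dft_matrix_bytes(n_fft)
    print(f"n_fft={n_fft}: {size / 2**20:.2f} MiB")
```

Under these assumptions, n_fft=1024 comes to about 4 MiB, and the cost grows roughly quadratically with n_fft. That is the overhead store mode accepts for speed and that input and on-the-fly modes avoid.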

In total, we implement Melspectrograms in a standardized, cross-platform way with minimal impact on model size and reasonable speed. Try it out, let us know how it goes, and submit PRs with fixes!

Setup

  • Create a new Python environment via pip or conda
conda create -n convmelspec python=3.9 -y
conda activate convmelspec
  • Install the source code
# Change to the repository root
cd <convmelspec>

# Install as editable (for developers)
pip install -e .

# Alternatively, install read-only
pip install .

Usage

The easiest way to convert your own PyTorch models to ONNX and CoreML is to use our custom ConvertibleSpectrogram module within your model instead of using torchaudio directly. Once you do this, you can export to ONNX or CoreML with a few lines of code. Internally, the module either uses torchaudio directly or implements the required short-time Fourier transform operations using 1D convolutions, depending on the mode of operation. For CoreML, we further show how you can use CoreML's Model Intermediate Language (MIL) to implement the short-time Fourier transform (again using 1D convs) without needing our layer at all.

import torch
import librosa
import numpy as np
from convmelspec.stft import ConvertibleSpectrogram as Spectrogram
import coremltools as ct

# Example input: one second of silence at 16 kHz (replace with real audio)
sr = 16000
x = torch.zeros(1, sr)

# Create the layer
melspec = Spectrogram(
    sr=sr,
    n_fft=1024,
    hop_size=512,
    n_mel=64,
)

# Switch to eval for inference and conversion
melspec.eval()
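
For intuition about the n_mel argument above, the following pure-Python sketch shows one common way to project a power spectrum onto mel bands with triangular filters. It assumes the HTK-style mel formula and unnormalized filters; the layer in this repository may use a different scale or normalization, and all function names here are hypothetical.

```python
import math


def hz_to_mel(hz):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mel):
    """Triangular filters: n_mel rows over n_fft // 2 + 1 frequency bins."""
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sr / n_fft for k in range(n_bins)]
    top = hz_to_mel(sr / 2)
    # n_mel + 2 edge frequencies, equally spaced on the mel scale
    edges = [mel_to_hz(top * i / (n_mel + 1)) for i in range(n_mel + 2)]
    bank = []
    for m in range(n_mel):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        # Rising ramp up to mid, falling ramp after it, clipped at zero
        row = [max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
               for f in bin_hz]
        bank.append(row)
    return bank


def apply_filterbank(bank, power_frame):
    """Weighted sum of spectrum bins for each mel band."""
    return [sum(w * p for w, p in zip(row, power_frame)) for row in bank]


bank = mel_filterbank(sr=16000, n_fft=1024, n_mel=64)
flat_spectrum = [1.0] * (1024 // 2 + 1)
mel_frame = apply_filterbank(bank, flat_spectrum)
```

Because the projection is a fixed matrix multiply applied to each STFT frame, it maps onto standard linear or convolution operators just as the STFT itself does.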

Training

For training, we recommend that you create and use the layer in torchaudio mode. Once training is complete, you can switch the layer to one of the other modes, which convert to ONNX and CoreML.

Convert to ONNX

To convert your model to ONNX, you can use the built-in PyTorch onnx export function.


# Set the export mode (pick one; uncomment the mode you need)
# melspec.set_mode("DFT", "input")
# melspec.set_mode("DFT", "store")
melspec.set_mode("DFT", "on_the_fly")

# Export to ONNX
output_path = '/tmp/melspec.onnx'
torch.onnx.export(melspec, x, output_path)

Convert to ONNX with Opset 17

The ONNX standard and runtime have added support for an STFT operator and related functionality (e.g. pytorch/audio#982). As noted, however, PyTorch itself does not yet support exporting with opset 17, so a custom build of PyTorch is required (this works but is not yet documented here).

Convert to CoreML

To convert your model to CoreML, you can use the coremltools Python package.


# Export to CoreML
output_path = '/tmp/melspec.mlmodel'

# To reduce the size of the exported CoreML model (tradeoff with speed)
pipeline = ct.PassPipeline()
pipeline.set_options("common::const_elimination", {"skip_const_by_size": "1e6"})

# Trace the model
traced_model = torch.jit.trace(melspec, x)

# Convert traced model to CoreML
input_tensors = [ct.TensorType(name="input", shape=(x.shape))]
mlmodel = ct.convert(model=traced_model,
                     inputs=input_tensors,
                     compute_units=ct.ComputeUnit.ALL,
                     minimum_deployment_target=None,
                     pass_pipeline=pipeline)

# Save to disk
mlmodel.save(output_path)

Convert to CoreML via MIL

In addition to using our PyTorch layer to convert to CoreML, we also provide an example of how to use the native torchaudio melspectrogram together with coremltools Model Intermediate Language (MIL) operators for conversion. To do this, please see the example below and the corresponding unit tests.

The MIL implementation is provided as an illustrative example. For regular use, prefer the native STFT conversion implementation provided in coremltools.

import torchaudio

output_path = '/tmp/melspec-mil.mlmodel'

# Use native torchaudio melspec + CoreMLTools MIL
melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=1024,
    hop_length=512,
    power=2.0,
)

# Trace model
traced_model = torch.jit.trace(melspec, x)

# Convert traced model to CoreML
input_tensors = [ct.TensorType(name="input", shape=(x.shape))]
mlmodel = ct.convert(model=traced_model,
                     inputs=input_tensors,
                     compute_units=ct.ComputeUnit.ALL,
                     minimum_deployment_target=None)

# Save to disk
mlmodel.save(output_path)

Unit tests

To run our unit tests and inspect code examples for each mode of operation per platform, please see below.

cd <convmelspec>

python -m unittest discover tests

License and Citation

This code is licensed under an Apache 2.0 license. If you use code from this work in academic publications, please cite our repo:

@misc{convmelspec,
  author = {Nicholas J. Bryan and Oriol Nieto and Juan-Pablo Caceres},
  title = {Convmelspec: Melspectrograms for On-Device Audio Machine Learning},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{http://github.com/adobe-research/convmelspec}},
}

Authors

Contributors include Nicholas J. Bryan, Oriol Nieto, and Juan-Pablo Caceres.
