• Stars: 131
• Rank: 275,867 (Top 6 %)
• Language: Python
• License: MIT License
• Created: almost 7 years ago
• Updated: over 1 year ago

Repository Details

Python library for handling audio datasets.

AUDIOMATE

Audiomate is a library for easy access to audio datasets. It provides data structures for accessing and loading different datasets in a generic way, which should make audio datasets easier to use, for example in machine learning tasks.

import audiomate
from audiomate.corpus import io

# Download a dataset
esc_downloader = io.ESC50Downloader()
esc_downloader.download('/local/path')

# Load and work with the dataset
esc50 = audiomate.Corpus.load('/local/path', reader='esc-50')

# e.g. read the audio signal and the labels of a specific sample/utterance
utterance = esc50.utterances['1-100032-A-0']
samples = utterance.read_samples()
label_list = utterance.label_lists[audiomate.corpus.LL_SOUND_CLASS]

for label in label_list:
    print(label.start, label.value)

Furthermore, it provides tools for interacting with datasets (validation, splitting, subsets, merging, filtering), extracting features, feeding samples to ML models during training, and more.
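
To make the idea of splitting concrete, here is a minimal sketch that uses only the standard library to divide utterance ids into proportional subsets. The helper `split_ids` and its parameters are hypothetical and are not audiomate's API; it only illustrates the concept, and the library's own splitting tools should be preferred in practice.

```python
import random


def split_ids(utterance_ids, proportions, seed=0):
    """Split ids into named subsets by proportion.

    Hypothetical helper for illustration only, not part of audiomate.
    """
    ids = sorted(utterance_ids)        # sort first so the result is reproducible
    random.Random(seed).shuffle(ids)   # deterministic shuffle for a given seed

    total = sum(proportions.values())
    names = list(proportions)
    subsets = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(ids)             # the last subset takes the remainder
        else:
            end = start + round(len(ids) * proportions[name] / total)
        subsets[name] = ids[start:end]
        start = end
    return subsets


ids = ['utt-{}'.format(i) for i in range(10)]
parts = split_ids(ids, {'train': 0.8, 'test': 0.2})
print(len(parts['train']), len(parts['test']))
```

Fixing the seed keeps the split stable between runs, which matters when results from different experiments are compared.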

Currently supported datasets:

Currently supported formats:

Installation

pip install audiomate

Install the latest development version:

pip install git+https://github.com/ynop/audiomate.git

Dependencies

sox

Parts of the functionality (e.g. audio format conversion) rely on sox. To use them, sox must be installed.

# macos
brew install sox

# with support for specific formats
brew install sox --with-lame --with-flac --with-libvorbis

# linux
apt-get install sox

# anaconda for macOS/windows/linux:
conda install -c conda-forge sox
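
Whether sox is reachable after installation can be checked from Python before calling anything that depends on it. This is a small sketch using only the standard library; the function name `sox_available` is an assumption made for illustration and is not part of audiomate.

```python
import shutil


def sox_available():
    """Return True if a `sox` executable can be found on the PATH."""
    return shutil.which('sox') is not None


if not sox_available():
    print('sox not found; features that rely on it (e.g. format conversion) may fail')
```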

Development

Prerequisites

It's recommended to use a virtual environment when developing audiomate. To create one, execute the following command in the project's root directory:

python -m venv .

To install audiomate and all its dependencies, execute:

pip install -e .

Running the test suite

pip install -e ".[dev]"
pytest

PyCharm might default to a different test runner and only suggest nose. To change it, go to File > Settings > Tools > Python Integrated Tools (on the Mac: PyCharm > Preferences > Settings > Tools > Python Integrated Tools) and set the test runner to py.test.

Benchmarks

pytest-benchmark is used to measure the runtime of specific parts. Benchmarks are normal test functions that pass the code under test to the benchmark fixture.

To run benchmarks:

# Run all
pytest bench

# Specific benchmark
pytest bench/corpus/test_merge_corpus.py

To compare between different runs:

pytest-benchmark compare

Editing the Documentation

The documentation is written in reStructuredText and transformed into various output formats with the help of Sphinx.

To generate the documentation, execute:

pip install -e ".[dev]"
cd docs
make html

The generated files are written to docs/_build/html.

Versions

Versioning is handled with bump2version. To bump the version:

bump2version [major,minor,patch,release,num]

To go directly to a final release version (skipping .dev/.rc/...):

bump2version [major,minor,patch] --new-version x.x.x
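
The general effect of bumping a part can be sketched as follows: the chosen part is incremented and all lower parts are reset. This sketch assumes a plain major.minor.patch scheme; the project's actual bump2version configuration (which also has release and num parts) is not shown here and may behave differently. The function `bump` is hypothetical and not part of bump2version.

```python
def bump(version, part):
    """Bump one part of a 'major.minor.patch' string and reset the lower parts.

    Illustration only; assumes a plain three-part version scheme.
    """
    major, minor, patch = (int(p) for p in version.split('.'))
    if part == 'major':
        return '{}.0.0'.format(major + 1)
    if part == 'minor':
        return '{}.{}.0'.format(major, minor + 1)
    if part == 'patch':
        return '{}.{}.{}'.format(major, minor, patch + 1)
    raise ValueError('unknown part: {}'.format(part))


print(bump('1.2.3', 'minor'))
```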

Release

Commands to create a new release on PyPI:

rm -rf build
rm -rf dist

python setup.py sdist
python setup.py bdist_wheel
twine upload dist/*

More Repositories

1. deepspeech-german (Python, 41 stars)
   Scripts for training Mozilla's DeepSpeech using German speech data.

2. py-ctc-decode (Python, 35 stars)
   CTC decoder implementation in pure Python. Also supports language model decoding using KenLM.

3. togglore (Python, 12 stars)
   Tool for the timetracker toggle to calculate the difference between tracked time and the time you should have worked in a given range.

4. NTSpeechRecognition (C, 7 stars)
   NTSpeechRecognition is an iOS/macOS framework, written in Objective-C, providing speech recognition functionality. PocketSphinx is used for decoding. (Keyword spotting, JSGF Grammar, NGram)

5. spoteno (Python, 4 stars)
   Spoken text normalization for ASR.

6. XPlane-Plugin-Template (CMake, 4 stars)
   Simple template to start developing a plugin for the XPlane Flight Simulator.

7. NTSpeechTools (Objective-C, 4 stars)
   Objective-C library that provides tools for working with speech recognition. It provides data structures for pronunciation dictionaries, grammars, searches (Keyword Spotting, NGram, Grammar) and hypotheses. It can also parse and serialise JSGF grammars. Currently provides frameworks for iOS and OSX.

8. evalmate (Python, 3 stars)
   Tools for evaluating audio-related machine learning tasks.

9. spych (Python, 3 stars)
   Scripts/tools used for working with automatic speech recognition.

10. Speech-APIs (3 stars)
    List of online speech-to-text, text-to-speech, and translation APIs.

11. ios-container-view (Swift, 1 star)
    Simple example of using a container view to create a custom tab bar.

12. pyspeechgrammar (Python, 1 star)
    PySpeechGrammar can be used to parse and convert speech grammar formats.

13. pyphony (Python, 1 star)
    pyphony is a library to handle lexica for ASR.

14. dotfiles (Lua, 1 star)
    configs

15. candle (Python, 1 star)
    High-level utility for training neural networks with PyTorch.