  • Stars: 316
  • Rank: 132,281 (Top 3 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Sony AI Research Code

This repository contains code related to research papers in the area of Machine Learning and Artificial Intelligence that have been published by Sony. We believe in transparent and reproducible research and therefore want to offer quick and easy access to our findings. Hopefully, others will benefit from them as much as we did.

Available Code

Mixed Precision DNNs: All you need is a good parametrization (Code)

Uhlich, Stefan and Mauch, Lukas and Cardinaux, Fabien and Yoshiyama, Kazuki and Garcia, Javier Alonso and Tiedemann, Stephen and Kemp, Thomas and Nakamura, Akira. Published at the 8th International Conference on Learning Representations (ICLR) 2020. arXiv technical report (arXiv 1905.11452).

Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with homogeneous bitwidth for the same size constraint. Since choosing the optimal bitwidths is not straightforward, training methods that can learn them are desirable. Differentiable quantization with straight-through gradients allows the quantizer's parameters to be learned using gradient methods. We show that a suitable parametrization of the quantizer is the key to achieving stable training and good final performance. Specifically, we propose to parametrize the quantizer with the step size and dynamic range. The bitwidth can then be inferred from them. Other parametrizations, which explicitly use the bitwidth, consistently perform worse. We confirm our findings with experiments on CIFAR-10 and ImageNet and we obtain mixed precision DNNs with learned quantization parameters, achieving state-of-the-art performance.
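As a rough illustration of inferring the bitwidth from the step size and dynamic range, the following minimal sketch uses a symmetric signed uniform quantizer. The function names and the exact bitwidth formula are assumptions made for this example (one common convention: magnitude levels plus one sign bit); they are not taken from the repository, and the straight-through gradient used for training is not shown.

```python
import math


def uniform_quantize(x, step, q_max):
    """Clip x to [-q_max, q_max], then round to the nearest multiple of step.

    Note: Python's round() resolves exact ties to the nearest even integer.
    """
    clipped = max(-q_max, min(q_max, x))
    return step * round(clipped / step)


def inferred_bitwidth(step, q_max):
    """Bits for a signed uniform grid (assumed convention):
    ceil(log2(q_max / step + 1)) magnitude bits plus one sign bit."""
    return math.ceil(math.log2(q_max / step + 1)) + 1


# Hypothetical parameter values, chosen only for illustration.
step, q_max = 0.25, 1.75
bits = inferred_bitwidth(step, q_max)
inside = uniform_quantize(0.6, step, q_max)
saturated = uniform_quantize(5.0, step, q_max)
```

With these hypothetical values the grid has 7 positive levels, so the sketch infers 4 bits; halving the step size or doubling the range would add one bit.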

All for One and One for All: Improving Music Separation by Bridging Networks (Code)

This is the NNabla implementation of CrossNet-Open-Unmix (X-UMX), an improved version of Open-Unmix (UMX) for music source separation. X-UMX achieves improved performance without additional learnable parameters compared to the original UMX model. Details of X-UMX can be found in our paper.

Related Projects: x-umx | open-unmix-nnabla | open-unmix-pytorch | musdb | museval | norbert

The Model

As shown in Figure (b), X-UMX has almost the same architecture as the original UMX and differs only by two additional average operations that link the instrument models together. Since these operations are not DNN layers, X-UMX has the same number of learnable parameters as the original UMX, and the computational complexity is almost the same. Besides the model, there are two more differences compared to the original UMX: a Multi Domain Loss (MDL) and a Combination Loss (CL) are used during training, which differ from the original loss function of UMX. Hence, these three contributions, i.e., (i) the crossing architecture, (ii) MDL and (iii) CL, make the original UMX more effective and successful without additional learnable parameters.
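The idea of linking instrument models through an average operation can be sketched as follows. This is only an illustration under assumptions: the feature vectors are made up, the helper name `cross_average` is hypothetical, and where exactly the two averages sit inside the network is described in the paper, not here.

```python
def cross_average(branch_outputs):
    """Element-wise mean across the per-instrument feature vectors.

    The operation has no weights, so it adds no learnable parameters.
    """
    n = len(branch_outputs)
    return [sum(values) / n for values in zip(*branch_outputs)]


# Hypothetical intermediate features, one vector per instrument branch.
features = {
    "vocals": [1.0, 2.0],
    "drums": [3.0, 4.0],
    "bass": [5.0, 6.0],
    "other": [7.0, 8.0],
}

shared = cross_average(list(features.values()))

# After the bridge, every branch continues from the same averaged features.
bridged = {name: list(shared) for name in features}
```

Because the bridge is a plain mean, each branch sees information from all other branches while the parameter count stays unchanged.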

Out-of-core Training for Extremely Large-Scale Neural Networks With Adaptive Window-Based Scheduling (Code)

This is the official implementation of Out-of-core Training for Extremely Large-Scale Neural Networks With Adaptive Window-Based Scheduling.

We provide the OoC feature as one of nnabla's utilities. You can enable OoC training in your nnabla script with only a few additional lines. Please see the documentation for more details!

While large neural networks demonstrate higher performance in various tasks, training large networks is difficult due to limitations on GPU memory size. We propose a novel out-of-core algorithm that enables faster training of extremely large-scale neural networks with sizes larger than the allotted GPU memory. Under a given memory budget constraint, our scheduling algorithm locally adapts the timing of memory transfers according to the memory usage of each function, which improves the overlap between computation and memory transfers. Additionally, we apply a virtual addressing technique, commonly used in operating systems, to the training of neural networks with out-of-core execution, which drastically reduces the amount of memory fragmentation caused by frequent memory transfers. With our proposed algorithm, we successfully train ResNet-50 with a batch size of 1440 while keeping training speed at 55%, which is 7.5x larger than the upper bound of physical memory. It also substantially outperforms the previous state of the art, i.e., it trains a 1.55x larger network than the state of the art with faster execution. Moreover, we experimentally show that our approach is also scalable for various types of networks.

Data Cleansing for Deep Neural Networks with Storage-efficient Approximation of Influence Functions (Code)

This is the official implementation of Data Cleansing for Deep Neural Networks with Storage-efficient Approximation of Influence Functions.

Identifying the influence of training data for data cleansing can improve the accuracy of deep learning. An approach based on stochastic gradient descent (SGD), called SGD-influence, was proposed to calculate influence scores, but its calculation costs are high: the parameters of the model must be stored temporarily during the training phase so that influence scores can be calculated in the inference phase. Building closely on the previous method, we propose a method that reduces the cache files needed to store the parameters in the training phase for calculating influence scores. We use only the final parameters of the last epoch for the influence function calculation. In our classification experiments, the cache size of training on the MNIST dataset with our approach is 1.236 MB, whereas the previous method used a cache size of 1.932 GB in the last epoch. This means that the cache size has been reduced to 1/1,563. We also observed an accuracy improvement from data cleansing by removing negatively influential data, both with our approach and with the previous method. Moreover, our simple and general method for calculating influence scores is available in our auto ML tool, Neural Network Console, without programming. The source code is also available.
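The reported reduction ratio can be checked from the two cache sizes quoted above. The short calculation below assumes decimal units (1 GB = 1000 MB); that unit convention is an assumption, since the text does not state it.

```python
# Cache sizes quoted in the abstract.
previous_cache_gb = 1.932
proposed_cache_mb = 1.236

# Assumption: decimal units, 1 GB = 1000 MB.
previous_cache_mb = previous_cache_gb * 1000

reduction_factor = previous_cache_mb / proposed_cache_mb
rounded_factor = round(reduction_factor)
```

Under this assumption the ratio rounds to 1563, which matches the stated reduction to 1/1,563.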

D3Net: Densely connected multidilated convolutional networks for dense prediction tasks (Code)

This is the official NNabla implementation of D3Net, densely connected multidilated convolutional networks for dense prediction tasks, which was accepted at CVPR 2021.

Takahashi, Naoya, and Yuki Mitsufuji. "Densely connected multidilated convolutional networks for dense prediction tasks." arXiv preprint arXiv:2011.11844 (2021).

Tasks that involve high-resolution dense prediction require modeling of both local and global patterns in a large input field. Although the local and global structures often depend on each other and their simultaneous modeling is important, many convolutional neural network (CNN)-based approaches interchange representations in different resolutions only a few times. In this paper, we claim the importance of a dense simultaneous modeling of multiresolution representation and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net). D3Net involves a novel multidilated convolution that has different dilation factors in a single layer to model different resolutions simultaneously. By combining the multidilated convolution with the DenseNet architecture, D3Net incorporates multiresolution learning with an exponentially growing receptive field in almost all layers, while avoiding the aliasing problem that occurs when we naively incorporate the dilated convolution in DenseNet. Experiments on the image semantic segmentation task using Cityscapes and the audio source separation task using MUSDB18 show that the proposed method has superior performance over state-of-the-art methods.
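To make "different dilation factors in a single layer" concrete, here is a simplified one-dimensional sketch in plain Python. It is not the repository's implementation: the function names are hypothetical, each input channel is simply assigned its own dilation, and the kernels and dilation values are chosen only for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D cross-correlation with a dilated kernel."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(signal):  # positions outside count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multidilated_conv1d(channels, kernels, dilations):
    """One output channel: sum of per-channel convolutions,
    where each input channel uses its own dilation factor."""
    out = [0.0] * len(channels[0])
    for signal, kernel, dilation in zip(channels, kernels, dilations):
        for t, value in enumerate(dilated_conv1d(signal, kernel, dilation)):
            out[t] += value
    return out


# Two input channels, each a unit impulse at index 4 (illustrative data).
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
response = multidilated_conv1d(
    channels=[impulse, impulse],
    kernels=[[1, 1, 1], [1, 1, 1]],
    dilations=[1, 2],  # assumed values: channel 0 is dense, channel 1 is dilated
)
```

The first channel spreads the impulse over neighbouring positions, while the second reaches two steps away, so a single layer mixes two receptive-field sizes.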

NVC-Net: End-to-End Adversarial Voice Conversion (Code)

This is the official NNabla implementation of NVC-Net, an end-to-end adversarial voice conversion approach.

Nguyen, Bac, and Fabien Cardinaux. "NVC-Net: End-to-End Adversarial Voice Conversion." arXiv preprint arXiv:2106.00992 (2021).

Voice conversion has gained increasing popularity in many applications of speech synthesis. The idea is to change the voice identity from one speaker into another while keeping the linguistic content unchanged. Many voice conversion approaches rely on the use of a vocoder to reconstruct the speech from acoustic features, and as a consequence, the speech quality heavily depends on such a vocoder. In this paper, we propose NVC-Net, an end-to-end adversarial network, which performs voice conversion directly on the raw audio waveform of arbitrary length. By disentangling the speaker identity from the speech content, NVC-Net is able to perform non-parallel traditional many-to-many voice conversion as well as zero-shot voice conversion from a short utterance of an unseen target speaker. Importantly, NVC-Net is non-autoregressive and fully convolutional, achieving fast inference. Our model is capable of producing samples at a rate of more than 3600 kHz on an NVIDIA V100 GPU, being orders of magnitude faster than state-of-the-art methods under the same hardware configurations. Objective and subjective evaluations on non-parallel many-to-many voice conversion tasks show that NVC-Net obtains competitive results with significantly fewer parameters.

TVC-GMM: Towards Robust FastSpeech 2 by Modelling Residual Multimodality (Code)

This is the official implementation of models and experiments for the INTERSPEECH 2023 paper "Towards Robust FastSpeech 2 by Modelling Residual Multimodality" (Kögel, Nguyen, Cardinaux 2023).

This repository contains a PyTorch implementation of FastSpeech 2 with adapted variance predictors and Trivariate-Chain Gaussian Mixture Modelling (TVC-GMM) proposed in our paper. Additionally, it contains scripts to export audio and calculate metrics to recreate the experiments presented in the paper.

State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech. For expressive speech datasets, however, we observe characteristic audio distortions. We demonstrate that such artefacts are introduced to the vocoder reconstruction by over-smooth mel-spectrogram predictions, which are induced by the choice of mean-squared-error (MSE) loss for training the mel-spectrogram decoder. With MSE loss, FastSpeech 2 is limited to learning conditional averages of the training distribution, which might not lie close to a natural sample if the distribution still appears multimodal after all conditioning signals. To alleviate this problem, we introduce TVC-GMM, a mixture model of Trivariate-Chain Gaussian distributions, to model the residual multimodality. TVC-GMM reduces spectrogram smoothness and improves perceptual audio quality, in particular for expressive datasets, as shown by both objective and subjective evaluation.
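The claim that MSE training yields conditional averages that may not resemble any natural sample can be illustrated with a toy example. The data below is made up (a bimodal target with modes at -1 and +1), and the sketch does not implement TVC-GMM; it only shows where the MSE-optimal constant prediction falls.

```python
# Toy bimodal targets: half the samples at -1.0, half at +1.0 (illustrative).
targets = [-1.0] * 50 + [1.0] * 50


def mse(prediction, values):
    """Mean squared error of a single constant prediction."""
    return sum((prediction - v) ** 2 for v in values) / len(values)


# Search a small grid of candidate constant predictions.
candidates = [i / 10 for i in range(-15, 16)]
best = min(candidates, key=lambda c: mse(c, targets))

mse_at_best = mse(best, targets)
mse_at_mode = mse(1.0, targets)
```

The minimiser is 0.0, the mean of the two modes, even though no sample lies there; predicting either mode gives a higher MSE. A mixture model can instead place probability mass on each mode.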

More Repositories

1

sonyflake

A distributed unique ID generator inspired by Twitter's Snowflake
Go
3,484
star
2

nnabla

Neural Network Libraries
Python
2,634
star
3

gobreaker

Circuit Breaker implemented in Go
Go
2,606
star
4

flutter-embedded-linux

Embedded Linux embedding for Flutter
C++
995
star
5

flutter-elinux

Flutter tools for embedded Linux (eLinux)
Dart
411
star
6

v8eval

Multi-language bindings to JavaScript engine V8
C++
399
star
7

model_optimization

Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.
Python
295
star
8

nnabla-examples

Neural Network Libraries https://nnabla.org/ - Examples
Python
280
star
9

easyhttpcpp

A cross-platform HTTP client library with a focus on usability and speed
C++
152
star
10

sqvae

PyTorch implementation of stochastically quantized variational autoencoder (SQ-VAE)
Python
132
star
11

mapray-js

JavaScript library for interactive, high-quality 3D globes and maps in the browser
TypeScript
118
star
12

nmos-cpp

An NMOS (Networked Media Open Specifications) Registry and Node in C++ (IS-04, IS-05)
C++
113
star
13

nnabla-rl

Deep reinforcement learning library built on top of Neural Network Libraries
Python
107
star
14

nnabla-ext-cuda

A CUDA Extension of Neural Network Libraries
Cuda
89
star
15

DiffRoll

PyTorch implementation of DiffRoll, a diffusion-based generative automatic music transcription (AMT) model
Jupyter Notebook
69
star
16

creativeai

CSS
63
star
17

meta-flutter

Yocto recipes for Flutter Engine and custom embedders
BitBake
61
star
18

FxNorm-automix

FxNorm-Automix - Implementation of automatic music mixing systems. We show how we can use wet music data and repurpose it to train a fully automatic mixing system
Python
51
star
19

appsync-client-go

AWS AppSync golang client library
Go
46
star
20

nnabla-nas

Neural Architecture Search for Neural Network Libraries
Python
44
star
21

flutter-elinux-plugins

Flutter plugins for embedded Linux (eLinux)
C++
43
star
22

nnabla-c-runtime

Neural Network Libraries https://nnabla.org/ - C Runtime
C
38
star
23

huis-ui-creator

JavaScript
38
star
24

NDJIR

NDJIR: Neural Direct and Joint Inverse Rendering for Geometry, Lights, and Materials of Real Object
Python
36
star
25

timbre-trap

Code for the paper "Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription"
Python
34
star
26

pyIEOE

Python
31
star
27

nmos-js

An NMOS (Networked Media Open Specifications) Client in JavaScript (IS-04, IS-05)
JavaScript
27
star
28

openocd-nuttx

Fork of OpenOCD with NuttX thread support.
C
24
star
29

CLIPSep

Python
23
star
30

pdaf-library

C
22
star
31

cdp-js

Libraries/SDK modules for multi-platform application development
TypeScript
20
star
32

polar-densification

Python
17
star
33

cordova-plugin-cdp-nativebridge

JavaScript
16
star
34

audio-visual-seld-dcase2023

Baseline method for audio-visual sound event localization and detection task of DCASE 2023 challenge
Python
16
star
35

generator-cordova-plugin-devbed

JavaScript
14
star
36

nnc-plugin

Plugins for Neural Network Console (https://dl.sony.com/).
Python
14
star
37

dolp-colorconstancy

Python
11
star
38

typescript-fsa-redux-middleware

Fluent syntax for defining typesafe Redux vanilla middlewares on top of typescript-fsa.
TypeScript
9
star
39

cdn-purge-control-php

Multi CDN purge control library for PHP
PHP
8
star
40

micro-notifier

Simplified Pusher Clone
Go
8
star
41

nnabla-browser

Visualization toolkit for Neural Network Libraries
TypeScript
8
star
42

isren

JavaScript
8
star
43

pixel-guided-diffusion

Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models
Python
8
star
44

smarttennissensorsdk

The Smart Tennis Sensor plugs into the end of a tennis racket and records data about all the shots you make throughout a game or practice. With the SDK, you can develop apps for analyzing and presenting that data in real-time.
Java
8
star
45

cdp-cli

Command line tools for generating start point of multi-platform application development (Details: see cdp-js repository)
HTML
7
star
46

custom_layers

Python
7
star
47

mct_quantizers

Python
6
star
48

aibo-development-tutorial

6
star
49

smarttennissensormp4meta

Java
4
star
50

fp-diffusion

Jupyter Notebook
3
star
51

diffusion-timbre-transfer

Jupyter Notebook
3
star
52

node-win-usbdev

C++
3
star
53

evsCluster

Python scripts to process EVS (Event-based vision sensor) data
Python
3
star
54

Instruct3Dto3D-doc

Official documentation of Instruct 3D-to-3D
HTML
2
star
55

nnabla-js

TypeScript
1
star
56

nnabla-doc

1
star