
EmoV-DB

The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems

See also

https://github.com/noetits/ICE-Talk for controllable TTS

How to use

Download link

Sorted version (recommended), new link: https://openslr.org/115/

Old link (slow download), but it gives you the folder structure needed to use the "load_emov_db()" function: https://mega.nz/#F!KBp32apT!gLIgyWf9iQ-yqnWFUFuUHg

Unsorted version: http://www.coe.neu.edu/Research/AClab/Speech%20Data/
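Once a release is downloaded and extracted, a quick sanity check is to index every audio file it contains. The sketch below is not part of the repository; it only assumes that .wav files sit somewhere under the extracted root, whatever the folder layout of the release you chose:

```python
import os
import wave

def collect_wav_files(root):
    """Walk an extracted EmoV-DB folder and index every .wav file.

    Returns a list of (path, duration_seconds) tuples. The exact folder
    layout depends on which release you downloaded, so nothing beyond
    ".wav files somewhere under root" is assumed here.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            # Duration = number of frames / sampling rate.
            with wave.open(path, "rb") as w:
                duration = w.getnframes() / w.getframerate()
            entries.append((path, duration))
    return entries
```

Comparing the number of indexed files per speaker/emotion against the counts listed below is an easy way to verify that a download completed.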

Forced alignments with gentle

"It is the process of taking the text transcription of an audio speech segment and determining where in time particular words occur in the speech segment." source

It also makes it possible to separate verbal from non-verbal vocalizations (laughs, yawns, etc.).

  1. Go to https://github.com/lowerquality/gentle

  2. Clone the repo

  3. In "Getting started", use the 3rd option: ./install.sh

  4. Copy align_db.py into the cloned repository.

  5. In align_db.py, change the "path" variable so that it points to the location of EmoV-DB.

  6. Run the command "python align_db.py". You will probably have to install a few packages to make it work.

  7. It should create a folder called "alignments" in the repo, mirroring the structure of the database and containing a JSON file for each sentence of the database.

  8. The function "get_start_end_from_json(path)" extracts the start and end times of the computed forced alignment.

  9. You can play a file with the function "play(path)".

  10. You can play only the portion of a file that contains speech, according to the forced alignment, with "play_start_end(path, start, end)".
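The alignment-reading step above can be sketched in code. Assuming gentle's usual output format (a top-level "words" list whose successfully aligned entries carry "start" and "end" times), a minimal stand-in for what "get_start_end_from_json(path)" presumably does looks like this; it is an illustration, not the repository's actual implementation:

```python
import json

def get_start_end_from_json(path):
    """Return (start, end) in seconds of the aligned speech region.

    Assumes gentle's output format: a top-level "words" list in which
    successfully aligned entries ("case" == "success") have "start"
    and "end" fields. Words gentle could not align are skipped.
    """
    with open(path) as f:
        alignment = json.load(f)
    aligned = [w for w in alignment["words"] if w.get("case") == "success"]
    if not aligned:
        raise ValueError("no aligned words in %s" % path)
    # Speech region spans from the first aligned word to the last one.
    return aligned[0]["start"], aligned[-1]["end"]
```

The returned pair is exactly what "play_start_end(path, start, end)" expects, so trimming leading and trailing non-speech is a two-call affair.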

Overview of data

  • This dataset is built for the purpose of emotional speech synthesis. The transcripts are based on the CMU Arctic database: http://www.festvox.org/cmu_arctic/cmuarctic.data.

  • It includes recordings of four speakers: two male and two female.

  • The emotional styles are neutral, sleepiness, anger, disgust, and amusement.

  • Each audio file is recorded in 16-bit .wav format.

  • Spk-Je (Female, English: Neutral(417 files), Amused(222 files), Angry(523 files), Sleepy(466 files), Disgust(189 files))

  • Spk-Bea (Female, English: Neutral(373 files), Amused(309 files), Angry(317 files), Sleepy(520 files), Disgust(347 files))

  • Spk-Sa (Male, English: Neutral(493 files), Amused(501 files), Angry(468 files), Sleepy(495 files), Disgust(497 files))

  • Spk-Jsh (Male, English: Neutral(302 files), Amused(298 files), Sleepy(263 files))

  • File naming (audio_folder): anger_1-28_0011.wav. The first word is the emotional style, 1-28 is the range of annotation doc files covered, and the last four digits are the sentence number.

  • File naming (annotation_folder): anger_1-28.TextGrid. The first word is the emotional style and 1-28 is the annotation doc range.
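The naming convention above can be turned into a small parser. The regular expression below is a hypothetical helper inferred from the description of the file names, not a function from the repository:

```python
import re

# Pattern for audio file names such as "anger_1-28_0011.wav":
# emotional style, annotation-doc range, four-digit sentence number.
AUDIO_NAME = re.compile(
    r"^(?P<emotion>[a-z]+)_(?P<doc_range>\d+-\d+)_(?P<sentence>\d{4})\.wav$",
    re.IGNORECASE,
)

def parse_audio_name(filename):
    """Split a file name into (emotion, annotation-doc range, sentence number)."""
    m = AUDIO_NAME.match(filename)
    if m is None:
        raise ValueError("unexpected file name: %r" % filename)
    return m.group("emotion"), m.group("doc_range"), int(m.group("sentence"))
```

This makes it straightforward to, for example, group the files of a speaker by emotional style or to pair an audio file with its sentence in cmuarctic.data.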

References

A description of the database is available here: https://arxiv.org/pdf/1806.09514.pdf

Please reference this paper when using this database:

Bibtex:

@article{adigwe2018emotional,
  title={The emotional voices database: Towards controlling the emotion dimension in voice generation systems},
  author={Adigwe, Adaeze and Tits, No{\'e} and Haddad, Kevin El and Ostadabbas, Sarah and Dutoit, Thierry},
  journal={arXiv preprint arXiv:1806.09514},
  year={2018}
}
