multimodal-speech-emotion

This repository contains the source code used in the following paper:

Multimodal Speech Emotion Recognition using Audio and Text, IEEE SLT-18, [paper]


[requirements]

tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
python==2.7
scikit-learn==0.20.0
nltk==3.3
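
A quick environment check is sketched below (illustrative only, not part of the original repository); it simply prints the installed versions so they can be compared against the list above.

    import tensorflow as tf
    import sklearn
    import nltk

    # Expected versions per the requirements list above.
    print("tensorflow   : " + tf.__version__)       # 1.4.x
    print("scikit-learn : " + sklearn.__version__)  # 0.20.0
    print("nltk         : " + nltk.__version__)     # 3.3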

[download data corpus]

  • IEMOCAP [link] [paper]
  • Download the IEMOCAP data from its original web page (a license agreement is required)

[preprocessed-data schema (our approach)]

  • Get the preprocessed dataset [application link]

    If you want to download the preprocessed dataset, please request the license from the IEMOCAP team first.

  • For the preprocessing steps, refer to the code in the "./preprocessing" directory

  • We cannot publish the ASR-processed transcriptions due to a license issue (commercial API); however, it is fairly easy to extract ASR transcripts from the audio signal yourself. We used the Google Cloud Speech API; a rough usage sketch is given after the data-format list below.

  • Format of the data for our experiments (a minimal loading sketch follows this list):

    MFCC : MFCC features of the audio signal (ex. train_audio_mfcc.npy)
    [#samples, 750, 39] - (#samples, sequence (max 7.5 s), dims)

    MFCC-SEQN : valid length of the sequence of the audio signal (ex. train_seqN.npy)
    [#samples] - (#samples)

    PROSODY : prosody features of the audio signal (ex. train_audio_prosody.npy)
    [#samples, 35] - (#samples, dims)

    TRANS : indexed transcription sequences of the data (ex. train_nlp_trans.npy)
    [#samples, 128] - (#samples, sequence (max))

    LABEL : target label of the audio signal (ex. train_label.npy)
    [#samples] - (#samples)
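
  • To sanity-check the preprocessed files, a minimal loading sketch is shown below; the file names follow the examples above, and the asserts mirror the documented shapes (this snippet is illustrative, not part of the original code):

    import numpy as np

    # File names follow the examples given in the format list above.
    train_mfcc    = np.load("train_audio_mfcc.npy")     # (#samples, 750, 39)
    train_seq_len = np.load("train_seqN.npy")            # (#samples,)
    train_prosody = np.load("train_audio_prosody.npy")   # (#samples, 35)
    train_trans   = np.load("train_nlp_trans.npy")       # (#samples, 128)
    train_label   = np.load("train_label.npy")           # (#samples,)

    # Shapes should match the schema documented above.
    assert train_mfcc.shape[1:] == (750, 39)
    assert train_prosody.shape[1] == 35
    assert train_trans.shape[1] == 128
    assert len({a.shape[0] for a in (train_mfcc, train_seq_len,
                train_prosody, train_trans, train_label)}) == 1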
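
  • For reference, a rough sketch of obtaining an ASR transcript with the Google Cloud Speech Python client is shown below; the file name, audio encoding, and sample rate are assumptions, and this is not the authors' exact ASR pipeline:

    from google.cloud import speech

    client = speech.SpeechClient()

    # Hypothetical input clip; IEMOCAP audio would need to be segmented per utterance.
    with open("utterance.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,          # assumption: 16 kHz linear PCM
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)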

[source code]

  • The repository contains code for the following models; a minimal structural sketch of the MDRE idea follows the list:

    Audio Recurrent Encoder (ARE)
    Text Recurrent Encoder (TRE)
    Multimodal Dual Recurrent Encoder (MDRE)
    Multimodal Dual Recurrent Encoder with Attention (MDREA)
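
  • The sketch below illustrates the MDRE structure only: two recurrent encoders (audio and text) whose final states are concatenated and fed to a softmax classifier. Layer sizes, variable names, and hyperparameters are illustrative assumptions, not the authors' exact implementation:

    import tensorflow as tf  # written against the TF 1.x API listed in [requirements]

    NUM_CLASSES = 4            # assumption: number of emotion categories
    HIDDEN = 128               # assumption: encoder size
    VOCAB, EMB = 10000, 100    # assumption: vocabulary size / embedding dim

    # Placeholders follow the shapes given in the data-format section above.
    mfcc     = tf.placeholder(tf.float32, [None, 750, 39], name="mfcc")
    mfcc_len = tf.placeholder(tf.int32, [None], name="mfcc_len")
    prosody  = tf.placeholder(tf.float32, [None, 35], name="prosody")
    trans    = tf.placeholder(tf.int32, [None, 128], name="trans")
    labels   = tf.placeholder(tf.int32, [None], name="labels")

    # Audio recurrent encoder (ARE-style): GRU over MFCC frames, prosody appended.
    with tf.variable_scope("audio_encoder"):
        audio_cell = tf.nn.rnn_cell.GRUCell(HIDDEN)
        _, audio_state = tf.nn.dynamic_rnn(
            audio_cell, mfcc, sequence_length=mfcc_len, dtype=tf.float32)
        audio_repr = tf.concat([audio_state, prosody], axis=1)

    # Text recurrent encoder (TRE-style): GRU over embedded transcript tokens.
    with tf.variable_scope("text_encoder"):
        embedding = tf.get_variable("embedding", [VOCAB, EMB])
        text_cell = tf.nn.rnn_cell.GRUCell(HIDDEN)
        _, text_state = tf.nn.dynamic_rnn(
            text_cell, tf.nn.embedding_lookup(embedding, trans), dtype=tf.float32)

    # Fuse both modalities, then classify.
    fused  = tf.concat([audio_repr, text_state], axis=1)
    logits = tf.layers.dense(tf.layers.dense(fused, HIDDEN, tf.nn.relu), NUM_CLASSES)

    loss     = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)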


[training]

  • refer "reference_script.sh"
  • fianl result will be stored in "./TEST_run_result.txt"

[cite]

  • Please cite our paper when you use our code, model, or dataset:

    @inproceedings{yoon2018multimodal,
      title={Multimodal Speech Emotion Recognition Using Audio and Text},
      author={Yoon, Seunghyun and Byun, Seokhyun and Jung, Kyomin},
      booktitle={2018 IEEE Spoken Language Technology Workshop (SLT)},
      pages={112--118},
      year={2018},
      organization={IEEE}
    }