  • Stars: 106
  • Language: Python
  • License: MIT License
  • Created about 4 years ago
  • Updated almost 3 years ago


Repository Details

The code for our IEEE Access (2020) paper, "Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion".

Model Overview

Please replace Table 6 of the paper with the updated table (the table image is not reproduced in this text).

Basic structure of the code

Inspiration from Fairseq

  1. This code structure is built on top of the Fairseq interface.
  2. Fairseq is an open-source project by the Facebook AI team that combines different SOTA architectures for sequential data processing.
  3. It also includes SOTA optimization mechanisms such as early stopping, learning-rate warmup, and learning-rate schedulers.
  4. We develop our own architecture to be compatible with the Fairseq interface.
  5. For a deeper understanding, please read the paper published about the Fairseq interface.
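Item 3 mentions warmup, learning-rate schedulers, and early stopping, and the Training Command section selects `--lr-scheduler reduce_lr_on_plateau`. As a rough, framework-free illustration of how these three mechanisms interact, here is a small sketch. The class name, its parameters, and its defaults are hypothetical; this is not Fairseq's scheduler implementation.

```python
class WarmupPlateauSchedule:
    """Hypothetical helper: linear warmup, reduce-on-plateau, and early stopping.

    Illustrative only; this is not Fairseq's scheduler code.
    """

    def __init__(self, peak_lr, warmup_updates=100, factor=0.5,
                 patience=1, stop_patience=3):
        self.peak_lr = peak_lr
        self.warmup_updates = warmup_updates  # updates spent ramping up to peak_lr
        self.factor = factor                  # LR multiplier applied on a plateau
        self.patience = patience              # non-improving epochs tolerated before a cut
        self.stop_patience = stop_patience    # non-improving epochs before stopping
        self.plateau_lr = peak_lr             # LR used after warmup, reduced on plateaus
        self.best = float("inf")
        self.bad_epochs = 0                   # reset after every LR cut
        self.epochs_since_best = 0            # only reset by a new best validation loss

    def step_update(self, num_updates):
        """Return the LR for a 0-based update index: linear ramp, then plateau LR."""
        if num_updates < self.warmup_updates:
            return self.peak_lr * (num_updates + 1) / self.warmup_updates
        return self.plateau_lr

    def step_epoch(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            self.epochs_since_best = 0
        else:
            self.bad_epochs += 1
            self.epochs_since_best += 1
            if self.bad_epochs > self.patience:
                # Validation loss has plateaued: shrink the LR and restart the count.
                self.plateau_lr *= self.factor
                self.bad_epochs = 0
        return self.epochs_since_best >= self.stop_patience


# Usage: feed per-epoch validation losses and stop when asked to.
sched = WarmupPlateauSchedule(peak_lr=1e-3, warmup_updates=10)
for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.97, 0.99]):
    if sched.step_epoch(loss):
        break
```

Keeping two separate counters lets the LR be cut several times before early stopping triggers, which is a common way to combine the two mechanisms.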

Merging our own architecture with the Fairseq interface

  1. This can be a bit tricky in the beginning. First, it is important to understand that Fairseq is built so that all architectures can be accessed through terminal commands (args).

  2. Since our architecture shares many properties with the Transformer architecture, we followed a tutorial that describes how to use RoBERTa for a custom classification task.

  3. We built our architecture by adding new components to the following directories of the Fairseq interface:

    • fairseq/data
    • fairseq/models
    • fairseq/modules
    • fairseq/tasks
    • fairseq/criterions
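Point 1 is the key idea: Fairseq registers tasks, models, architectures, and criterions by name (through decorators such as `register_task`, `register_model`, `register_model_architecture`, and `register_criterion`), and command-line flags like `--task` and `--arch` then select among them. The sketch below imitates that pattern with plain `argparse` and dictionaries so it runs without Fairseq installed. The registry dictionaries, the `register` helper, and both class names are hypothetical stand-ins; only the names `emotion_prediction` and `robertEMO_large` come from the training command.

```python
import argparse

# Hypothetical stand-ins for Fairseq's name -> class registries.
TASK_REGISTRY = {}
ARCH_REGISTRY = {}


def register(registry, name):
    """Decorator that stores a class in `registry` under `name`."""
    def wrapper(cls):
        registry[name] = cls
        return cls
    return wrapper


@register(TASK_REGISTRY, "emotion_prediction")
class EmotionPredictionTask:
    def __init__(self, args):
        self.num_classes = args.num_classes


@register(ARCH_REGISTRY, "robertEMO_large")
class MultimodalEmotionModel:
    def __init__(self, args):
        self.encoder_layers = args.encoder_layers


def build_from_cli(argv):
    """Parse CLI args and instantiate whichever task and architecture they name."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", required=True, choices=sorted(TASK_REGISTRY))
    parser.add_argument("--arch", required=True, choices=sorted(ARCH_REGISTRY))
    parser.add_argument("--num-classes", type=int, default=1)
    parser.add_argument("--encoder-layers", type=int, default=2)
    args = parser.parse_args(argv)
    return TASK_REGISTRY[args.task](args), ARCH_REGISTRY[args.arch](args)


task, model = build_from_cli(
    ["--task", "emotion_prediction", "--arch", "robertEMO_large", "--encoder-layers", "2"]
)
```

Because the command line only carries names, adding a new model means registering one more class; no training-loop code has to change, which is why the custom files in this repository live inside the existing Fairseq directories.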

Main scripts of the code

Our main scripts are organized into five parts:

  1. The custom dataloader for loading raw audio, face frames, and text is in fairseq/data/raw_audio_text_video_dataset.py.

  2. The emotion prediction task (analogous to other Fairseq tasks such as translation) is in fairseq/tasks/emotion_prediction.py.

  3. The custom architecture of our model (analogous to RoBERTa and wav2vec) is in fairseq/models/mulT_emo.py.

  4. To obtain inter-modal attention, we slightly modify the self-attention architecture. The changes are in fairseq/modules/transformer_multi_encoder.py and fairseq/modules/transformer_layer.py.

  5. Finally, the custom loss function is in fairseq/criterions/emotion_prediction_cri.py.
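Item 4 turns self-attention into inter-modal attention. A common way to do this is cross-modal attention, where the queries come from one modality and the keys and values from another. The NumPy sketch below shows a single head of that idea under this assumption; it is not the code in transformer_multi_encoder.py, and the function names, sequence lengths, and weight matrices are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from one modality to another.

    query_seq:   (T_q, d_model) features of the attending modality, e.g. text
    context_seq: (T_c, d_model) features of the attended modality, e.g. audio
    Returns the attended features (T_q, d_v) and the weights (T_q, T_c).
    """
    q = query_seq @ w_q                        # (T_q, d_k)
    k = context_seq @ w_k                      # (T_c, d_k)
    v = context_seq @ w_v                      # (T_c, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (T_q, T_c)
    weights = softmax(scores, axis=-1)         # each text step sums to 1 over audio steps
    return weights @ v, weights


rng = np.random.default_rng(0)
text = rng.normal(size=(5, 16))    # 5 text tokens, 16-dim features
audio = rng.normal(size=(9, 16))   # 9 audio frames, 16-dim features
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
out, attn = cross_modal_attention(text, audio, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (5, 8) (5, 9)
```

The only change from ordinary self-attention is that `k` and `v` are computed from `context_seq` instead of `query_seq`, so the two sequences may have different lengths.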

Prerequisite models

Our model uses pretrained self-supervised learning (SSL) models to extract features, so these checkpoints must be downloaded before training. Please use the following links to download the pretrained SSL models:

  1. For audio features - wav2vec
  2. For facial features - Fabnet
  3. For sentence (text) features - RoBERTa

Training Command

python train.py --data ./T_data-old/mosei_sent --restore-file None \
    --task emotion_prediction --reset-optimizer --reset-dataloader --reset-meters \
    --init-token 0 --separator-token 2 \
    --arch robertEMO_large --criterion emotion_prediction_cri --num-classes 1 \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.1 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 --clip-norm 0.0 \
    --lr 1e-03 --max-epoch 32 --best-checkpoint-metric loss \
    --encoder-layers 2 --encoder-attention-heads 4 \
    --max-sample-size 150000 --max-tokens 150000000 --batch-size 4 \
    --encoder-layers-cross 2 \
    --max-positions-t 512 --max-positions-a 936 --max-positions-v 301 \
    --no-epoch-checkpoints --update-freq 2 --find-unused-parameters --ddp-backend=no_c10d \
    --lr-scheduler reduce_lr_on_plateau --regression-target-mos
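The training command combines `--batch-size 4` with `--update-freq 2`. In Fairseq, `--update-freq N` accumulates gradients over N batches before each optimizer step, so assuming `--batch-size` is per GPU and `--max-tokens` is large enough not to bind, each update sees batch size × update frequency × number of GPUs samples. The toy sketch below uses a hypothetical one-parameter least-squares model (not the repository's training loop) to show why two accumulated batches of 4 give the same step as one batch of 8 when the summed gradient is divided by the total sample count.

```python
def grad_sum(w, batch):
    """Summed gradient of 0.5 * (w * x - y) ** 2 over a batch, w.r.t. scalar w."""
    return sum((w * x - y) * x for x, y in batch)


def accumulated_update(w, batches, lr):
    """One optimizer step after accumulating gradients over all `batches`.

    Gradients are summed across batches and divided by the total number of
    samples, so splitting the same samples into more batches does not change
    the resulting step.
    """
    total_grad, n_samples = 0.0, 0
    for batch in batches:
        total_grad += grad_sum(w, batch)
        n_samples += len(batch)
    return w - lr * total_grad / n_samples


def effective_batch_size(batch_size, update_freq, num_gpus=1):
    """Samples per optimizer step, assuming batch_size is counted per GPU."""
    return batch_size * update_freq * num_gpus


data = [(float(x), 2.0 * x) for x in range(1, 9)]  # 8 toy samples with y = 2x
w_split = accumulated_update(0.0, [data[:4], data[4:]], lr=0.01)  # batch 4, update-freq 2
w_full = accumulated_update(0.0, [data], lr=0.01)                 # one batch of 8
print(w_split, w_full, effective_batch_size(4, 2))
```

Gradient accumulation trades wall-clock time for memory: it reaches the larger effective batch without holding all of its samples on the GPU at once.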

Validation Command

CUDA_VISIBLE_DEVICES=1 python validate.py --data ./T_data/emocap --path './checkpoints/checkpoint_best.pt' --task emotion_prediction --valid-subset test --batch-size 4
