A Tool for extracting multimodal features from videos.

MMSA-Feature Extraction Toolkit

MMSA-Feature Extraction Toolkit extracts multimodal features for Multimodal Sentiment Analysis datasets. It integrates several commonly used tools for the visual, acoustic and text modalities. The extracted features are compatible with the MMSA Framework and can be used with it directly. The tool can also extract features for single videos.

This work is included in the ACL-2022 DEMO paper: M-SENA: An Integrated Platform for Multimodal Sentiment Analysis. If you find our work useful, don't hesitate to cite our paper. Thank you!

@article{mao2022m,
  title={M-SENA: An Integrated Platform for Multimodal Sentiment Analysis},
  author={Mao, Huisheng and Yuan, Ziqi and Xu, Hua and Yu, Wenmeng and Liu, Yihe and Gao, Kai},
  journal={arXiv preprint arXiv:2203.12441},
  year={2022}
}

Features

  • Extracts fully customized features for single videos or datasets.
  • Integrates several widely used tools, including Librosa, OpenFace, Transformers, etc.
  • Supports Active Speaker Detection in case multiple faces exist in a video.
  • Easy to use: provides Python APIs and command-line tools.
  • Extracted features are compatible with MMSA, a unified training & testing framework for Multimodal Sentiment Analysis.

1. Installation

MMSA-Feature Extraction Toolkit is available from PyPI. Due to the package size limit on PyPI, large model files cannot be shipped with the package, so users need to run a post-install command to download them. If you can't access Google Drive, please refer to this page for manual download.

# Install package from PyPI
$ pip install MMSA-FET
# Download models & libraries from Google Drive. Use --proxy if needed.
$ python -m MSA_FET install

Note: A few system-wide dependencies need to be installed manually. See Dependency Installation for more information.
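
After installation, a quick importability check can catch the common mistake of installing into a different environment than the one running your script. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of the toolkit, and it does not verify that the model files were downloaded.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "MSA_FET" is the import name used throughout this README.
    if is_installed("MSA_FET"):
        print("MSA_FET is importable.")
    else:
        print("MSA_FET not found; try `pip install MMSA-FET` in this environment.")
```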

2. Quick Start

MMSA-FET is fairly easy to use. You can either call the Python API or use the command-line interface. Below is a basic example using the Python API.

Note: To extract features for a dataset, the dataset must be organized in a specific file structure and include a label.csv file. See Dataset and Structure for details. Raw video files and label files for MOSI, MOSEI and CH-SIMS can be downloaded from BaiduYunDisk (code: mfet) or Google Drive.

from MSA_FET import FeatureExtractionTool
from MSA_FET import run_dataset

# initialize with default librosa config which only extracts audio features
fet = FeatureExtractionTool("librosa")

# alternatively initialize with a custom config file
fet = FeatureExtractionTool("custom_config.json")

# extract features for single video
feature1 = fet.run_single("input1.mp4")
print(feature1)
feature2 = fet.run_single("input2.mp4")

# extract for dataset & save features to file
run_dataset(
    config="aligned",
    dataset_dir="~/MOSI",
    out_file="output/feature.pkl",
    num_workers=4
)

Here custom_config.json is the path to a custom config file; its format is described in the Config File section below.

For detailed usage, please read APIs and Command Line Arguments.
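
To get a feel for what a saved feature file contains, it can help to print its nested structure before wiring it into a training pipeline. The sketch below assumes only that the output file is a standard pickle (as the `.pkl` extension suggests); the layout of the stored object is not documented here, so the helper walks whatever it finds. The names `describe` and `describe_pickle` are hypothetical. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle


def describe(obj, depth=0, max_depth=3):
    """Return indented lines summarizing the structure of a nested object."""
    pad = "  " * depth
    if isinstance(obj, dict) and depth < max_depth:
        lines = []
        for key, value in obj.items():
            lines.append(f"{pad}{key!r}:")
            lines.extend(describe(value, depth + 1, max_depth))
        return lines
    shape = getattr(obj, "shape", None)  # NumPy arrays and tensors expose .shape
    if shape is not None:
        return [f"{pad}{type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, (list, tuple, dict, str)):
        return [f"{pad}{type(obj).__name__} len={len(obj)}"]
    return [f"{pad}{type(obj).__name__}"]


def describe_pickle(path):
    """Load a trusted pickle file and summarize its contents."""
    with open(path, "rb") as f:
        return describe(pickle.load(f))
```

For example, `print("\n".join(describe_pickle("output/feature.pkl")))` would list the keys and array shapes of the file written by the run_dataset call above.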

3. Config File

MMSA-FET comes with a few example configs, which can be used as shown below.

# Each supported tool has an example config
fet = FeatureExtractionTool(config="aligned")
fet = FeatureExtractionTool(config="librosa")
fet = FeatureExtractionTool(config="opensmile")
fet = FeatureExtractionTool(config="wav2vec")
fet = FeatureExtractionTool(config="openface")
fet = FeatureExtractionTool(config="mediapipe")
fet = FeatureExtractionTool(config="bert")
fet = FeatureExtractionTool(config="roberta")

For customized features, you can:

  1. Edit the default configs and pass a dictionary to the config parameter, as in the example below:
from MSA_FET import FeatureExtractionTool, get_default_config

# here we only extract audio and video features
config_a = get_default_config('opensmile')
config_v = get_default_config('openface')

# modify default config
config_a['audio']['args']['feature_level'] = 'LowLevelDescriptors'

# combine audio and video configs
config = {**config_a, **config_v}

# initialize
fet = FeatureExtractionTool(config=config)
  2. Provide a config JSON file. The example below extracts features for all three modalities. To extract unimodal features, just remove the unnecessary sections from the file.
{
  "audio": {
    "tool": "librosa",
    "sample_rate": null,
    "args": {
      "mfcc": {
        "n_mfcc": 20,
        "htk": true
      },
      "rms": {},
      "zero_crossing_rate": {},
      "spectral_rolloff": {},
      "spectral_centroid": {}
    }
  },
  "video": {
    "tool": "openface",
    "fps": 25,
    "average_over": 3,
    "args": {
      "hogalign": false,
      "simalign": false,
      "nobadaligned": false,
      "landmark_2D": true,
      "landmark_3D": false,
      "pdmparams": false,
      "head_pose": true,
      "action_units": true,
      "gaze": true,
      "tracked": false
    }
  },
  "text": {
    "model": "bert",
    "device": "cpu",
    "pretrained": "models/bert_base_uncased",
    "args": {}
  }
}
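
Both approaches above come down to choosing which top-level sections a config contains. Note that `{**config_a, **config_v}` is a shallow merge in Python: if both dictionaries share a top-level key, the later one silently replaces the earlier one. The sketch below shows one way to make section selection and merging explicit. The helper names (`load_config`, `select_modalities`, `merge_configs`) are hypothetical and not part of MMSA-FET; they operate on plain dictionaries shaped like the JSON example.

```python
import copy
import json


def load_config(path):
    """Read a config JSON file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def select_modalities(config, keep):
    """Return a deep copy of `config` containing only the sections in `keep`."""
    missing = [name for name in keep if name not in config]
    if missing:
        raise KeyError(f"sections not present in config: {missing}")
    return {name: copy.deepcopy(config[name]) for name in keep}


def merge_configs(*configs):
    """Merge configs section by section, refusing to overwrite silently."""
    merged = {}
    for cfg in configs:
        overlap = merged.keys() & cfg.keys()
        if overlap:
            raise ValueError(f"duplicate sections: {sorted(overlap)}")
        merged.update(copy.deepcopy(cfg))
    return merged
```

For instance, `select_modalities(load_config("custom_config.json"), ["audio"])` yields an audio-only dictionary that can be passed to the config parameter.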

4. Supported Tools & Features

4.1 Audio Tools

4.2 Video Tools

  • OpenFace (link)

    Supports all features in OpenFace's FeatureExtraction binary, including facial landmarks in 2D and 3D, head pose, gaze, facial action units, and HOG binary files. Details of these features can be found in the OpenFace Wiki here and here. Detailed configurations can be found here.

  • MediaPipe (link)

    Supports face mesh and holistic (face, hand, pose) solutions. Detailed configurations can be found here.

  • TalkNet (link)

    TalkNet is used to support Active Speaker Detection in case there are multiple human faces in the video.

4.3 Text Tools

  • BERT (link)

    Integrated from Hugging Face Transformers. Detailed configurations can be found here.

  • XLNet (link)

    Integrated from Hugging Face Transformers. Detailed configurations can be found here.

4.4 Aligners

  • Wav2vec CTC Aligner

    Uses a pretrained Wav2vec ASR model to generate a timestamp for each word, then aligns the video and audio with the text. Currently only English is supported.
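
The general idea of word-level alignment can be illustrated with a small sketch: once each word has a start and end time, the frame-level audio or video features that fall inside that span are pooled into one vector per word. The code below is an illustration of that idea under stated assumptions, not the toolkit's implementation; the function name, the mean pooling, and the `(word, start_sec, end_sec)` span format are all hypothetical.

```python
import math


def average_frames_per_word(frames, fps, word_spans):
    """Average frame-level feature vectors inside each word's time span.

    frames: list of equal-length feature vectors, one per frame
    fps: frames per second of the feature sequence
    word_spans: list of (word, start_sec, end_sec) tuples
    """
    aligned = []
    for word, start, end in word_spans:
        first = max(0, math.floor(start * fps))
        # Always try to cover at least one frame, clipped to the sequence length.
        last = min(len(frames), max(first + 1, math.ceil(end * fps)))
        window = frames[first:last]
        if not window:  # the word lies beyond the available frames
            aligned.append((word, None))
            continue
        dim = len(window[0])
        mean = [sum(vec[i] for vec in window) / len(window) for i in range(dim)]
        aligned.append((word, mean))
    return aligned
```

Mean pooling is only one option; a real aligner might instead pad or truncate each word's frames to a fixed length.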
