
[TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"

VOLTA: Visiolinguistic Transformer Architectures

This is the implementation of the framework described in the paper:

Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki and Desmond Elliott. Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs. Transactions of the Association for Computational Linguistics, 9:978–994, 2021.

We provide the code for reproducing our results, preprocessed data and pretrained models.


Repository Setup

You can clone this repository, with submodules included, by issuing:
git clone --recurse-submodules git@github.com:e-bug/volta

1. Create a fresh conda environment and install all dependencies.

conda create -n volta python=3.6
conda activate volta
pip install -r requirements.txt

2. Install PyTorch

conda install pytorch=1.4.0 torchvision=0.5 cudatoolkit=10.1 -c pytorch

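3. Install apex (NVIDIA's PyTorch extension for mixed-precision and distributed training). If you use a cluster, you may first need to load compatible CUDA and GCC modules before building it, with commands like the following: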

module load cuda/10.1.105
module load gcc/8.3.0-cuda

4. Set up the refer submodule for Referring Expression Comprehension:

cd tools/refer; make

5. Install this codebase as a package in this environment.

python setup.py develop
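After the setup steps above, it can help to confirm that the environment actually matches the pinned versions (Python 3.6, PyTorch 1.4.0, torchvision 0.5, CUDA toolkit 10.1). The script below is a minimal sketch and is not part of VOLTA. It assumes the codebase installs as an importable package named `volta`, and it treats apex as installed if `import apex` would succeed. All function names here are hypothetical.

```python
import importlib.util
import sys

# Versions pinned in the setup steps (Python 3.6, PyTorch 1.4.0,
# torchvision 0.5, CUDA toolkit 10.1).
EXPECTED = {"python": "3.6", "torch": "1.4.0", "torchvision": "0.5", "cuda": "10.1"}


def version_matches(installed, expected):
    """Compare dotted versions component-wise, ignoring local tags like '+cu101'."""
    got = installed.split("+")[0].split(".")
    want = expected.split(".")
    return got[:len(want)] == want


def collect_versions():
    """Gather versions of the pinned components that can be imported."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import torch
        versions["torch"] = torch.__version__
        if torch.version.cuda:  # None for CPU-only builds
            versions["cuda"] = torch.version.cuda
    except ImportError:
        pass
    try:
        import torchvision
        versions["torchvision"] = torchvision.__version__
    except ImportError:
        pass
    return versions


def check_environment(versions, expected=EXPECTED, packages=("apex", "volta")):
    """Return a list of problems; an empty list means everything matched.

    The package name 'volta' is an assumption about how setup.py installs
    this codebase.
    """
    problems = []
    for name, want in expected.items():
        got = versions.get(name)
        if got is None:
            problems.append("{}: missing (expected {})".format(name, want))
        elif not version_matches(got, want):
            problems.append("{}: found {}, expected {}".format(name, got, want))
    for package in packages:
        # find_spec returns None when the package cannot be imported.
        if importlib.util.find_spec(package) is None:
            problems.append("{}: not importable".format(package))
    return problems


if __name__ == "__main__":
    issues = check_environment(collect_versions())
    print("\n".join(issues) if issues else "environment matches the pinned setup")
```

Run it inside the activated `volta` environment. An empty report means every pinned component was found at the expected version. Versions are compared component by component, so `1.40.0` is not mistaken for `1.4`.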

Data

Check out data/README.md for links to preprocessed data and data preparation steps.

features_extraction/ contains our latest feature-extraction steps, which save features in HDF5 and NPY formats (instead of CSV) and support different backbones. Steps for the IGLUE datasets can be found in its datasets/ sub-directory.

NB: I noticed that uploading the LMDB files made their size grow to the order of terabytes, so I instead uploaded the H5 versions, which can be quickly converted to LMDB locally using this script.
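To show roughly what an H5-to-LMDB conversion involves, here is a hedged sketch. It is not the linked script and may not produce the record format that VOLTA's feature readers expect. The following are all assumptions: a hypothetical H5 layout with one top-level group per image id, each group holding datasets such as `features` and `boxes`; pickled dict values; and a reserved `b"keys"` index record. It requires the third-party `h5py` and `lmdb` packages.

```python
import pickle


def encode_record(image_id, fields):
    """Turn one image's feature arrays into an LMDB (key, value) pair.

    The value is a pickled dict of the fields plus the image id. This is an
    assumed record format, not necessarily the one VOLTA reads.
    """
    record = dict(fields)
    record["image_id"] = image_id
    return str(image_id).encode("utf-8"), pickle.dumps(record)


def convert_h5_to_lmdb(h5_path, lmdb_path, map_size=1 << 40):
    """Copy every top-level H5 group into one LMDB record; return the count.

    Assumed (hypothetical) H5 layout: one group per image id, each holding
    datasets such as 'features' and 'boxes'.
    """
    import h5py  # third-party: pip install h5py
    import lmdb  # third-party: pip install lmdb

    keys = []
    # map_size is the maximum size the database is allowed to grow to.
    env = lmdb.open(lmdb_path, map_size=map_size)
    try:
        with h5py.File(h5_path, "r") as h5, env.begin(write=True) as txn:
            for image_id, group in h5.items():
                # group[name][()] reads a whole dataset into a NumPy array.
                fields = {name: group[name][()] for name in group}
                key, value = encode_record(image_id, fields)
                txn.put(key, value)
                keys.append(key)
            # Assumed convention: store all keys under a reserved key so a
            # reader can list the images without scanning the database.
            txn.put(b"keys", pickle.dumps(keys))
    finally:
        env.close()
    return len(keys)
```

Writing everything in a single transaction keeps the sketch short. For very large feature files, committing in batches may be preferable.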

Models

Check out MODELS.md for links to pretrained models and for instructions on how to define new ones in VOLTA.

Model configuration files are stored in config/.
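As a hedged illustration of deriving a new model variant from an existing configuration, the sketch below assumes that the files in config/ are JSON objects. Check MODELS.md for the actual schema and procedure. The helpers `apply_overrides` and `derive_config` are hypothetical and not part of VOLTA. Likewise, the file names and the `num_hidden_layers` key in the usage note are illustrative only.

```python
import json


def apply_overrides(config, overrides):
    """Return a copy of `config` with top-level `overrides` applied.

    Only keys that already exist may be overridden, so a typo raises
    instead of silently adding an unused field.
    """
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError("unknown config keys: {}".format(unknown))
    merged = dict(config)
    merged.update(overrides)
    return merged


def derive_config(base_path, new_path, overrides):
    """Read a JSON config, apply overrides, and write the result to a new file."""
    with open(base_path) as f:
        config = json.load(f)
    merged = apply_overrides(config, overrides)
    with open(new_path, "w") as f:
        json.dump(merged, f, indent=2, sort_keys=True)
    return merged
```

For example, `derive_config("config/base_model.json", "config/my_variant.json", {"num_hidden_layers": 6})` would write a copy of the (hypothetical) base config with fewer layers. Rejecting unknown keys is a deliberate choice: a misspelled field fails loudly instead of being ignored at training time.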

Training and Evaluation

We provide sample scripts to train (i.e., pretrain or fine-tune) and evaluate models in examples/. These cover ViLBERT, LXMERT and VL-BERT as configured in their original papers, as well as ViLBERT, LXMERT, VL-BERT, VisualBERT and UNITER as specified in our controlled study.

Task configuration files are stored in config_tasks/.

License

This work is licensed under the MIT license. See LICENSE for details. Third-party software and data sets are subject to their respective licenses.

If you find our code/data/models or ideas useful in your research, please consider citing the paper:

@article{bugliarello-etal-2021-multimodal,
    author = {Bugliarello, Emanuele and Cotterell, Ryan and Okazaki, Naoaki and Elliott, Desmond},
    title = "{Multimodal Pretraining Unmasked: {A} Meta-Analysis and a Unified Framework of Vision-and-Language {BERT}s}",
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {9},
    pages = {978-994},
    year = {2021},
    month = {09},
    issn = {2307-387X},
    doi = {10.1162/tacl_a_00408},
    url = {https://doi.org/10.1162/tacl\_a\_00408},
}

Acknowledgement

Our codebase heavily relies on these excellent repositories:
