[MICCAI-2022] This is the official implementation of Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training.

M3AE

This is the official implementation of Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training at MICCAI-2022.

Requirements

Run the following command to install the required packages:

pip install -r requirements.txt

Download M3AE

You can download the models we pre-trained and fine-tuned on the corresponding datasets from here.

Pre-training

1. Dataset Preparation

Please organize the pre-training datasets in the following structure:

root:[data]
+--pretrain_data
| +--roco
| | +--val
| | +--test
| | +--train
| +--medicat
| | +--release
| | +--net
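
Before running the pre-processing step, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `missing_dirs` and the default root `data` are assumptions made for illustration, and the expected paths are simply copied from the tree above.

```python
from pathlib import Path

# Relative directories copied from the tree above.
PRETRAIN_DIRS = [
    "pretrain_data/roco/val",
    "pretrain_data/roco/test",
    "pretrain_data/roco/train",
    "pretrain_data/medicat/release",
    "pretrain_data/medicat/net",
]


def missing_dirs(root, expected=PRETRAIN_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # "data" is an assumed root; point it at your own data directory.
    for rel in missing_dirs("data"):
        print(f"missing: {rel}")
```

An empty result means every directory in the tree was found; it says nothing about whether the contents of those directories are complete.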

2. Pre-processing

Run the following command to pre-process the data:

python prepro/prepro_pretraining_data.py

to get the following arrow files:

root:[data]
+--pretrain_arrows
| +--medicat_train.arrow
| +--medicat_val.arrow
| +--medicat_test.arrow
| +--roco_train.arrow
| +--roco_val.arrow
| +--roco_test.arrow
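
The file names above follow a `<dataset>_<split>.arrow` pattern, so the expected set can be generated and compared with what is on disk. This is a hedged sketch rather than repository code: the function names and the `root` argument are hypothetical, and only the dataset names, split names, and the `pretrain_arrows` directory come from the listing above.

```python
from itertools import product
from pathlib import Path

# Dataset and split names taken from the listing above.
DATASETS = ("medicat", "roco")
SPLITS = ("train", "val", "test")


def expected_pretrain_arrows():
    """Build file names following the <dataset>_<split>.arrow pattern."""
    return [f"{d}_{s}.arrow" for d, s in product(DATASETS, SPLITS)]


def missing_arrows(root):
    """Return the expected arrow files not found in <root>/pretrain_arrows."""
    arrow_dir = Path(root) / "pretrain_arrows"
    return [
        name for name in expected_pretrain_arrows()
        if not (arrow_dir / name).is_file()
    ]
```

Calling `missing_arrows("data")` after pre-processing (with `data` standing in for your own root) should return an empty list if all six files were written.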

3. Pre-training

Now you can start pre-training the M3AE model:

bash run_scripts/pretrain_m3ae.sh

Downstream Evaluation

1. Dataset Preparation

Please organize the fine-tuning datasets in the following structure:

root:[data]
+--finetune_data
| +--melinda
| | +--train.csv
| | +--dev.csv
| | +--test.csv
| | +--melinda_images
| +--slack
| | +--train.json
| | +--validate.json
| | +--test.json
| | +--imgs
| +--vqa_rad
| | +--trainset.json
| | +--valset.json
| | +--testset.json
| | +--images
| +--medvqa_2019
| | +--val
| | +--test
| | +--train

2. Pre-processing

Run the following command to pre-process the data:

python prepro/prepro_finetuning_data.py

to get the following arrow files:

root:[data]
+--finetune_arrows
| +--vqa_vqa_rad_train.arrow
| +--vqa_vqa_rad_val.arrow
| +--vqa_vqa_rad_test.arrow
| +--vqa_slack_train.arrow
| +--vqa_slack_test.arrow
| +--vqa_slack_val.arrow
| +--vqa_medvqa_2019_train.arrow
| +--vqa_medvqa_2019_val.arrow
| +--vqa_medvqa_2019_test.arrow
| +--cls_melinda_train.arrow
| +--cls_melinda_val.arrow
| +--cls_melinda_test.arrow
| +--irtr_roco_train.arrow
| +--irtr_roco_val.arrow
| +--irtr_roco_test.arrow
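
The fine-tuning file names appear to follow a `<task>_<dataset>_<split>.arrow` pattern. As an illustration only, the sketch below splits each name into those three parts and groups the datasets by their leading token. Reading the first token (`vqa`, `cls`, `irtr`) as a task prefix is an assumption based on the naming pattern, and the helper names are hypothetical rather than part of the repository.

```python
from collections import defaultdict
from pathlib import Path

# File names copied from the listing above.
FINETUNE_ARROWS = [
    "vqa_vqa_rad_train.arrow", "vqa_vqa_rad_val.arrow", "vqa_vqa_rad_test.arrow",
    "vqa_slack_train.arrow", "vqa_slack_test.arrow", "vqa_slack_val.arrow",
    "vqa_medvqa_2019_train.arrow", "vqa_medvqa_2019_val.arrow",
    "vqa_medvqa_2019_test.arrow",
    "cls_melinda_train.arrow", "cls_melinda_val.arrow", "cls_melinda_test.arrow",
    "irtr_roco_train.arrow", "irtr_roco_val.arrow", "irtr_roco_test.arrow",
]


def parse_arrow_name(name):
    """Split '<task>_<dataset>_<split>.arrow' into (task, dataset, split)."""
    stem = Path(name).stem                # drop the ".arrow" suffix
    task, rest = stem.split("_", 1)       # first token is the assumed task prefix
    dataset, split = rest.rsplit("_", 1)  # last token is the split
    return task, dataset, split


def datasets_by_task(names):
    """Map each task prefix to the sorted list of datasets that use it."""
    grouped = defaultdict(set)
    for name in names:
        task, dataset, _ = parse_arrow_name(name)
        grouped[task].add(dataset)
    return {task: sorted(datasets) for task, datasets in grouped.items()}
```

Splitting from both ends keeps dataset names that themselves contain underscores, such as `vqa_rad` and `medvqa_2019`, intact.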

3. Fine-Tuning

Now you can start fine-tuning the M3AE model:

bash run_scripts/finetune_m3ae.sh

4. Test

You can also test our fine-tuned models directly:

bash run_scripts/test_m3ae.sh

NOTE: This is a good way to check whether your environment is set up the same way as ours: if it is, you should be able to reproduce the same results.

Acknowledgement

The code is based on ViLT, METER and MAE. We thank the authors for open-sourcing their code and encourage users to cite their work when applicable.

Citations

If M3AE is useful for your research, please consider citing:

@inproceedings{chen2022m3ae,
  title={Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training},
  author={Chen, Zhihong and Du, Yuhao and Hu, Jinpeng and Liu, Yang and Li, Guanbin and Wan, Xiang and Chang, Tsung-Hui},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  year={2022},
  organization={Springer}
}
