  • Stars: 316
  • Rank: 131,874 (Top 3%)
  • Language: Python
  • License: MIT License
  • Created over 3 years ago
  • Updated over 1 year ago

Repository Details

VisualGPT (CVPR 2022 Proceedings): GPT as a decoder for vision-language models

VisualGPT

Our paper: VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Main Architecture of Our VisualGPT

[Figure: main architecture of VisualGPT]

Download the GPT-2 pretrained weights

curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
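If curl is not available, the same download can be done from Python. The following is a minimal sketch, not part of the repository: the helper name fetch is hypothetical, and it assumes the URL above is still reachable. It skips the download when the file already exists and writes to a temporary ".part" file first, so an interrupted transfer does not leave a truncated checkpoint behind.

```python
import shutil
import urllib.request
from pathlib import Path

# URL and output file name are taken from the curl command above.
GPT2_URL = "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin"
GPT2_FILE = "gpt2-pytorch_model.bin"


def fetch(url, dest):
    """Download `url` to `dest` unless a non-empty file is already there.

    Hypothetical helper for illustration; returns the destination path.
    """
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # already downloaded, nothing to do

    # Stream into a temporary file, then rename it once the copy has finished.
    tmp = dest.with_suffix(dest.suffix + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

Calling fetch(GPT2_URL, GPT2_FILE) from the repository root should produce the same file as the curl command.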

Environment setup

Clone the repository and create the visualgpt conda environment

conda env create -f environment.yml
conda activate visualgpt

Then download the spaCy English data

python -m spacy download en

Data preparation

We provide the COCO dataset for download. Please download the annotations file annotations.zip and extract it, then download coco_detections.hdf5. In that file the data is stored as <key, value> pairs, where the key is the image id and the value is a tensor of shape (N, 2048), with N the number of detections.
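To check that the features file matches this description, a few entries can be inspected before training. The sketch below is not part of the repository: the function names are hypothetical, it assumes h5py is installed, and it assumes every top-level key maps directly to a dataset (if the file uses groups or a different key naming, it needs adjusting).

```python
from collections.abc import Mapping


def summarize_detections(store: Mapping, limit: int = 3) -> list:
    """Return (key, num_detections, feature_dim) for the first `limit` entries.

    `store` can be an open h5py.File or any mapping whose values expose
    a two-element `.shape`.
    """
    rows = []
    for i, key in enumerate(store):
        if i >= limit:
            break
        # Per the description above, each value should have shape (N, 2048).
        num_detections, feature_dim = store[key].shape
        rows.append((key, num_detections, feature_dim))
    return rows


def inspect_file(path: str = "coco_detections.hdf5", limit: int = 3) -> list:
    """Open the HDF5 file read-only and summarize its first entries."""
    import h5py  # imported here so summarize_detections works without h5py

    with h5py.File(path, "r") as features:
        return summarize_detections(features, limit)
```

Running inspect_file() in the folder that contains coco_detections.hdf5 should return tuples whose last element is 2048 if the file follows the layout described above.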

Code structure

Create the log folder with mkdir logs, then start the training.

Train the model

python train_visualGPT.py --batch_size 50 --head 12 --tau 0.2 --features_path coco_detections.hdf5 --annotation_folder annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
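When running several experiments (for example with different values of --train_percentage), it can help to build this command programmatically. The sketch below is an illustration, not part of the repository: build_train_command and effective_batch_size are hypothetical helpers, the flags and values are copied from the command above, and --split_train_data is treated as a value-less switch because that is how it appears there. The effective batch size calculation assumes the script uses standard gradient accumulation, which the source does not confirm.

```python
# Flags and values copied from the training command above.
DEFAULT_ARGS = {
    "batch_size": 50,
    "head": 12,
    "tau": 0.2,
    "features_path": "coco_detections.hdf5",
    "annotation_folder": "annotations",
    "lr": "1e-4",
    "gpt_model_type": "gpt",
    "random_seed": 42,
    "log_file": "logs/log",
    "exp_name": "experiment_log",
    "decoder_layer": 12,
    "optimizer_type": "adamw",
    "gradient_accumulation_steps": 2,
    "train_percentage": 0.001,
}

# Flags that appear without a value in the command above.
SWITCHES = ["split_train_data"]


def build_train_command(overrides=None, script="train_visualGPT.py"):
    """Return the training command as an argument list for subprocess.run."""
    args = {**DEFAULT_ARGS, **(overrides or {})}
    cmd = ["python", script]
    for name, value in args.items():
        cmd += [f"--{name}", str(value)]
    cmd += [f"--{name}" for name in SWITCHES]
    return cmd


def effective_batch_size(args=DEFAULT_ARGS):
    """Samples per optimizer step, ASSUMING standard gradient accumulation."""
    return args["batch_size"] * args["gradient_accumulation_steps"]
```

For example, subprocess.run(build_train_command({"train_percentage": 0.01}), check=True) would launch the same run with a different value for --train_percentage, leaving all other flags unchanged.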

Acknowledgement

This code uses resources from Meshed Memory Transformer and Transformers.

Please cite our paper using the following BibTeX entry:

@InProceedings{Chen_2022_CVPR,
    author    = {Chen, Jun and Guo, Han and Yi, Kai and Li, Boyang and Elhoseiny, Mohamed},
    title     = {VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {18030-18040}
}
