  • Stars: 317
  • Rank 132,216 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created almost 4 years ago
  • Updated over 1 year ago

Repository Details

VisualGPT, CVPR 2022 Proceeding, GPT as a decoder for vision-language models

VisualGPT

Our paper: VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Main Architecture of Our VisualGPT

[Figure: main architecture of VisualGPT]

Download the GPT-2 pretrained weights

curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin

Environment setup

Clone the repository and create the visualgpt conda environment

conda env create -f environment.yml
conda activate visualgpt

Then download the spaCy data

python -m spacy download en

Data preparation

We provide the COCO dataset for download. Please download the annotations file annotations.zip and extract it, then download coco_detections.hdf5. In that file the data is stored as <key, value> pairs, where the key is the image id and the value is a tensor of shape (N, 2048). N is the number of detections.
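
To make the feature layout concrete, here is a minimal sketch that checks every entry of a feature store for the shape (N, 2048) and reports N per key. The helper names summarize_detections and open_and_summarize are hypothetical and not part of this repository. The sketch assumes that every top-level key in the file maps directly to a dataset and that the third-party h5py package is installed; the exact key naming in coco_detections.hdf5 is not specified here, so keys are treated as opaque.

```python
def summarize_detections(store, feature_dim=2048, limit=None):
    """Return {key: N} for entries of shape (N, feature_dim).

    `store` is any mapping whose values expose a `.shape` attribute,
    for example an open h5py.File (assumption: flat key -> dataset layout).
    """
    counts = {}
    for index, key in enumerate(store):
        if limit is not None and index >= limit:
            break
        shape = tuple(store[key].shape)
        # Each value should be a 2-D tensor: N detections x feature_dim features.
        if len(shape) != 2 or shape[1] != feature_dim:
            raise ValueError(f"{key}: expected (N, {feature_dim}), got {shape}")
        counts[key] = shape[0]
    return counts


def open_and_summarize(path="coco_detections.hdf5", limit=5):
    """Inspect the first `limit` entries of the downloaded feature file."""
    import h5py  # third-party; imported lazily so the checker above stays dependency-free

    with h5py.File(path, "r") as features:
        return summarize_detections(features, limit=limit)
```

Calling open_and_summarize() after the download should return a small dictionary mapping keys to detection counts, or raise a ValueError if an entry does not match the expected shape.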

Code structure

Create the log folder with mkdir logs, then start the training

Train the model

python train_visualGPT.py --batch_size 50 --head 12 --tau 0.2 --features_path coco_detections.hdf5 --annotation_folder annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
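
As a rough guide to what the flags in the command above imply, the sketch below computes the effective batch size and the number of optimizer updates per epoch. It rests on assumptions not confirmed here: that --gradient_accumulation_steps follows the usual convention of summing gradients over that many batches before each update, that --train_percentage is a fraction (so 0.001 means 0.1% of the training set), and that partial batches are kept. The function training_budget and the dataset size of 100,000 are hypothetical placeholders, not values from this repository.

```python
import math


def training_budget(batch_size, accumulation_steps, train_percentage, dataset_size):
    """Return (effective_batch, subset_size, updates_per_epoch).

    Assumes standard gradient accumulation and that train_percentage
    is a fraction of the full training set.
    """
    # Samples contributing to each optimizer update.
    effective_batch = batch_size * accumulation_steps
    # Number of training samples kept after subsampling.
    subset_size = round(dataset_size * train_percentage)
    # Forward/backward passes per epoch, then optimizer updates per epoch.
    micro_batches = math.ceil(subset_size / batch_size)
    updates_per_epoch = math.ceil(micro_batches / accumulation_steps)
    return effective_batch, subset_size, updates_per_epoch


# Placeholder dataset size, for illustration only.
print(training_budget(50, 2, 0.001, 100_000))  # (100, 100, 1)
```

Under these assumptions, a batch size of 50 with 2 accumulation steps behaves like a batch of 100 samples per update.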

Acknowledgement

This code uses resources from Meshed Memory Transformer and Transformers.

Please cite our paper using the following BibTeX entry

@InProceedings{Chen_2022_CVPR,
    author    = {Chen, Jun and Guo, Han and Yi, Kai and Li, Boyang and Elhoseiny, Mohamed},
    title     = {VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {18030-18040}
}
