Vision-CAIR/VisualGPT

Stars: 317
Rank: 132,216 (Top 3 %)
Language: Python
License: MIT License
Created almost 4 years ago
Updated over 1 year ago

VisualGPT, CVPR 2022 Proceeding, GPT as a decoder for vision-language models

VisualGPT

Our Paper VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Main Architecture of Our VisualGPT

Download the GPT-2 pretrained weights

curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin

Environment setup

Clone the repository and create the visualgpt conda environment

conda env create -f environment.yml
conda activate visualgpt

Then download the spaCy data

python -m spacy download en

Data preparation

We provide the COCO dataset for downloading. Please download the annotations file annotations.zip and extract it. Also download coco_detections.hdf5, in which the data is stored as <key, value> pairs, where the key is the image id and the value is a tensor of shape (N, 2048). N is the number of detections.
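To check that the downloaded features match the layout described above, a short inspection script can help. The following is a minimal sketch, not part of the repository: the helper names `summarize_detections` and `inspect_file` are hypothetical, and the exact key naming inside the file is not specified here, so the sketch simply lists whatever keys it finds and flags whether each entry has the expected (N, 2048) shape. Opening the file assumes the third-party h5py package is installed.

```python
FEATURE_DIM = 2048  # second dimension of each feature tensor, as stated above


def summarize_detections(store, limit=5):
    """Return (key, shape, matches) for the first `limit` entries.

    `store` may be an open h5py.File or any mapping whose values expose
    a `.shape` attribute. `matches` is True when the shape is (N, 2048).
    """
    summary = []
    for index, key in enumerate(store):
        if index >= limit:
            break
        shape = getattr(store[key], "shape", None)
        if shape is not None:
            shape = tuple(shape)
        matches = shape is not None and len(shape) == 2 and shape[1] == FEATURE_DIM
        summary.append((key, shape, matches))
    return summary


def inspect_file(path="coco_detections.hdf5", limit=5):
    """Open the HDF5 file read-only and print the shape of a few entries."""
    import h5py  # third-party; imported lazily so the helper above needs no dependencies

    with h5py.File(path, "r") as features:
        for key, shape, matches in summarize_detections(features, limit):
            status = "ok" if matches else "unexpected shape"
            print(f"{key}: {shape} ({status})")
```

Calling `inspect_file()` from the folder that contains coco_detections.hdf5 prints the first few keys with their shapes; for entries that match, the first number of each shape is N, the number of detections for that image.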

Code structure

Create the log folder with mkdir logs and start the training.

Train the model

python train_visualGPT.py --batch_size 50 --head 12 --tau 0.2 --features_path coco_detections.hdf5 --annotation_folder annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
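Because the training command is long, it can be convenient to keep the flags in one place and build the command from them. The sketch below is an illustration only: the flag names and values are copied from the command above, while `TRAIN_FLAGS`, `SWITCHES`, `build_command`, and `effective_batch_size` are hypothetical helpers that do not exist in the repository. The effective batch size calculation assumes the script uses `--gradient_accumulation_steps` in the usual way, accumulating gradients over that many batches before each optimizer step.

```python
import sys

# Flag values copied from the training command above.
TRAIN_FLAGS = {
    "batch_size": 50,
    "head": 12,
    "tau": 0.2,
    "features_path": "coco_detections.hdf5",
    "annotation_folder": "annotations",
    "lr": "1e-4",
    "gpt_model_type": "gpt",
    "random_seed": 42,
    "log_file": "logs/log",
    "exp_name": "experiment_log",
    "decoder_layer": 12,
    "optimizer_type": "adamw",
    "gradient_accumulation_steps": 2,
    "train_percentage": 0.001,
}
# Flags that take no value.
SWITCHES = ["split_train_data"]


def build_command(flags, switches, script="train_visualGPT.py"):
    """Turn the flag dictionary into an argument list for subprocess.run."""
    command = [sys.executable, script]
    for name, value in flags.items():
        command += [f"--{name}", str(value)]
    command += [f"--{name}" for name in switches]
    return command


def effective_batch_size(flags):
    """Samples per optimizer step, assuming standard gradient accumulation."""
    return flags["batch_size"] * flags["gradient_accumulation_steps"]
```

To launch training, create the logs folder first (for example with `os.makedirs("logs", exist_ok=True)`) and pass the result of `build_command(TRAIN_FLAGS, SWITCHES)` to `subprocess.run`. Under the stated assumption, the settings above correspond to 50 × 2 = 100 samples per optimizer step.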

Acknowledgement

This code uses resources from Meshed Memory Transformer and Transformers.

Please cite our paper using the following BibTeX entry

@InProceedings{Chen_2022_CVPR,
    author    = {Chen, Jun and Guo, Han and Yi, Kai and Li, Boyang and Elhoseiny, Mohamed},
    title     = {VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {18030-18040}
}

MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

ChatCaptioner

Official Repository of ChatCaptioner

Jupyter Notebook

LongVU

MiniGPT-Med

Open-sourced code of miniGPT-Med

3DCoMPaT-v2

3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition

LTVRR

RelTransformer

MammalNet

artemis-v2

Code for the paper: It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

Jupyter Notebook

3DCoMPaT

Official repository for the 3DCoMPaT dataset (ECCV2022 Oral)

Jupyter Notebook

InfiniBench

saai-factory-tutorial-creative-ai

Creative AI for Visual Art and Music slides and demos.

affectiveVisDial

AF-Guide

Official repository of Action-Free Guide

CWAN

Creative Walk Adversarial Networks: Novel Art Generation with Probabilistic Random Walk Deviation from Style Norms

WAGA

Code for Wölfflin Affective Generative Analysis paper published in ICCC 2021

Jupyter Notebook

CIZSLv2

CIZSL++: Creativity Inspired Generative Zero-Shot Learning. T-PAMI under review.

HalentNet

cs326-few-shot-classification

CS326 Practical assignment #2: few-shot classification

GRaWD

Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation. CVPR 2022 Workshop, ICCC 2022.

artelingo

Jupyter Notebook

UnlikelihoodMotionForecasting

Jupyter Notebook