
Repository Details

A PyTorch reimplementation of "Bilinear Attention Network", "Intra- and Inter-modality Attention", "Learning Conditioned Graph Structures", "Learning to Count Objects", and "Bottom-Up Top-Down" for Visual Question Answering 2.0.

Several Recent Approaches (2018) on VQA v2

The project is based on Cyanogenoid/vqa-counting. Most current VQA 2.0 projects are based on https://github.com/hengyuan-hu/bottom-up-attention-vqa, but I personally prefer Cyanogenoid's framework because it is very clean and clear. So I reimplemented several recent approaches, including Bottom-Up Top-Down, Bilinear Attention Network, Intra- and Inter-modality Attention, Learning Conditioned Graph Structures, and Learning to Count Objects.

One of the benefits of this framework is that you can easily add the counting module to your own model, which has proven effective at improving accuracy on counting questions without harming the rest of your model's performance.
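To illustrate, here is a minimal sketch of plugging the counting module into an existing model; the Counter import path, constructor signature, and tensor shapes below are assumptions for illustration, not this repo's verified API:

import torch
from counting import Counter  # assumed import path inside this repo

boxes = torch.rand(2, 4, 10)      # hypothetical proposal coordinates: batch of 2, 10 objects
attention = torch.rand(2, 10)     # per-object attention weights from your own model
counter = Counter(objects=10)     # assumed constructor signature
count_features = counter(boxes, attention)
joint = torch.rand(2, 1024)       # placeholder for your model's fused question/image features
fused = torch.cat([joint, count_features], dim=1)  # feed the counting features alongside your own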

Dependencies

  • Python 3.6
    • torch > 0.4
    • torchvision 0.2
    • h5py 2.7
    • tqdm 4.19

Prepare dataset (follow Cyanogenoid/vqa-counting)

  • In the data directory, execute ./download.sh to download VQA v2.
    • For experimenting, using 36 fixed proposals is faster, at the expense of a bit of accuracy. Uncomment the relevant lines in download.sh and change the paths in config.py accordingly. Don't forget to set output_size in there to 36 to actually get the speed-up.
  • Prepare the data by running
python preprocess-images.py
python preprocess-vocab.py

This creates an h5py database (95 GiB) containing the object proposal features and a vocabulary for questions and answers at the locations specified in config.py.
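To sanity-check the preprocessing output, you can open the database with h5py; the file name and dataset key below are assumptions, so use whatever paths your config.py specifies:

import h5py

features_path = './data/genome-trainval.h5'   # hypothetical path; use the one from config.py
with h5py.File(features_path, 'r') as f:
    print(list(f.keys()))                     # see which datasets the preprocessing wrote
    features = f['features']                  # assumed dataset name for the proposal features
    print(features.shape, features.dtype)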

How to Train

All the models are named XXX_model.py, and most of the parameters are in config.py. To change the model, simply change model_type in config.py. Then train your model with:

python train.py [optional-name]
  • To evaluate accuracy (VQA accuracy and balanced pair accuracy) in various categories, you can run
python eval-acc.py <path to .pth log> [<more paths to .pth logs> ...]
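Since each model lives in its own XXX_model.py and is selected via config.model_type, the dispatch presumably looks roughly like the sketch below (the importlib lookup and the Net class name are assumptions, not the repo's verified code):

import importlib
import config  # the repo's config.py

def build_model():
    # e.g. model_type = 'ban' would resolve to the module ban_model.py
    module = importlib.import_module(config.model_type + '_model')
    return module.Net()  # 'Net' is an assumed class name inside each *_model.py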

Training on the whole trainval split and generating a result.json file to upload to the VQA 2.0 online evaluation server are also supported

  • First, I merged the question and annotation JSON files for the train and validation splits. You can download trainval_annotation.json and trainval_question.json from the links and put them into the ./data/ directory.
  • To train your model on the entire train & val sets, simply add the --trainval option when training:
python train.py --trainval
  • To generate the result.json file to upload to the VQA 2.0 online evaluation server, resume from a model previously trained on the trainval split and select the test split. The generated result.json will be written to config.result_json_path (the expected format is sketched after this list):
python train.py --test --resume=./logs/YOUR_MODEL.pth
  • One More Thing: note that most of the methods require a different learning rate when trained on the entire train & val splits. It is usually smaller than the learning rate used to train on the train split alone.
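For reference, the VQA 2.0 evaluation server expects a JSON list of question_id/answer records; below is a minimal sketch of that format (whether train.py writes exactly this structure to config.result_json_path is an assumption):

import json

predictions = [
    {'question_id': 458752000, 'answer': 'yes'},  # hypothetical entries
    {'question_id': 458752001, 'answer': '2'},
]
with open('result.json', 'w') as f:
    json.dump(predictions, f)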

Model Details

Note that I did not implement the TF-IDF embedding of the BAN model (though the current model achieves competitive, almost identical performance even without it); only the GloVe embedding is provided. As for Intra- and Inter-modality Attention, although I implemented all the details given in the paper, the results still seem weaker than the paper reports, even after I discussed it with the author and made some modifications.
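For completeness, a GloVe-initialized word embedding is typically set up as in the hedged sketch below; the vocabulary, dimensionality, and placeholder weight matrix are illustrative assumptions, not this repo's exact code:

import numpy as np
import torch
import torch.nn as nn

vocab = {'what': 0, 'color': 1, 'is': 2}   # hypothetical question vocabulary
embedding_dim = 300                        # 300-d GloVe vectors, a common choice

# Placeholder matrix standing in for rows parsed from a glove.*.300d.txt file.
pretrained = np.random.randn(len(vocab), embedding_dim).astype(np.float32)

embedding = nn.Embedding(len(vocab), embedding_dim)
embedding.weight.data.copy_(torch.from_numpy(pretrained))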

To Train Counting Model

Set the following parameter in config.py:

model_type = 'counting'

To Train Bottom-up Top-down

model_type = 'baseline' 

To Train Bilinear Attention Network

model_type = 'ban' 

Note that BAN is very memory-consuming, so please ensure you have enough GPUs and run main.py with CUDA_VISIBLE_DEVICES=0,1,2,3.
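If a single card is not enough, the standard PyTorch way to use the GPUs exposed via CUDA_VISIBLE_DEVICES is nn.DataParallel; whether main.py already wraps the model like this is an assumption:

import torch
import torch.nn as nn

# Launch example:  CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
model = nn.Linear(2048, 3000)  # placeholder for the actual BAN network
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # splits each batch across the visible GPUs
    model = model.cuda()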

To Train Intra- and Inter-modality Attention

model_type = 'inter_intra' 

You may also need to change the learning-rate decay strategy via gradual_warmup_steps and lr_decay_epochs in config.py.
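A hedged sketch of how gradual_warmup_steps and lr_decay_epochs might drive the schedule is shown below; the values and exact semantics are assumptions, so check config.py for the real ones:

initial_lr = 1e-3                              # assumed base learning rate
gradual_warmup_steps = [0.25, 0.5, 0.75, 1.0]  # assumed per-epoch warm-up multipliers
lr_decay_epochs = list(range(10, 20, 2))       # assumed epochs at which the lr is decayed
lr_decay_rate = 0.5                            # assumed decay factor

def lr_at_epoch(epoch):
    if epoch < len(gradual_warmup_steps):
        return initial_lr * gradual_warmup_steps[epoch]
    lr = initial_lr
    for decay_epoch in lr_decay_epochs:
        if epoch >= decay_epoch:
            lr *= lr_decay_rate
    return lr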

To Train Learning Conditioned Graph Structures

model_type = 'graph' 

Though this method seems less competitive.

Looking for previous methods to compare against in your experiments?

Please refer to my CVPR 2019 oral paper:

@inproceedings{tang2018learning,
  title={Learning to Compose Dynamic Tree Structures for Visual Contexts},
  author={Tang, Kaihua and Zhang, Hanwang and Wu, Baoyuan and Luo, Wenhan and Liu, Wei},
  booktitle={Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

More Repositories

1. Scene-Graph-Benchmark.pytorch (Jupyter Notebook, 1,049 stars): A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It is also a PyTorch implementation of the CVPR 2020 paper "Unbiased Scene Graph Generation from Biased Training".
2. Long-Tailed-Recognition.pytorch (Jupyter Notebook, 560 stars): [NeurIPS 2020] This project provides a strong single-stage baseline for Long-Tailed Classification, Detection, and Instance Segmentation (LVIS). It is also a PyTorch implementation of the NeurIPS 2020 paper "Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect".
3. ResNet50-Pytorch-Face-Recognition (Python, 136 stars): Using PyTorch to implement a ResNet50 for Cross-Age Face Recognition.
4. VCTree-Scene-Graph-Generation (Python, 119 stars): Code for the Scene Graph Generation part of the CVPR 2019 oral paper "Learning to Compose Dynamic Tree Structures for Visual Contexts".
5. Generalized-Long-Tailed-Benchmarks.pytorch (Jupyter Notebook, 114 stars): [ECCV 2022] A generalized long-tailed challenge that incorporates both the conventional class-wise imbalance and the overlooked attribute-wise imbalance within each class. The proposed IFL together with other baselines are also included.
6. GGNN-for-bAbI-dataset.pytorch.1.0 (Python, 53 stars): A complete PyTorch 1.0 implementation of Gated Graph Sequence Neural Networks (GGNN).
7. ResNet50-Tensorflow-Face-Recognition (Python, 45 stars): Using TensorFlow to implement a ResNet50 for Cross-Age Face Recognition.
8. VCTree-Visual-Question-Answering (Python, 35 stars): Code for the Visual Question Answering (VQA) part of the CVPR 2019 oral paper "Learning to Compose Dynamic Tree Structures for Visual Contexts".
9. Local-Disco-Diffusion-v5.2.jupyterNote (Jupyter Notebook, 23 stars): A custom Disco Diffusion v5.2 that runs on local GPUs.
10. CiiV-Adversarial-Robustness.pytorch (Jupyter Notebook, 18 stars): The official PyTorch implementation of the paper "Adversarial Visual Robustness by Causal Intervention".
11. LVIS-for-mmdetection (Python, 16 stars): Support for the Large Vocabulary Instance Segmentation (LVIS) dataset in mmdetection.
12. Kinetics-Data-Preprocessing (Python, 9 stars): Instructions to 1) download Kinetics-400/Kinetics-600, 2) resize the videos, and 3) prepare annotations.
13. Qwen-Tokenizer-Pruner (Python, 7 stars): Due to the huge vocabulary size (151,936) of Qwen models, the Embedding and LM Head weights are excessively heavy. This project provides a tokenizer vocabulary shearing solution for Qwen and Qwen-VL.
14. Describe-and-Guess-GAME-Using-GPT-3 (Python, 6 stars): A simple demo of how to use GPT-3 to play Describe-and-Guess with a specified topic and question type.
15. kai-blog (SCSS, 1 star)
16. faster-rcnn.pytorch (Python, 1 star)
17. Minimalist-TinyLLaMA-to-Onnx (Python, 1 star): Export TinyLLaMA to ONNX and conduct LLM inference using onnxruntime.
18. Quick-Draw-Multimodal-Recognition (Jupyter Notebook, 1 star): The course project of CE7454 (Team 13).