• Stars
    star
    263
  • Rank 155,624 (Top 4 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 2 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Prophet

PWC PWC

This repository is the official implementation of the Prophet, a two stage framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. In stage one, we train a vanilla VQA model on a specific knowledge-based VQA dataset and extract two types of complementary answer heuristics from the model: answer candidates and answer-aware examples. In stage two, answer heuristics are used to prompt GPT-3 to generate better answers. Prophet significantly outperforms existing state-of-the-art methods on two datasets, delivering 61.1% on OK-VQA and 55.7% on A-OKVQA. Please refer to our paper for details.

prophet

Updates

April 28, 2023

  • Add pretrained and finetuned models on A-OKVOA.

March 10, 2023

  • Training and testing codes of the two-stages Prophet framework.
  • Pretrained and finetuned models on OK-VOA.

Table of Contents

Prerequisites

Hardware and Software Requirements

To conduct the following experiments, a machine with at least 1 RTX 3090 GPU, 50GB memory, and 300GB free disk space is recommended. We strongly recommend using an SSD drive to guarantee high-speed I/O.

Following software is needed:

  1. Python >= 3.9
  2. Cuda >= 11.3
  3. Pytorch >= 12.0
  4. what you can find in environment.yml

We recommend downloading Anaconda first and then creating a new environment with the following command:

$ conda env create -f environment.yml

This command will create a new environment named prophet with all the required packages. To activate the environment, run:

$ conda activate prophet

Data Preparation

Before running the code, prepare two folders: datasets and assets. The datasets folder contains all the datasets and features used in this project, and the assets folder contains the pre-computed resources and other intermediate files (you can use them to skip some early experiment steps and save time).

First, download the datasets and assets. Then put the datasets and assets folder in the root directory of this project. Download MSCOCO 2014 and 2017 images from here (you can skip MSCOCO 2017 if you only experiments on OK-VQA) and put them in the datasets folder. Run the following command to extract the features of the images:

$ bash scripts/extract_img_feats.sh

After that, the datasets and assets folder will have the following structure:

Click to expand
datasets
β”œβ”€β”€ aokvqa
β”‚Β Β  β”œβ”€β”€ aokvqa_v1p0_test.json
β”‚Β Β  β”œβ”€β”€ aokvqa_v1p0_train.json
β”‚Β Β  └── aokvqa_v1p0_val.json
β”œβ”€β”€ coco2014
β”‚Β Β  β”œβ”€β”€ train2014
β”‚Β Β  └── val2014
β”œβ”€β”€ coco2014_feats
β”‚Β Β  β”œβ”€β”€ train2014
β”‚Β Β  └── val2014
β”œβ”€β”€ coco2017
β”‚Β Β  β”œβ”€β”€ test2017
β”‚Β Β  β”œβ”€β”€ train2017
β”‚Β Β  └── val2017
β”œβ”€β”€ coco2017_feats
β”‚Β Β  β”œβ”€β”€ test2017
β”‚Β Β  β”œβ”€β”€ train2017
β”‚Β Β  └── val2017
β”œβ”€β”€ okvqa
β”‚Β Β  β”œβ”€β”€ mscoco_train2014_annotations.json
β”‚Β Β  β”œβ”€β”€ mscoco_val2014_annotations.json
β”‚Β Β  β”œβ”€β”€ OpenEnded_mscoco_train2014_questions.json
β”‚Β Β  └── OpenEnded_mscoco_val2014_questions.json
└── vqav2
    β”œβ”€β”€ v2_mscoco_train2014_annotations.json
    β”œβ”€β”€ v2_mscoco_val2014_annotations.json
    β”œβ”€β”€ v2_OpenEnded_mscoco_train2014_questions.json
    β”œβ”€β”€ v2_OpenEnded_mscoco_val2014_questions.json
    β”œβ”€β”€ v2valvg_no_ok_annotations.json
    β”œβ”€β”€ v2valvg_no_ok_questions.json
    β”œβ”€β”€ vg_annotations.json
    └── vg_questions.json

We've also provided a tree structure of the entire project in misc/tree.txt.

Usage

We provide bash scripts for each stage of the Prophet framework. You can find them in the scripts directory. There are two common arguments you should take care of when running each script:

  • --task: specify the task (i.e., the target dataset) you want to deal with. The available options are ok (training on train set of OK-VQA and evaluating on the test set of OK-VQA), aok_val (training on train set of A-OKVQA and evaluating on the val set of A-OKVQA) and aok_test (training on train set and val set of A-OKVQA and evaluating on the test set of A-OKVQA);

Note that although Prophet uses VQA v2 datasets for pre-training, there are slight differences in how the datasets are used for different tasks (ok, aok_val, and aok_test), as detailed in configs/task_to_split.py. This means that different pre-training commands need to be followed for each task.

  • --version: specify the version name of this run. This name will be used to create a new folder in the outputs directory to store the results of this run.

Notice that you can omit any arguments when invoking following scripts, it will then use the default arguments written in the script files.

Before running any script, you can also update the configuration files (*.yml) in the configs directory to change hyperparameters.

1. OK-VQA

Take OK-VQA for example, Propht consists of two phases, stage one for training a vanilla VQA model and extracting answer heuristics, and stage two for prompting GPT-3 with answer heuristics.

Stage one

At this stage, we train an improved MCAN model (check the paper for detail description) through pretraning on VQA v2 and finetuning on target dataset. Multiple GPUs are supported by setting --gpu 0,1,2,3 (for example). Run pretraining step with commands:

$ bash scripts/pretrain.sh \
    --task ok --version okvqa_pretrain_1 --gpu 0

We've provided a pretrained model for OK-VQA here. Then, run finetuning step with commands:

$ bash scripts/finetune.sh \
    --task ok --version okvqa_finetune_1 --gpu 0 \
    --pretrained_model outputs/okvqa_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}. We've also provided a finetuned model for OK-VQA here. You may pick one to generate answer heuristics by run following command:

$ bash scripts/heuristics_gen.sh \
    --task ok --version okvqa_heuristics_1
    --gpu 0 --ckpt_path outputs/okvqa_finetune_1/ckpts/epoch_6.pkl
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.

Stage two

You may need the candidates.json and examples.json files generated in the former stage to step into this stage. Or you can just skip stage one, and use the files of answer heuristics we provided in assets. Especially, the candidates.json and examples.json files for OK-VQA are answer_aware_examples_okvqa.json and candidates_okvqa.json. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task ok --version okvqa_prompt_1 \
    --examples_path outputs/results/okvqa_heuristics_1/examples.json \ 
    --candidates_path outputs/results/okvqa_heuristics_1/candidates.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.

We also provide example scripts for the aok_val and aok_test modes on A-OKVQA.

Click to expand

2. A-OKVQA (val)

Stage one

Similary, for task of aok_val, run pretraining step with commands:

$ bash scripts/pretrain.sh \
    --task aok_val --version aokvqa_val_pretrain_1 --gpu 0

We've provided a pretrained model for aok_val here.Then, run finetuning step with commands:

$ bash scripts/finetune.sh \
    --task aok_val --version aokvqa_val_finetune_1 --gpu 0 \
    --pretrained_model outputs/aokvqa_val_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}.We've also provided a finetuned model for aok_val here. You may pick one to generate answer heuristics by run following command:

$ bash scripts/heuristics_gen.sh \
    --task aok_val --version aokvqa_val_heuristics_1
    --gpu 0 --ckpt_path outputs/aokvqa_val_finetune_1/ckpts/epoch_6.pkl
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.

Stage two

You may need the candidates.json and examples.json files generated in the former stage to step into this stage. Or you can just skip stage one, and use the files of answer heuristics we provided in assets. Especially, the candidates.json and examples.json files for aok_val are examples_aokvqa_val.json and candidates_aokvqa_val.json. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task ok --version okvqa_val_prompt_1 \
    --examples_path outputs/results/aokvqa_val_heuristics_1/examples.json \ 
    --candidates_path outputs/results/aokvqa_val_heuristics_1/candidates.json \
    --captions_path assets/captions_aokvqa.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.

3. A-OKVQA (test)

For task of aok_val, run pretraining step with commands:

Stage one

$ bash scripts/pretrain.sh \
    --task aok_test --version aokvqa_test_pretrain_1 --gpu 0

We've provided a pretrained model for aok_test here. Then, run finetuning step with commands:

$ bash scripts/finetune.sh \
    --task aok_test --version aokvqa_test_finetune_1 --gpu 0 \
    --pretrained_model outputs/aokvqa_test_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckptss/{your_version_name}.We've also provided a finetuned model for aok_test here. You may pick one to generate answer heuristics by run following command:

$ bash scripts/heuristics_gen.sh \
    --task aok_test --version aokvqa_test_heuristics_1
    --gpu 0 --ckpt_path outputs/aokvqa_test_finetune_1/ckpts/epoch_6.pkl
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.

Stage two

You may need the candidates.json and examples.json files generated in the former stage to step into this stage. Or you can just skip stage one, and use the files of answer heuristics we provided in assets. Especially, the candidates.json and examples.json files for aok_test are examples_aokvqa_test.json and candidates_aokvqa_test.json. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task ok --version okvqa_test_prompt_1 \
    --examples_path outputs/results/aokvqa_test_heuristics_1/examples.json \ 
    --candidates_path outputs/results/aokvqa_test_heuristics_1/candidates.json \
    --captions_path assets/captions_aokvqa.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.

Evaluation

For the task of ok and aok_val whose annotations are available, the scores are automatically computed after finetuning and prompting. You can also evaluate the result files that outputted after finetuning or prompting, by run

$ bash scripts/evaluate_file.sh \
    --task ok --result_path outputs/results/okvqa_prompt_1/result.json

Using the corresponding result files and evaluation script above, we obtain the accuracies in the following table, respectively.

OK-VQA A-OKVQA (val) A-OKVQA (test)
MCAN Prophet
53.0% 61.1%
MCAN Prophet
52.0% 58.2%
MCAN Prophet
45.6% 55.7%

For the task of aok_test, you need to submit the result file to the A-OKVQA Leaderboard to evaluate the result.

Citation

If you use this code in your research, please cite our paper:

@inproceedings{shao2023prompting,
  title={Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering},
  author={Shao, Zhenwei and Yu, Zhou and Wang, Meng and Yu, Jun},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  pages={14974--14983},
  year={2023}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.