  • Stars: 119
  • Rank: 288,664 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created about 5 years ago
  • Updated about 4 years ago

Repository Details

Code for the Scene Graph Generation part of CVPR 2019 oral paper: "Learning to Compose Dynamic Tree Structures for Visual Contexts"

VCTree-Scene-Graph-Generation

If you like our work, and want to start your own scene graph generation project, you might be interested in our new SGG codebase: Scene-Graph-Benchmark.pytorch. It's much easier to follow, and provides state-of-the-art baseline models.

For the VQA part of the same paper, please refer to KaihuaTang/VCTree-Visual-Question-Answering.

UGLY CODE WARNING! UGLY CODE WARNING! UGLY CODE WARNING!

The code is directly modified from the project rowanz/neural-motifs. Most of the code for the proposed VCTree is located in lib/tree_lstm/*. If you run into a problem that prevents you from running the project, check the issues under rowanz/neural-motifs first.

Dependencies

  • You may use the following commands to set up the environment on Ubuntu (install Anaconda first):
conda update -n base conda
conda create -n motif pip python=3.6
conda install pytorch=0.3 torchvision cuda90 -c pytorch
bash install_package.sh
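
After running the commands above, a quick version check can catch a mismatched environment early. The following is a minimal sketch, not part of the project: the helper name `environment_problems` is hypothetical, and the expected versions are taken only from the conda commands listed here (Python 3.6, PyTorch 0.3).

```python
def environment_problems(python_version, torch_version):
    """Return a list of mismatches against the versions named in the conda commands."""
    problems = []
    # The environment above is created with python=3.6.
    if tuple(python_version[:2]) != (3, 6):
        problems.append(
            "expected Python 3.6, found {}.{}".format(*python_version[:2])
        )
    # The environment above installs pytorch=0.3.
    if not torch_version.startswith("0.3"):
        problems.append("expected PyTorch 0.3.x, found {}".format(torch_version))
    return problems


# Demonstration with literal values (hypothetical version strings).
ok = environment_problems((3, 6, 8), "0.3.1")
bad = environment_problems((3, 9, 0), "1.13.0")
print(ok)
print(bad)
```

Inside the actual environment, the same check could be called as `environment_problems(sys.version_info, torch.__version__)` after importing `sys` and `torch`.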

Prepare Dataset and Setup

  1. Please follow the instructions under ./data/stanford_filtered/ to download the dataset and put the files in the proper locations.

  2. Update the config file with the dataset paths. Specifically:

    • Visual Genome (the VG_100K folder, image_data.json, VG-SGG.h5, and VG-SGG-dicts.json). See data/stanford_filtered/README.md for the steps I used to download these.
    • You'll also need to fix your PYTHONPATH: export PYTHONPATH=/home/YourName/ThePathOfYourProject
  3. Compile everything. Run make in the main directory: this compiles the Bilinear Interpolation operation for the RoIs.

  4. Pretrain VG detection. The old version involved pretraining on COCO as well, but we got rid of that for simplicity. Run ./scripts/pretrain_detector.sh. Note: You might have to modify the learning rate and batch size, particularly if you don't have 3 Titan X GPUs (which is what I used). You can also download the pretrained detector checkpoint here. Note that this detector model is the default initialization of all VCTree models, so when you download this checkpoint, you need to change the "-ckpt THE_PATH_OF_INITIAL_CHECKPOINT_MODEL" under ./scripts/train_vctreenet
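
Before compiling and pretraining, it can help to confirm that the Visual Genome files named in step 2 are actually in place. The sketch below is only an illustration: the helper `missing_dataset_entries` is hypothetical, and it assumes all four entries sit under a single directory, which may not match how the config file in this project lays out its paths.

```python
import os
import tempfile

# Entries named in step 2 of this README.
REQUIRED_ENTRIES = ["VG_100K", "image_data.json", "VG-SGG.h5", "VG-SGG-dicts.json"]


def missing_dataset_entries(data_root):
    """Return the required entries that do not exist under data_root."""
    return [
        name
        for name in REQUIRED_ENTRIES
        if not os.path.exists(os.path.join(data_root, name))
    ]


# Demonstration on a temporary directory holding only two of the four entries.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "VG_100K"))
    open(os.path.join(root, "image_data.json"), "w").close()
    missing = missing_dataset_entries(root)

print(missing)
```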

How to Train / Evaluation

  1. Note that most of the parameters are in config.py. The training stages and settings are controlled through ./scripts/train_vctreenet.sh. Each command in train_vctreenet.sh needs "-ckpt" (the model providing the initial parameters) and "-save_dir" (the path where the model is saved) to be set manually. Because we use a hybrid learning strategy, each task (predcls/sgcls/sgdet) has two options, one for the supervised stage and one for the reinforcement finetuning stage. When switching between stages, the -ckpt PATH should start with the previous -save_dir PATH. The first supervised stage is initialized with the detector checkpoint mentioned above.

  2. Train VG predicate classification (predcls)

    • stage 1 (supervised stage of hybrid learning): run ./scripts/train_vctreenet.sh 5
    • stage 2 (reinforcement finetuning stage of hybrid learning): run ./scripts/train_vctreenet.sh 4
    • (By default, it will run on GPU 2; you can modify CUDA_VISIBLE_DEVICES in train_vctreenet.sh.)
    • The model will be saved under the name given by "-save_dir checkpoints/THE_NAME_YOU_WILL_SAVE_THE_MODEL"
  3. Train VG scene graph classification (sgcls)

    • stage 1 (supervised stage of hybrid learning): run ./scripts/train_vctreenet.sh 3
    • stage 2 (reinforcement finetuning stage of hybrid learning): run ./scripts/train_vctreenet.sh 2
    • (By default, it will run on GPU 2; you can modify CUDA_VISIBLE_DEVICES in train_vctreenet.sh.)
    • The model will be saved under the name given by "-save_dir checkpoints/THE_NAME_YOU_WILL_SAVE_THE_MODEL"
  4. Train VG scene graph detection (sgdet)

    • stage 1 (supervised stage of hybrid learning): run ./scripts/train_vctreenet.sh 1
    • stage 2 (reinforcement finetuning stage of hybrid learning): run ./scripts/train_vctreenet.sh 0
    • (By default, it will run on GPU 2; you can modify CUDA_VISIBLE_DEVICES in train_vctreenet.sh.)
    • The model will be saved under the name given by "-save_dir checkpoints/THE_NAME_YOU_WILL_SAVE_THE_MODEL"
  5. Evaluate predicate classification (predcls):

    • run ./scripts/eval_models.sh 0
    • Or you can simply download our predcls checkpoint: VCTree/PredCls.
  6. Evaluate scene graph classification (sgcls):

    • run ./scripts/eval_models.sh 1
    • Or you can simply download our sgcls checkpoint: VCTree/SGCls.
  7. Evaluate scene graph detection (sgdet):

    • run ./scripts/eval_models.sh 2
    • Or you can simply download our sgdet checkpoint: VCTree/SGDET.
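
The script indices and the checkpoint chaining rule described in the steps above can be summarized in a short sketch. The index tables are copied from this README; the helper names (`commands_for`, `checkpoint_chain_problems`) and all checkpoint paths in the demonstration are hypothetical and only illustrate the rule that each -ckpt path should start with the previous -save_dir path.

```python
# Script indices as listed in this README.
TRAIN_STAGE_INDEX = {
    "predcls": {"supervised": 5, "reinforcement": 4},
    "sgcls": {"supervised": 3, "reinforcement": 2},
    "sgdet": {"supervised": 1, "reinforcement": 0},
}
EVAL_INDEX = {"predcls": 0, "sgcls": 1, "sgdet": 2}


def commands_for(task):
    """Return the train (stage 1, stage 2) and eval commands for one task, in run order."""
    stages = TRAIN_STAGE_INDEX[task]
    return [
        "./scripts/train_vctreenet.sh {}".format(stages["supervised"]),
        "./scripts/train_vctreenet.sh {}".format(stages["reinforcement"]),
        "./scripts/eval_models.sh {}".format(EVAL_INDEX[task]),
    ]


def checkpoint_chain_problems(stages):
    """stages: list of (name, ckpt, save_dir) in run order.

    Reports every stage whose -ckpt does not start with the previous -save_dir.
    """
    problems = []
    for (prev_name, _, prev_save_dir), (name, ckpt, _) in zip(stages, stages[1:]):
        if not ckpt.startswith(prev_save_dir):
            problems.append(
                "{}: -ckpt {} does not start with the -save_dir of {} ({})".format(
                    name, ckpt, prev_name, prev_save_dir
                )
            )
    return problems


# Hypothetical paths, for illustration only.
good_chain = [
    ("supervised", "checkpoints/detector.tar", "checkpoints/predcls_sl"),
    ("reinforcement", "checkpoints/predcls_sl/final.tar", "checkpoints/predcls_rl"),
]
broken_chain = [
    ("supervised", "checkpoints/detector.tar", "checkpoints/predcls_sl"),
    ("reinforcement", "checkpoints/other/final.tar", "checkpoints/predcls_rl"),
]

predcls_commands = commands_for("predcls")
good_problems = checkpoint_chain_problems(good_chain)
broken_problems = checkpoint_chain_problems(broken_chain)
print("\n".join(predcls_commands))
```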

Other Things You Need To Know

  • When you evaluate your model, three metrics are printed. First, "R@20/50/100" is what we report as R@20/50/100 in our paper. Second, "cls avg" is the corresponding mean recall mR@20/50/100 proposed in our paper. Third, "total R" is another way of calculating recall that was used in some previous papers/projects; it is quite tricky and unfair, because it almost always gives a higher recall.
  • The reinforcement part of hybrid learning is still far from satisfactory. Hence, if you are interested in improving our work, you may start with this part.
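
The difference between recall (R@K) and mean recall (mR@K) mentioned above can be illustrated with a toy example. This is a simplified sketch, not the evaluation code of this project: it assumes a prediction matches a ground-truth triplet by exact (subject, predicate, object) equality and ignores bounding-box overlap, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def recall_at_k(gt_triplets, ranked_predictions, k):
    """Fraction of one image's ground-truth triplets found in its top-k predictions."""
    top_k = set(ranked_predictions[:k])
    hits = sum(1 for triplet in gt_triplets if triplet in top_k)
    return hits / len(gt_triplets)


def mean_recall_at_k(images, k):
    """Recall computed per predicate class, then averaged over classes.

    Rare predicates weigh as much as frequent ones in this average.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gt_triplets, ranked_predictions in images:
        top_k = set(ranked_predictions[:k])
        for subj, pred, obj in gt_triplets:
            totals[pred] += 1
            if (subj, pred, obj) in top_k:
                hits[pred] += 1
    per_class = [hits[pred] / totals[pred] for pred in totals]
    return sum(per_class) / len(per_class)


# Toy data: (ground-truth triplets, predictions ranked by confidence) per image.
images = [
    (
        [("man", "on", "horse"), ("man", "wearing", "hat")],
        [("man", "on", "horse"), ("horse", "on", "grass"), ("man", "has", "hat")],
    ),
    (
        [("dog", "on", "bed")],
        [("dog", "on", "bed"), ("dog", "near", "pillow")],
    ),
]

overall_recall = sum(recall_at_k(gt, preds, 2) for gt, preds in images) / len(images)
mean_recall = mean_recall_at_k(images, 2)
print(overall_recall, mean_recall)
```

In this toy data the frequent predicate "on" is always recovered while the rarer "wearing" is missed, so the per-image recall averages to 0.75 while the per-class mean recall is only 0.5.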

If this paper/project inspires your work, please cite our work:

@inproceedings{tang2018learning,
  title={Learning to Compose Dynamic Tree Structures for Visual Contexts},
  author={Tang, Kaihua and Zhang, Hanwang and Wu, Baoyuan and Luo, Wenhan and Liu, Wei},
  booktitle={Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

More Repositories

1

Scene-Graph-Benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of the paper "Unbiased Scene Graph Generation from Biased Training" (CVPR 2020).
Jupyter Notebook
995
star
2

Long-Tailed-Recognition.pytorch

[NeurIPS 2020] This project provides a strong single-stage baseline for Long-Tailed Classification, Detection, and Instance Segmentation (LVIS). It is also a PyTorch implementation of the NeurIPS 2020 paper 'Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect'.
Jupyter Notebook
547
star
3

VQA2.0-Recent-Approachs-2018.pytorch

A PyTorch reimplementation of "Bilinear Attention Network", "Intra- and Inter-modality Attention", "Learning Conditioned Graph Structures", "Learning to count object", "Bottom-up top-down" for Visual Question Answering 2.0
Python
284
star
4

ResNet50-Pytorch-Face-Recognition

Using Pytorch to implement a ResNet50 for Cross-Age Face Recognition
Python
124
star
5

Generalized-Long-Tailed-Benchmarks.pytorch

[ECCV 2022] A generalized long-tailed challenge that incorporates both the conventional class-wise imbalance and the overlooked attribute-wise imbalance within each class. The proposed IFL together with other baselines are also included.
Jupyter Notebook
111
star
6

GGNN-for-bAbI-dataset.pytorch.1.0

A Complete PyTorch 1.0 Implementation of Gated Graph Sequence Neural Networks (GGNN)
Python
52
star
7

ResNet50-Tensorflow-Face-Recognition

Using Tensorflow to implement a ResNet50 for Cross-Age Face Recognition
Python
47
star
8

VCTree-Visual-Question-Answering

Code for the Visual Question Answering (VQA) part of CVPR 2019 oral paper: "Learning to Compose Dynamic Tree Structures for Visual Contexts"
Python
34
star
9

Local-Disco-Diffusion-v5.2.jupyterNote

A custom Disco Diffusion v5.2 that runs on local GPUs.
Jupyter Notebook
22
star
10

CiiV-Adversarial-Robustness.pytorch

The official PyTorch Implementation of the Paper "Adversarial Visual Robustness by Causal Intervention"
Jupyter Notebook
19
star
11

LVIS-for-mmdetection

Support for the Large Vocabulary Instance Segmentation (LVIS) dataset in mmdetection
Python
16
star
12

Kinetics-Data-Preprocessing

An instruction to 1) download the Kinetics-400/Kinetics-600, 2) resize the videos, and 3) prepare annotations.
Python
9
star
13

Describe-and-Guess-GAME-Using-GPT-3

A simple demo of how to use GPT-3 to play Describe-and-Guess in the specified topic and question type.
Python
6
star
14

kai-blog

SCSS
1
star
15

faster-rcnn.pytorch

Python
1
star
16

Quick-Draw-Multimodal-Recognition

The Course Project of CE7454 (Team 13)
Jupyter Notebook
1
star