  • Stars: 180
  • Rank: 213,097 (Top 5%)
  • Language: Python
  • License: GNU General Public License
  • Created: over 2 years ago
  • Updated: 8 months ago



✨ GOOD: A Graph Out-of-Distribution Benchmark ✨


Documentation | NeurIPS 2022 Paper | Preprint

This repo maintains and updates the GOOD benchmark, which is accepted by the NeurIPS 2022 Datasets and Benchmarks Track. 😄

Roadmap

Tutorial

Algorithms

* denotes that the method is reproduced by its original authors.

Datasets

We are planning to include more graph out-of-distribution datasets for your convenience.

Features

  • Updated the final result output for easier result gathering. [Feb 20th update]

Leaderboard [Feb 20th updates]

  • Leaderboard 1.1.0, based on the latest datasets, uses larger hyperparameter spaces and more runs for hyperparameter sweeping.
  • Results will be posted on this leaderboard gradually.

Table of contents

Overview

GOOD (Graph OOD) is a graph out-of-distribution (OOD) algorithm benchmarking library built on PyTorch and PyG that makes it easy to develop and benchmark OOD algorithms.

Currently, GOOD contains 11 datasets with 17 domain selections. Combined with covariate, concept, and no shifts, these yield 51 different splits. We provide performance results for 12 commonly used baseline methods (ERM, IRM, VREx, GroupDRO, Coral, DANN, MixupForGraph, DIR, GSAT, CIGA, EERM, SRGNN), including 6 graph-specific methods, each with 10 random runs.

The GOOD dataset summaries are shown in the following figure.

Dataset

Why GOOD?

Whether you are an experienced researcher of graph out-of-distribution problems or a first-time learner of graph deep learning, here are several reasons to use GOOD as your Graph OOD research, study, and development toolkit.

  • Easy-to-use APIs: GOOD provides simple APIs for loading OOD algorithms, graph neural networks, and datasets, so you can get started with only a few lines of code.
  • Flexibility: Full OOD split generalization code is provided for extensions and for any new graph OOD dataset contributions. The OOD algorithm base class can be easily overridden to create new OOD methods.
  • Easy-to-extend architecture: In addition to serving as a package, GOOD is an integrated and well-organized project ready for further development. All algorithms, models, and datasets can be easily registered through the register mechanism and are automatically embedded into the designed pipeline like a breeze! All you need to do is write your own OOD algorithm class, model class, or dataset class; then you can compare your results with the leaderboard.
  • Easy comparisons with the leaderboard: We provide insightful comparisons from multiple perspectives. Any research or study can use our leaderboard results for comparison. Note that this is a growing project, so we will include new OOD algorithms gradually. If you would like your algorithm included in the leaderboard, please contact us or contribute to this project. A big welcome!
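The register-based extension idea described above can be sketched with a generic decorator registry. This is an illustrative sketch only, not GOOD's actual register implementation; the class and method names here are hypothetical.

```python
# Illustrative decorator-based registry, similar in spirit to GOOD's
# register mechanism. All names here are hypothetical, not GOOD's API.

class Registry:
    def __init__(self):
        self.algorithms = {}

    def ood_alg_register(self, cls):
        """Register an OOD algorithm class under its class name."""
        self.algorithms[cls.__name__] = cls
        return cls

register = Registry()

@register.ood_alg_register
class MyOODAlgorithm:
    """A user-defined OOD algorithm the pipeline can look up by name."""
    def loss(self, pred, target):
        # Toy squared-error loss, just to give the class some behavior.
        return sum((p - t) ** 2 for p, t in zip(pred, target))

# The pipeline can now instantiate the algorithm by its registered name.
alg_cls = register.algorithms["MyOODAlgorithm"]
print(alg_cls().loss([1.0, 2.0], [1.0, 0.0]))  # 4.0
```

The decorator returns the class unchanged, so registration is a one-line annotation on the user's own class.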

Installation

Conda dependencies

GOOD depends on PyTorch (>=1.6.0), PyG (>=2.0), and RDKit (>=2020.09.5). For more details: conda environment

Note that we currently test on PyTorch (==1.10.1), PyG (==2.0.4), and RDKit (==2020.09.5); thus we strongly encourage installing these versions.

Warning: Please install with CUDA >= 11.3 to avoid unexpected CUDA errors.

A recommended installation example:

# Create your own conda environment, then...
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda install pyg -c pyg
conda install -c conda-forge rdkit==2020.09.5
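Since version mismatches are a common source of the errors mentioned above, a small pure-Python helper (hypothetical, not part of GOOD) can sanity-check version strings against the tested pins without importing the packages:

```python
# Hypothetical version-comparison helper; not part of GOOD.
def version_tuple(v):
    """'1.10.1' -> (1, 10, 1); keeps only the numeric part of each segment."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def at_least(installed, required):
    """True if an installed version meets a required minimum."""
    return version_tuple(installed) >= version_tuple(required)

# The tested pins from this README against the stated minimums.
print(at_least("1.10.1", "1.6.0"))  # PyTorch pin meets the >=1.6.0 minimum
print(at_least("2.0.4", "2.0"))     # PyG pin meets the >=2.0 minimum
```

Tuple comparison handles multi-digit segments correctly (1.10 > 1.6), which naive string comparison would get wrong.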

Pip

Installation for Project usages (recommended)

git clone https://github.com/divelab/GOOD.git && cd GOOD
pip install -e .

Quick Tutorial

Run an algorithm

A good way to begin is to run a task directly. Here, we provide the CLI goodtg (GOOD to go) to access the main function located at GOOD.kernel.main:goodtg. Choosing a config file in configs/GOOD_configs, we can start a task:

goodtg --config_path GOOD_configs/GOODCMNIST/color/concept/DANN.yaml

Hyperparameter sweeping

To perform automatic hyperparameter sweeping and job launching, you can use goodtl (GOOD to launch):

goodtl --sweep_root sweep_configs --launcher MultiLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_devices 0 1 2 3
  • --sweep_root is a config folder located at configs/sweep_configs, where we provide an example hyperparameter sweeping setting for the GSAT algorithm (on the GOODMotif dataset, basis domain, and covariate shift).
    • Each hyperparameter's search range is specified by a list of values. Example
    • These hyperparameter configs will be transformed into CLI argument combinations.
    • Note that hyperparameters in inner config files will overwrite the outer ones.
  • --launcher denotes the chosen job launcher. Available launchers:
    • Launcher: Dummy launcher, only print.
    • SingleLauncher: Sequential job launcher. Choose the first device in --allow_devices.
    • MultiLauncher: Multi-gpu job launcher. Launch on all gpus specified by --allow_devices.
  • --allow_XXX denotes the job scale. Note that for each "allow" combination (e.g., GSAT GOODMotif basis covariate), there should be a corresponding sweeping config, GSAT/GOODMotif/basis/covariate/base.yaml, in the folder specified by --sweep_root.
  • --allow_devices specifies the gpu devices used to launch jobs.
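The expansion of list-valued hyperparameters into CLI argument combinations can be sketched generically. This illustrates the idea only, not goodtl's actual code; the config keys below are made up.

```python
from itertools import product

# Hypothetical sweep config: each hyperparameter's search range is a list.
sweep = {"lr": [1e-3, 1e-4], "ood_param": [0.1, 1.0], "layers": [3]}

# Cartesian product over all ranges -> one CLI argument string per job.
keys = list(sweep)
combos = [
    " ".join(f"--{k} {v}" for k, v in zip(keys, values))
    for values in product(*(sweep[k] for k in keys))
]

for args in combos:
    print(args)
# 2 * 2 * 1 = 4 combinations, one launched job each
```

A launcher would then append each combination to the base command and dispatch the jobs across the allowed devices.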

Sweeping result collection and config update

To harvest all fruits you have grown (collect all results you have run), please use goodtl with a special launcher HarvestLauncher:

goodtl --sweep_root sweep_configs --final_root final_configs --launcher HarvestLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT
  • --sweep_root: We still need it to specify the experiments that can be harvested.
  • --final_root: A config store place that will store the best config settings. We will update the best configurations (according to the sweeping) into the config files in it.

(Experimental function.)

The output numpy array:

  • Rows: In-distribution train/In-distribution test/Out-of-distribution train/Out-of-distribution test/Out-of-distribution validation
  • Columns: Mean/Std.
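Collecting per-round scores into that 5x2 (mean, std) layout could look like the following sketch, using only the stdlib statistics module; the split names match the rows above, but the scores are made up.

```python
from statistics import mean, stdev

# Hypothetical accuracy scores from 3 rounds, one list per data split,
# in the row order of the output array described above.
rows = {
    "ID train": [0.99, 0.98, 0.99],
    "ID test": [0.95, 0.94, 0.96],
    "OOD train": [0.97, 0.96, 0.97],
    "OOD test": [0.71, 0.68, 0.70],
    "OOD validation": [0.73, 0.70, 0.72],
}

# Each row becomes (mean, std) -- the two columns of the output array.
table = [(mean(scores), stdev(scores)) for scores in rows.values()]
for name, (m, s) in zip(rows, table):
    print(f"{name}: {m:.3f} ± {s:.3f}")
```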

Final runs

It is sometimes impractical to run 10 rounds for hyperparameter sweeping, especially when the search space is huge. Therefore, we generally run hyperparameter sweeping for 2~3 rounds, then perform all rounds after selecting the best hyperparameters. Now, remove --sweep_root, set --config_root to the location where your updated best configs are saved, and set --allow_rounds.

goodtl --config_root final_configs --launcher MultiLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_devices 0 1 2 3 --allow_rounds 1 2 3 4 5 6 7 8 9 10

Note that in this benchmark, results are valid only after 3+ rounds of experiments.

Final result collection

goodtl --config_root final_configs --launcher HarvestLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_rounds 1 2 3 4 5 6 7 8 9 10

Output: Markdown format table. (This table is also saved in the file: <Project_root>/result_table.md).
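A Markdown result table like the one saved to result_table.md can be produced with a few lines of string formatting. This is an illustrative sketch only; the column names and numbers below are made up, not the actual result_table.md schema.

```python
# Illustrative Markdown table writer; headers and values are made up.
header = ["Algorithm", "OOD test"]
rows = [["GSAT", "0.70 ± 0.01"], ["ERM", "0.64 ± 0.02"]]

lines = [
    "| " + " | ".join(header) + " |",          # header row
    "|" + "|".join("---" for _ in header) + "|",  # separator row
]
lines += ["| " + " | ".join(r) + " |" for r in rows]
md_table = "\n".join(lines)
print(md_table)
```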

You can customize your own launcher at GOOD/kernel/launchers/.

Add a new algorithm

Please follow this documentation to add a new algorithm.

Any contributions are welcomed! Please refer to contributing for adding your algorithm into GOOD.

Leaderboard

The initial leaderboard results are listed in the paper, and the validation of these results is described here.

Leaderboard 1.1.0 with updated datasets will be available here.

Citing GOOD

If you find this repository helpful, please cite our paper.

@inproceedings{gui2022good,
  title={{GOOD}: A Graph Out-of-Distribution Benchmark},
  author={Shurui Gui and Xiner Li and Limei Wang and Shuiwang Ji},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=8hHg-zs_p-h}
}

License

The GOOD datasets are under the MIT license. The GOOD code is under the GPLv3 license.

Discussion

Please submit new issues or start a new discussion for any technical or other questions.

Contact

Please feel free to contact Shurui Gui, Xiner Li, or Shuiwang Ji!
