• Stars: 187
  • Rank: 205,232 (Top 5%)
  • Language: Python
  • License: MIT License
  • Created: almost 4 years ago
  • Updated: over 3 years ago

Repository Details

Subgraph Neural Networks (NeurIPS 2020)

SubGNN

Repository for NeurIPS 2020 paper: Subgraph Neural Networks

Authors: Emily Alsentzer*, Sam Finlayson*, Michelle Li, Marinka Zitnik

Project Website

To use SubGNN, do the following:

  • Install the environment
  • Prepare data
  • Modify PROJECT_ROOT in config.py
  • Modify the appropriate config.json file
  • Train and evaluate SubGNN

Install the Environment

We provide a yml file containing the necessary packages for SubGNN. Once you have conda installed, you can create an environment as follows:

conda env create --file SubGNN.yml 
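
Assuming the environment defined in SubGNN.yml is named SubGNN (check the name field in the yml if activation fails), activate it before running any of the commands below:

conda activate SubGNN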

Prepare data

Prepare data for SubGNN by (1) downloading our provided datasets, or by following the steps in the prepare_dataset folder README to either (2) generate synthetic datasets or (3) format your own data.

Real-World Datasets: We are releasing four new real-world datasets: HPO-NEURO, HPO-METAB, PPI-BP, and EM-USER. You can download these files from Dropbox here. You should unzip the folder and set the PROJECT_ROOT in config.py to the path where you downloaded the data (e.g. /PATH/TO/SubGNN_data).
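
For example, if you unzipped the data to /PATH/TO/SubGNN_data, the corresponding line in config.py would look roughly like this (the path is a placeholder for wherever you put the data):

PROJECT_ROOT = "/PATH/TO/SubGNN_data"  # folder containing the unzipped dataset directories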

Synthetic Datasets: We also provide a script to generate the DENSITY, CORENESS, COMPONENT, and CUTRATIO synthetic graphs featured in our paper. See the README in the prepare_dataset folder for more information on how to generate these synthetic datasets.

Your Own Data: To use your own data with SubGNN, you will need an edge list file containing the edges of the base graph and a file containing the node ids of the subgraphs, their labels, and whether they are in the train/val/test splits. Then you will need to generate node embeddings and precompute similarity metrics. For more info on how to do this, refer to the README in the prepare_dataset folder.
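
As an illustration only (the exact file names, delimiters, and required columns are documented in the prepare_dataset README), one way to produce a whitespace-separated edge list for the base graph is with networkx; the graph and output file name below are hypothetical:

import networkx as nx

G = nx.karate_club_graph()                          # stand-in for your own base graph
nx.write_edgelist(G, "edge_list.txt", data=False)   # writes one "u v" pair per line, no edge attributes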

How to Train

To train SubGNN, you should first specify your project directory via PROJECT_ROOT in config.py if you haven't already. This directory should include folders containing all datasets and will ultimately contain all tensorboard folders with model outputs. Then, modify the config.json file for the appropriate dataset to set the tensorboard output directory and the hyperparameter search ranges, including which SubGNN channels (neighborhood, structure, or position) to turn on. To learn more about the hyperparameters, go to the README in the config_files folder. Finally, train the model via the following:

cd SubGNN
python train_config.py -config_path config_files/hpo_metab/metab_config.json

The model and associated hyperparameters will be saved in the tensorboard directory specified by tb_dir and tb_name in the config file. We use the hpo_metab dataset as an example, but you can easily run any of the datasets by passing in the appropriate config file. Note that, while you can also train the model via train.py, we highly recommend using train_config.py instead.
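
To monitor training, you can point TensorBoard at the output directory, assuming TensorBoard is installed in your environment; the path below is a placeholder built from your PROJECT_ROOT and the tensorboard folder name set in the config:

tensorboard --logdir PATH/TO/PROJECT_ROOT/NAME_OF_TENSORBOARD_FOLDER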

How to Evaluate

Re-train & test on 10 random seeds

Once you have trained SubGNN and selected the best hyperparameters on the validation set, run the test.py script to re-train the model on 10 random seeds and evaluate on the test set:

cd SubGNN
python test.py \
-task hpo_metab \
-tb_dir NAME_OF_TENSORBOARD_FOLDER \
-tb_name NAME_OF_RUN_TYPE \
-restoreModelPath PATH/TO/MODEL/LOCATION/WITH/BEST/HYPERPARAMETERS

Note that the restoreModelPath directory should contain a .ckpt file and a hyperparams.json file. This command will create a tensorboard directory at PROJECT_ROOT/tb_dir/tb_name where tb_dir and tb_name are specified by the input parameters. The test performance on each random seed will be saved in test_results.json files in folders in this tensorboard directory. The experiment_results.json file summarizes test performance across all random seeds.
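
If you want to inspect the aggregated results programmatically rather than opening the JSON files by hand, a minimal sketch looks like the following; the path is a placeholder for the experiment_results.json written by test.py, and the keys inside the file depend on the dataset's metrics:

import json

with open("PATH/TO/experiment_results.json") as f:
    results = json.load(f)
print(results)  # summary of test performance across the 10 random seeds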

Test on single random seed

You can also evaluate the model on a single random seed. You can use train.py with the -noTrain and -runTest flags to restore a specific model and evaluate on test data. The results will be printed to the console.

cd SubGNN
python train.py \
-task hpo_metab \
-noTrain \
-runTest \
-no_save \
-restoreModelPath PATH/TO/SAVED/MODEL \
-restoreModelName CHECKPOINT_FILE_NAME.ckpt

How to Cite

@article{alsentzer2020subgraph,
  title={Subgraph Neural Networks},
  author={Alsentzer, Emily and Finlayson, Samuel G and Li, Michelle M and Zitnik, Marinka},
  journal={Proceedings of Neural Information Processing Systems, NeurIPS},
  year={2020}
}

Contact Us

Please open an issue or contact [email protected] with any questions.

More Repositories

1. TDC - Therapeutics Commons (TDC-2): Multimodal Foundation for Therapeutic Science (Jupyter Notebook, 979 stars)
2. nimfa - Nimfa: Nonnegative matrix factorization in Python (Python, 535 stars)
3. decagon - Graph convolutional neural network for multirelational link prediction (Jupyter Notebook, 442 stars)
4. TFC-pretraining - Self-supervised contrastive learning for time series via time-frequency consistency (Python, 405 stars)
5. UniTS - A unified multi-task time series model (Python, 386 stars)
6. PrimeKG - Precision Medicine Knowledge Graph (PrimeKG) (Jupyter Notebook, 364 stars)
7. graphml-tutorials - Tutorials for Machine Learning on Graphs (Jupyter Notebook, 204 stars)
8. Raindrop - Graph Neural Networks for Irregular Time Series (Python, 162 stars)
9. GraphXAI - Resource to support the development and evaluation of GNN explainers (Python, 151 stars)
10. scikit-fusion - Data fusion via collective latent factor models (Python, 143 stars)
11. G-Meta - Graph meta learning via local subgraphs (NeurIPS 2020) (Python, 119 stars)
12. Raincoat - Domain Adaptation for Time Series Under Feature and Label Shifts (Jupyter Notebook, 105 stars)
13. ohmnet - OhmNet: Representation learning in multi-layer graphs (Python, 78 stars)
14. PINNACLE - Contextualizing protein representations using deep learning on protein networks and single-cell data (Python, 62 stars)
15. TxGNN - Zero-shot prediction of therapeutic use with geometric deep learning and clinician centered design (Jupyter Notebook, 61 stars)
16. GNNGuard - Defending graph neural networks against adversarial attacks (NeurIPS 2020) (Python, 57 stars)
17. SHEPHERD - Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases (HTML, 41 stars)
18. GNNDelete - General Strategy for Unlearning in Graph Neural Networks (Python, 36 stars)
19. crank - Prioritizing network communities (C++, 28 stars)
20. TimeX - Time series explainability via self-supervised model behavior consistency (Python, 25 stars)
21. SPECTRA - Spectral Framework For AI Model Evaluation (Roff, 23 stars)
22. pathways - Disease pathways in the human interactome (Python, 22 stars)
23. fastGNMF - Fast graph-regularized matrix factorization (Python, 19 stars)
24. PDGrapher - Combinatorial prediction of therapeutic perturbations using causally-inspired neural networks (Python, 17 stars)
25. fusenet - Network inference by fusing data from diverse distributions (Python, 14 stars)
26. medusa - Jumping across biomedical contexts using compressive data fusion (Python, 7 stars)
27. life-tree - Evolution of protein interactomes across the tree of life (C++, 7 stars)
28. patient-safety - Population-scale patient safety data reveal inequalities in adverse events before and during COVID-19 pandemic (Jupyter Notebook, 7 stars)
29. nimfa-ipynb - IPython notebooks demonstrating Nimfa's functionality (6 stars)
30. scCIPHER - Contextual deep learning on single-cell-enriched knowledge graphs in neurological disorders (Jupyter Notebook, 5 stars)
31. ngmc - Network-guided matrix completion (Python, 3 stars)
32. BMI702 - Biomedical Artificial Intelligence (HTML, 3 stars)
33. AWARE - Contextualizing protein representations using deep learning on interactomes and single-cell experiments (Python, 3 stars)
34. data-mining-unipv - Short Course on Data Mining at University of Pavia (Jupyter Notebook, 2 stars)
35. collage-dicty - Gene prioritization by compressive data fusion and chaining (Python, 2 stars)
36. copacar - Collective pairwise classification for multi-way (multi-relational) data analysis (Python, 1 star)
37. mims-harvard.github.io - Lab website (HTML, 1 star)