• Stars
    star
    181
  • Rank 210,841 (Top 5 %)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created over 4 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Distance Encoding for GNN Design

Distance-encoding for GNN design

This repository is the official PyTorch implementation of the DEGNN and DEAGNN framework reported in the paper:
Distance-Encoding -- Design Provably More PowerfulGNNs for Structural Representation Learning, to appear in NeurIPS 2020.

The project's home page is: http://snap.stanford.edu/distance-encoding/

Authors & Contact

Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec

Questions on this repo can be emailed to [email protected] (Yanbang Wang)

Installation

Requirements: Python >= 3.5, Anaconda3

  • Update conda:
conda update -n base -c defaults conda
  • Install basic dependencies to virtual environment and activate it:
conda env create -f environment.yml
conda activate degnn-env
  • Install PyTorch >= 1.4.0 and torch-geometric >= 1.5.0 (please refer to the PyTorch and PyTorch Geometric official websites for more details). Commands examples are:
conda install pytorch=1.4.0 torchvision cudatoolkit=10.1 -c pytorch
pip install torch-scatter==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-geometric

The latest tested combination is: Python 3.8.2 + Pytorch 1.4.0 + torch-geometric 1.5.0.

Quick Start

python main.py --dataset celegans --feature sp --hidden_features 100 --prop_depth 1 --test_ratio 0.1 --epoch 300

    This uses 100-dimensional hidden features, 80/10/10 split of train/val/test set, and trains for 300 epochs.

  • To train DEAGNN-SPD for Task 3 (node-triads prediction) on C.elegans dataset:
python main.py --dataset celegans_tri --hidden_features 100 --prop_depth 2 --epoch 300 --feature sp --max_sp 5 --l2 1e-3 --test_ratio 0.1 --seed 9

    This enables 2-hop propagation per layer, truncates distance encoding at 5, and uses random seed 9.

  • To train DEGNN-LP (i.e. the random walk variant) for Task 1 (node-level prediction) on usa-airports using average accuracy as evaluation metric:
python main.py --dataset usa-airports --metric acc --hidden_features 100 --feature rw --rw_depth 2 --epoch 500 --bs 128 --test_ratio 0.1

Note that here the test_ratio currently contains both validation set and the actual test set, and will be changed to contain only test set.

  • To generate Figure2 LEFT of the paper (Simulation to validate Theorem 3.3):
python main.py --dataset simulation --max_sp 10

    The result will be plot to ./simulation_results.png.

  • All detailed training logs can be found at <log_dir>/<dataset>/<training-time>.log. A one-line summary will also be appended to <log_dir>/result_summary.log for each training instance.

Usage Summary

Interface for DE-GNN framework [-h] [--dataset DATASET] [--test_ratio TEST_RATIO]
                                      [--model {DE-GNN,GIN,GCN,GraphSAGE,GAT}] [--layers LAYERS]
                                      [--hidden_features HIDDEN_FEATURES] [--metric {acc,auc}] [--seed SEED] [--gpu GPU]
                                      [--data_usage DATA_USAGE] [--directed DIRECTED] [--parallel] [--prop_depth PROP_DEPTH]
                                      [--use_degree USE_DEGREE] [--use_attributes USE_ATTRIBUTES] [--feature FEATURE]
                                      [--rw_depth RW_DEPTH] [--max_sp MAX_SP] [--epoch EPOCH] [--bs BS] [--lr LR]
                                      [--optimizer OPTIMIZER] [--l2 L2] [--dropout DROPOUT] [--k K] [--n [N [N ...]]]
                                      [--N N] [--T T] [--log_dir LOG_DIR] [--summary_file SUMMARY_FILE] [--debug]

Optinal Arguments

  -h, --help            show this help message and exit
  
  # general settings
  --dataset DATASET     dataset name
  --test_ratio TEST_RATIO
                        ratio of the test against whole
  --model {DE-GCN,GIN,GAT,GCN,GraphSAGE}
                        model to use
  --layers LAYERS       largest number of layers
  --hidden_features HIDDEN_FEATURES
                        hidden dimension
  --metric {acc,auc}    metric for evaluating performance
  --seed SEED           seed to initialize all the random modules
  --gpu GPU             gpu id
  --adj_norm {asym,sym,None}
                        how to normalize adj
  --data_usage DATA_USAGE
                        use partial dataset
  --directed DIRECTED   (Currently unavailable) whether to treat the graph as directed
  --parallel            (Currently unavailable) whether to use multi cpu cores to prepare data
  
  # positional encoding settings
  --prop_depth PROP_DEPTH
                        propagation depth (number of hops) for one layer
  --use_degree USE_DEGREE
                        whether to use node degree as the initial feature
  --use_attributes USE_ATTRIBUTES
                        whether to use node attributes as the initial feature
  --feature FEATURE     distance encoding category: shortest path or random walk (landing probabilities)
  --rw_depth RW_DEPTH   random walk steps
  --max_sp MAX_SP       maximum distance to be encoded for shortest path feature
  
  # training settings
  --epoch EPOCH         number of epochs to train
  --bs BS               minibatch size
  --lr LR               learning rate
  --optimizer OPTIMIZER
                        optimizer to use
  --l2 L2               l2 regularization weight
  --dropout DROPOUT     dropout rate
  
  # imulation settings (valid only when dataset == 'simulation')
  --k K                 node degree (k) or synthetic k-regular graph
  --n [N [N ...]]       a list of number of nodes in each connected k-regular subgraph
  --N N                 total number of nodes in simultation
  --T T                 largest number of layers to be tested
  
  # logging
  --log_dir LOG_DIR     log directory
  --summary_file SUMMARY_FILE
                        brief summary of training result
  --debug               whether to use debug mode

Reference

If you make use of the code/experiment of Distance-encoding in your work, please cite our paper:

@article{li2020distance,
  title={Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning},
  author={Li, Pan and Wang, Yanbang and Wang, Hongwei and Leskovec, Jure},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

More Repositories

1

snap

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
C++
2,167
star
2

ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
Python
1,906
star
3

GraphGym

Platform for designing and evaluating Graph Neural Networks (GNN)
Python
1,669
star
4

pretrain-gnns

Strategies for Pre-training Graph Neural Networks
Python
955
star
5

deepsnap

Python library assists deep learning on graphs
Python
543
star
6

GraphRNN

Python
408
star
7

med-flamingo

Python
375
star
8

neural-subgraph-learning-GNN

Jupyter Notebook
327
star
9

snap-python

SNAP Python code, SWIG related files
C++
294
star
10

cs224w-notes

CS224W Course Notes
CSS
292
star
11

stark

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (https://stark.stanford.edu/)
Python
276
star
12

KGReasoning

Multi-Hop Logical Reasoning in Knowledge Graphs
Python
274
star
13

GreaseLM

[ICLR 2022 spotlight]GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
Python
229
star
14

MLAgentBench

Python
224
star
15

GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
Python
189
star
16

relbench

RelBench: Relational Deep Learning Benchmark
Python
181
star
17

graphwave

Jupyter Notebook
169
star
18

covid-mobility

Jupyter Notebook
147
star
19

UCE

UCE is a zero-shot foundation model for single-cell gene expression data
Python
135
star
20

GIB

Graph Information Bottleneck (GIB) for learning minimal sufficient structural and feature information using GNNs
Jupyter Notebook
123
star
21

roland

Jupyter Notebook
120
star
22

mars

Discovering novel cell types across heterogenous single-cell experiments
Jupyter Notebook
119
star
23

comet

[ICLR 2021] Concept Learners for Few-Shot Learning
Python
111
star
24

SATURN

Jupyter Notebook
103
star
25

orca

[ICLR 2022] Open-World Semi-Supervised Learning
Python
85
star
26

prodigy

Python
75
star
27

CAW

Python
72
star
28

snapvx

Python
65
star
29

conformalized-gnn

Uncertainty Quantification over Graph with Conformalized Graph Neural Networks (NeurIPS 2023)
Python
64
star
30

multiscale-interactome

Python
62
star
31

plato

Python
61
star
32

miner-data

Python
60
star
33

stellar

Jupyter Notebook
58
star
34

mambo

Jupyter Notebook
37
star
35

lamp

[ICLR23] First deep learning-based surrogate model that jointly learns the evolution model and optimizes computational cost via remeshing
Python
36
star
36

crust

[NeurIPS 2020] Coresets for Robust Training of Neural Networks against Noisy Labels
Python
33
star
37

bc-emb

Python
32
star
38

csr

Python
30
star
39

zeroc

ZeroC is a neuro-symbolic method that trained with elementary visual concepts and relations, can zero-shot recognize and acquire more complex, hierarchical concepts, even across domains
Jupyter Notebook
28
star
40

masa

Motif-Aware State Assignment in Noisy Time Series Data
Python
24
star
41

le_pde

LE-PDE accelerates PDEs' forward simulation and inverse optimization via latent global evolution, achieving significant speedup with SOTA accuracy
Jupyter Notebook
21
star
42

ConE

Python
20
star
43

F-FADE

Python
17
star
44

MetroMaps

MetroMaps Release
Python
16
star
45

MAG

Programs for Microsoft Academic Graph
Python
16
star
46

BioDiscoveryAgent

BioDiscoveryAgent is an LLM-based AI agent for closed-loop design of genetic perturbation experiments
Python
16
star
47

snap-dev

SNAP repository for Ringo
C++
14
star
48

exposure-segregation

Python
13
star
49

ringo

Next generation graph processing platform
Python
12
star
50

planet

PlaNet: Predicting population response to drugs via clinical knowledge graph
Python
12
star
51

covid-mobility-tool

Jupyter Notebook
10
star
52

reddit-processing

preprocessing of Reddit data
Python
7
star
53

news-search

search Internet news archive
Java
7
star
54

snap-python-64

C++
6
star
55

snap-dev-64

64-bit SNAP (in development, not intended for general use)
C++
6
star
56

snapworld

Python
6
star
57

ViRel

ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy
Python
5
star
58

lego

5
star
59

yperf

Simple performance monitor for Linux
Python
4
star
60

pebble-fit

become less sedentary with pebble
C
4
star
61

dec2vec

Python
3
star
62

caml

Python
3
star
63

snaptime

Python
2
star
64

SnapTimeTF

Python
2
star
65

covid-spillovers

Jupyter Notebook
2
star
66

curis-2012

Summer 2012 Curis Project
JavaScript
2
star
67

GNN-reading-group

1
star
68

supply-chains

Jupyter Notebook
1
star
69

relbench-user-study

Python
1
star
70

AutoTransfer

Python
1
star
71

hash

C++
1
star