Linear Graph Autoencoders
This repository provides Python (TensorFlow) code to reproduce experiments from the article Keep It Simple: Graph Autoencoders Without Graph Convolutional Networks, presented at the NeurIPS 2019 Workshop on Graph Representation Learning.
Update: an extended conference version of this article is now available here: Simple and Effective Graph Autoencoders with One-Hop Linear Models (accepted at ECML-PKDD 2020).
Update 2: do you prefer PyTorch? An implementation of Linear Graph AE and VAE is now available in the pytorch_geometric project! See the example here.
Introduction
We release TensorFlow implementations of the following two graph embedding models from the paper:
- Linear Graph Autoencoders
- Linear Graph Variational Autoencoders
together with the standard Graph Autoencoder (AE) and Graph Variational Autoencoder (VAE) models (with 2-layer or 3-layer Graph Convolutional Network encoders) from Kipf and Welling (2016).
We evaluate all models on the link prediction and node clustering tasks introduced in the paper. We provide the Cora, Citeseer and Pubmed datasets in the data folder, and refer to section 4 of the paper for direct links to the additional datasets used in our experiments.
Our code builds upon Thomas Kipf's original TensorFlow implementation of standard Graph AE/VAE.
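For intuition, the core of Linear Graph AE is a one-hop linear model: the encoder computes Z = ÃXW from the symmetrically normalized adjacency matrix Ã (with self-loops) and the node feature matrix X, and the inner product decoder reconstructs edges as sigmoid(ZZᵀ). Below is a minimal NumPy/SciPy sketch of these building blocks; it is illustrative only, not the repository's actual implementation or training loop.

```python
# Minimal sketch of Linear Graph AE's building blocks (illustrative only;
# see train.py and the paper for the real model and its training loss).
import numpy as np
import scipy.sparse as sp

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^(-1/2) (A + I) D^(-1/2)."""
    adj = adj + sp.eye(adj.shape[0])
    degrees = np.asarray(adj.sum(axis=1)).flatten()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(degrees))
    return d_inv_sqrt @ adj @ d_inv_sqrt

def linear_encoder(adj_norm, features, weights):
    """One-hop linear encoder: Z = A_norm X W (no GCN, no nonlinearity)."""
    return adj_norm @ features @ weights

def inner_product_decoder(z):
    """Reconstruct edge probabilities as sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))
```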
Scaling Up Graph AE and VAE
Standard Graph AE and VAE models suffer from scalability issues. In order to scale them to large graphs with millions of nodes and edges, we also provide an implementation of our framework from the article A Degeneracy Framework for Scalable Graph Autoencoders (IJCAI 2019). In this paper, we propose to train the graph AE/VAE on a dense subset of nodes only, namely the k-core or k-degenerate subgraph, and then to propagate embedding representations to the remaining nodes using faster heuristics.
Update: in this other repository, we provide an implementation of FastGAE, a new (and more effective) method from our group to scale Graph AE and VAE.
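For intuition, here is a simplified sketch of the degeneracy framework: embeddings are learned on the core subgraph only, then spread outwards to the remaining nodes. The plain neighbor averaging below is an assumption made for illustration; the IJCAI paper uses faster, degree-aware propagation heuristics.

```python
# Simplified sketch of core-to-graph embedding propagation (illustrative
# only; the paper's propagation heuristics are weighted and faster).
import networkx as nx
import numpy as np

def propagate_from_core(graph, core_embeddings):
    """Extend embeddings learned on a core subgraph to all nodes.

    graph: nx.Graph; core_embeddings: dict mapping core nodes to vectors."""
    embeddings = dict(core_embeddings)
    dim = len(next(iter(embeddings.values())))
    # Visit out-of-core nodes in decreasing core-number order, so most
    # nodes already have an embedded neighbor when they are processed.
    core_numbers = nx.core_number(graph)
    remaining = [n for n in graph if n not in embeddings]
    for node in sorted(remaining, key=lambda n: -core_numbers[n]):
        neighbors = [embeddings[v] for v in graph.neighbors(node) if v in embeddings]
        embeddings[node] = np.mean(neighbors, axis=0) if neighbors else np.zeros(dim)
    return embeddings
```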
Installation
python setup.py install
Requirements: tensorflow (1.X), networkx, numpy, scikit-learn, scipy
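For instance, assuming a Python environment compatible with TensorFlow 1.X, the requirements can typically be installed with pip:
pip install "tensorflow<2.0" networkx numpy scikit-learn scipy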
Run Experiments
cd linear_gae
python train.py --model=gcn_vae --dataset=cora --task=link_prediction
python train.py --model=linear_vae --dataset=cora --task=link_prediction
The above commands train a standard Graph VAE with 2-layer GCN encoders (second command) and a Linear Graph VAE (third command) on the Cora dataset, and evaluate the resulting embeddings on the link prediction task, with all other parameters set to their default values.
python train.py --model=gcn_vae --dataset=cora --task=link_prediction --kcore=True --k=2
python train.py --model=gcn_vae --dataset=cora --task=link_prediction --kcore=True --k=3
python train.py --model=gcn_vae --dataset=cora --task=link_prediction --kcore=True --k=4
By adding `--kcore=True`, the model is trained on the k-core subgraph only, instead of the entire graph. Here, k is a parameter (from 0 to the maximal core number of the graph) specified via the `--k` flag.
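To get a feel for what `--k` controls, the k-core can be inspected directly with networkx; the standalone snippet below is independent of the repository's training code.

```python
# Standalone illustration of k-core extraction with networkx
# (independent of this repository's training pipeline).
import networkx as nx

G = nx.karate_club_graph()  # small example graph
for k in range(5):
    core = nx.k_core(G, k=k)  # maximal subgraph with minimum degree >= k
    print(f"{k}-core: {core.number_of_nodes()} of {G.number_of_nodes()} nodes")
```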
Complete list of parameters
Parameter | Type | Description | Default Value |
---|---|---|---|
`model` | string | Name of the model, among:<br>- `gcn_ae`: Graph AE from Kipf and Welling (2016), with 2-layer GCN encoder and inner product decoder<br>- `gcn_vae`: Graph VAE from Kipf and Welling (2016), with Gaussian distributions, 2-layer GCN encoders for mu and sigma, and inner product decoder<br>- `linear_ae`: Linear Graph AE, as introduced in section 3 of the NeurIPS workshop paper, with linear encoder and inner product decoder<br>- `linear_vae`: Linear Graph VAE, as introduced in section 3 of the NeurIPS workshop paper, with Gaussian distributions, linear encoders for mu and sigma, and inner product decoder<br>- `deep_gcn_ae`: deeper version of Graph AE, with 3-layer GCN encoder and inner product decoder<br>- `deep_gcn_vae`: deeper version of Graph VAE, with Gaussian distributions, 3-layer GCN encoders for mu and sigma, and inner product decoder | `gcn_ae` |
`dataset` | string | Name of the dataset, among:<br>- `cora`: scientific publications citation network<br>- `citeseer`: scientific publications citation network<br>- `pubmed`: scientific publications citation network<br>We provide the preprocessed versions, coming from the tkipf/gae repository; please check the LINQS website for the raw data. You can specify any additional graph dataset, in edgelist format, by editing `input_data.py` (see the loading sketch after this table) | `cora` |
`task` | string | Name of the machine learning evaluation task, among:<br>- `link_prediction`: Link Prediction<br>- `node_clustering`: Node Clustering<br>See section 4 and the supplementary material of the NeurIPS 2019 workshop paper for details about both tasks | `link_prediction` |
`dropout` | float | Dropout rate | `0.` |
`epochs` | int | Number of epochs in model training | `200` |
`features` | boolean | Whether to include node features in the encoder | `False` |
`learning_rate` | float | Initial learning rate (with Adam optimizer) | `0.01` |
`hidden` | int | Number of units in GCN encoder hidden layer(s) | `32` |
`dimension` | int | Dimension of the encoder output, i.e. the embedding dimension | `16` |
`kcore` | boolean | Whether to run k-core decomposition and use the degeneracy framework from the IJCAI paper. If `False`, the AE/VAE is trained on the entire graph | `False` |
`k` | int | Which k-core to use. A higher k yields a smaller subgraph and faster (but possibly less accurate) training | `2` |
`nb_run` | int | Number of model runs + tests | `1` |
`prop_val` | float | Proportion of edges in the validation set (in %, for Link Prediction) | `5.` |
`prop_test` | float | Proportion of edges in the test set (in %, for Link Prediction) | `10.` |
`validation` | boolean | Whether to report validation results at each epoch (for Link Prediction) | `False` |
`verbose` | boolean | Whether to print full training details | `True` |
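As referenced in the table above, a custom edgelist dataset must be turned into the adjacency matrix (and, optionally, feature matrix) that the models expect. The sketch below shows one plausible way to do this with networkx; the exact integration point in input_data.py may differ, so the function name and return format here are assumptions.

```python
# Hypothetical loader for a custom edgelist dataset (the exact hook in
# input_data.py may differ; names here are illustrative).
import networkx as nx
import scipy.sparse as sp

def load_custom_edgelist(path):
    """Read a whitespace-separated edgelist into (adjacency, features)."""
    graph = nx.read_edgelist(path, nodetype=int)
    adj = nx.adjacency_matrix(graph)      # scipy sparse adjacency matrix
    features = sp.identity(adj.shape[0])  # identity features for featureless graphs
    return adj, features
```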
Models from the paper
Cora
python train.py --dataset=cora --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=cora --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=cora --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=cora --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=cora --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=cora --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
Cora - with features
python train.py --dataset=cora --features=True --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=cora --features=True --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=cora --features=True --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=cora --features=True --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=cora --features=True --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=cora --features=True --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
Citeseer
python train.py --dataset=citeseer --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
Citeseer - with features
python train.py --dataset=citeseer --features=True --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --features=True --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --features=True --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --features=True --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --features=True --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=citeseer --features=True --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
Pubmed
python train.py --dataset=pubmed --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
Pubmed - with features
python train.py --dataset=pubmed --features=True --model=linear_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --features=True --model=linear_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --features=True --model=gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --features=True --model=gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --features=True --model=deep_gcn_ae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
python train.py --dataset=pubmed --features=True --model=deep_gcn_vae --task=link_prediction --epochs=200 --learning_rate=0.01 --hidden=32 --dimension=16 --nb_run=5
Notes:
- Set `--task=node_clustering` with the same hyperparameters to evaluate models on node clustering (as in Table 4) instead of link prediction
- Set `--nb_run=100` to report mean AUC and AP along with standard errors over 100 runs, as in the paper (see the metric sketch below)
- We recommend GPU usage for faster training
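As a reminder of what these scores measure, AUC and AP compare the decoder's scores on held-out true edges against sampled non-edges. The toy snippet below, with made-up labels and scores, shows how such metrics are computed with scikit-learn:

```python
# Toy illustration of the reported link prediction metrics (labels and
# scores below are made up for the example).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

labels = np.array([1, 1, 1, 0, 0, 0])              # 1 = true edge, 0 = non-edge
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.3, 0.1])  # decoder edge probabilities
print("AUC:", roc_auc_score(labels, scores))
print("AP :", average_precision_score(labels, scores))
```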
Cite
1 - Please cite the following paper(s) if you use the linear graph AE/VAE code in your own work.
NeurIPS 2019 workshop version:
@misc{salha2019keep,
title={Keep It Simple: Graph Autoencoders Without Graph Convolutional Networks},
author={Salha, Guillaume and Hennequin, Romain and Vazirgiannis, Michalis},
howpublished={Workshop on Graph Representation Learning, 33rd Conference on Neural Information Processing Systems (NeurIPS)},
year={2019}
}
and/or the extended conference version:
@inproceedings{salha2020simple,
title={Simple and Effective Graph Autoencoders with One-Hop Linear Models},
author={Salha, Guillaume and Hennequin, Romain and Vazirgiannis, Michalis},
booktitle={European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
year={2020}
}
2 - Please cite the following paper if you use the k-core framework for scalability in your own work.
@inproceedings{salha2019degeneracy,
title={A Degeneracy Framework for Scalable Graph Autoencoders},
author={Salha, Guillaume and Hennequin, Romain and Tran, Viet Anh and Vazirgiannis, Michalis},
booktitle={28th International Joint Conference on Artificial Intelligence (IJCAI)},
year={2019}
}