GraphVite: A General and High-performance Graph Embedding System

GraphVite - graph embedding at high speed and large scale

Docs | Tutorials | Benchmarks | Pre-trained Models

GraphVite is a general graph embedding engine, dedicated to high-speed and large-scale embedding learning in various applications.

GraphVite provides complete training and evaluation pipelines for 3 applications: node embedding, knowledge graph embedding, and graph & high-dimensional data visualization. It also includes 9 popular models, along with their benchmarks on a set of standard datasets.

[Figures: Node Embedding | Knowledge Graph Embedding | Graph & High-dimensional Data Visualization]

The tables below summarize the training time of GraphVite compared with the best open-source implementations on the 3 applications. All times are measured on a server with 24 CPU threads and 4 V100 GPUs.

Training time of node embedding on the Youtube dataset.

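| Model    | Existing Implementation | GraphVite | Speedup |
|----------|-------------------------|-----------|---------|
| DeepWalk | 1.64 hrs (CPU parallel) | 1.19 mins | 82.9x   |
| LINE     | 1.39 hrs (CPU parallel) | 1.17 mins | 71.4x   |
| node2vec | 24.4 hrs (CPU parallel) | 4.39 mins | 334x    |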

Training / evaluation time of knowledge graph embedding on the FB15k dataset.

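| Model  | Existing Implementation (train / eval) | GraphVite (train / eval) | Speedup (train / eval) |
|--------|----------------------------------------|--------------------------|------------------------|
| TransE | 1.31 hrs / 1.75 mins (1 GPU)           | 13.5 mins / 54.3 s       | 5.82x / 1.93x          |
| RotatE | 3.69 hrs / 4.19 mins (1 GPU)           | 28.1 mins / 55.8 s       | 7.88x / 4.50x          |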

Training time of high-dimensional data visualization on the MNIST dataset.

| Model    | Existing Implementation  | GraphVite | Speedup |
|----------|--------------------------|-----------|---------|
| LargeVis | 15.3 mins (CPU parallel) | 13.9 s    | 66.8x   |

Requirements

Generally, GraphVite works on any Linux distribution with CUDA >= 9.2.

The library is compatible with Python 2.7 and 3.6/3.7.

Installation

From Conda

conda install -c milagraph -c conda-forge graphvite cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")

If you only need embedding training without evaluation, you can use the following alternative with minimal dependencies.

conda install -c milagraph -c conda-forge graphvite-mini cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
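The $(...) substitution in these conda commands pins cudatoolkit to the CUDA release reported by your local nvcc. As a minimal sketch of what the grep -Po lookbehind pattern extracts, the Python snippet below applies the same regular expression to a sample nvcc -V output. The sample text and the helper name cuda_version are illustrative assumptions, not part of GraphVite.

```python
import re

# Illustrative `nvcc -V` output (assumption: the real text differs between CUDA releases).
SAMPLE_NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
"""

# Same pattern as the shell command: "(?<=V)" requires a "V" right before the
# digits but leaves it out of the match, so only "major.minor" is captured.
VERSION_PATTERN = re.compile(r"(?<=V)\d+\.\d+")


def cuda_version(nvcc_output: str) -> str:
    """Hypothetical helper: return the first major.minor version that follows a 'V'."""
    match = VERSION_PATTERN.search(nvcc_output)
    if match is None:
        raise ValueError("no CUDA version found in nvcc output")
    return match.group(0)


print(cuda_version(SAMPLE_NVCC_OUTPUT))  # 10.1
```

With this sample output, the conda command would request cudatoolkit=10.1. Note that grep -o prints every match, while this sketch returns only the first one; for typical nvcc output there is a single match.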

From Source

Before installation, make sure you have conda installed.

git clone https://github.com/DeepGraphLearning/graphvite
cd graphvite
conda install -y --file conda/requirements.txt
mkdir build
cd build && cmake .. && make && cd -
cd python && python setup.py install && cd -

On Colab

!wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!./Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local -f

!conda install -y -c milagraph -c conda-forge graphvite \
    python=3.6 cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
!conda install -y wurlitzer ipykernel
import site
site.addsitedir("/usr/local/lib/python3.6/site-packages")
%reload_ext wurlitzer

Quick Start

Here is a quick-start example of the node embedding application.

graphvite baseline quick start

Typically, the example takes no more than 1 minute and produces output like the following:

Batch id: 6000
loss = 0.371041

------------- link prediction --------------
AUC: 0.899933

----------- node classification ------------
macro-F1@20%: 0.242114
micro-F1@20%: 0.391342
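The quick-start output reports a link-prediction AUC and node-classification macro-/micro-F1. As a generic illustration of what these numbers measure (a sketch of the textbook metric definitions, not GraphVite's evaluation code), the pure-Python snippet below computes them on small made-up data. The helper names auc, f1 and micro_macro_f1 are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(truth: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Multi-label F1: micro pools counts over all labels, macro averages per-label F1."""
    labels = sorted(set().union(*truth, *pred))
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for t, p in zip(truth, pred):
        for label in labels:
            if label in t and label in p:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in t:
                counts[label][2] += 1
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return micro, macro


# Made-up scores for held-out edges (positives) vs. sampled non-edges (negatives).
print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))  # 0.8333...
# Made-up node labels: ground truth vs. predictions for three nodes.
micro, macro = micro_macro_f1([{"a"}, {"a", "b"}, {"c"}], [{"a"}, {"a"}, {"b"}])
print(round(micro, 4), round(macro, 4))  # 0.5714 0.3333
```

In this toy case macro-F1 falls below micro-F1 because every label counts equally in the macro average, and the two rarer labels (b and c) are never predicted correctly.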

Baseline Benchmark

To reproduce a baseline benchmark, you only need to specify the keywords of the experiment, e.g. the model and the dataset.

graphvite baseline [keyword ...] [--no-eval] [--gpu n] [--cpu m] [--epoch e]

You may also set the number of GPUs (--gpu) and the number of CPUs per GPU (--cpu).

Use graphvite list to get a list of available baselines.
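Because the baseline command takes plain keywords and flags, sweeps over several settings can be scripted. The Python sketch below only assembles argument lists from the flags shown in the usage line above. The helper name baseline_command and the keyword pair are hypothetical, and 4 GPUs with 6 CPUs each simply mirrors the 24-thread, 4-GPU server used for the benchmarks.

```python
import shlex
from typing import List, Optional, Sequence


def baseline_command(keywords: Sequence[str], no_eval: bool = False,
                     gpu: Optional[int] = None, cpu: Optional[int] = None,
                     epoch: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build a `graphvite baseline` argv from the documented flags."""
    argv = ["graphvite", "baseline", *keywords]
    if no_eval:
        argv.append("--no-eval")
    if gpu is not None:
        argv += ["--gpu", str(gpu)]  # number of GPUs
    if cpu is not None:
        argv += ["--cpu", str(cpu)]  # number of CPUs per GPU
    if epoch is not None:
        argv += ["--epoch", str(epoch)]
    return argv


# "deepwalk youtube" is an assumed keyword pair; run `graphvite list` for the real names.
cmd = baseline_command(["deepwalk", "youtube"], gpu=4, cpu=6)
print(" ".join(shlex.quote(arg) for arg in cmd))
# On a machine with GraphVite installed: subprocess.run(cmd, check=True)
```

Building an argument list instead of a shell string avoids quoting problems if a keyword ever contains spaces.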

Custom Experiment

To create a YAML configuration scaffold for a graph, knowledge graph, visualization or word graph application, run

graphvite new [application ...] [--file f]

Fill in the necessary entries in the configuration, following the instructions. You can then run the configuration with

graphvite run [config] [--no-eval] [--gpu n] [--cpu m] [--epoch e]
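A custom experiment on your own data typically points the configuration at an input graph file. As a sketch under a stated assumption (that the graph application reads a whitespace-separated edge list with one head node, tail node and optional weight per line; confirm the exact format against the instructions in the generated scaffold), the Python snippet below writes such a file from in-memory edges. The helper name write_edge_list and the file name my_graph.txt are hypothetical.

```python
import io
from typing import Iterable, TextIO, Tuple

# (head, tail, weight); assumed line format: "head<TAB>tail<TAB>weight"
Edge = Tuple[str, str, float]


def write_edge_list(edges: Iterable[Edge], out: TextIO) -> int:
    """Hypothetical helper: write one tab-separated edge per line, return the edge count."""
    count = 0
    for head, tail, weight in edges:
        # ":g" drops trailing zeros (1.0 -> "1") and keeps 6 significant digits.
        out.write(f"{head}\t{tail}\t{weight:g}\n")
        count += 1
    return count


toy_edges = [("alice", "bob", 1.0), ("bob", "carol", 2.5), ("carol", "alice", 1.0)]

# Write to an in-memory buffer here; use open("my_graph.txt", "w") for a real file.
buffer = io.StringIO()
print(write_edge_list(toy_edges, buffer))  # 3
print(buffer.getvalue(), end="")
```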

High-dimensional Data Visualization

You can visualize your high-dimensional vectors with a single command in GraphVite.

graphvite visualize [file] [--label label_file] [--save save_file] [--perplexity n] [--3d]

The file can be either a NumPy dump (*.npy) or a text matrix (*.txt). For the save file, we recommend the PNG format; PDF is also supported.
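If your vectors come from outside GraphVite, they first have to be dumped in one of these formats. For the .npy route, numpy.save writes a NumPy dump directly. The pure-Python sketch below covers the text route under two assumptions: that a *.txt matrix holds one vector per line with space-separated components, and that the label file holds one label per line in the same order as the vectors. The helper names and file names are hypothetical.

```python
import io
from typing import Sequence, TextIO


def write_text_matrix(rows: Sequence[Sequence[float]], out: TextIO) -> None:
    """Hypothetical helper: one vector per line, components separated by spaces."""
    if not rows:
        raise ValueError("need at least one vector")
    dim = len(rows[0])
    for row in rows:
        if len(row) != dim:
            raise ValueError("all vectors must have the same dimension")
        # repr() keeps full float precision so values round-trip exactly.
        out.write(" ".join(repr(float(x)) for x in row) + "\n")


def write_labels(labels: Sequence[str], out: TextIO) -> None:
    """Hypothetical helper: one label per line, aligned with the matrix rows."""
    for label in labels:
        out.write(f"{label}\n")


vectors = [[0.5, 1.0, -2.0], [3.25, 0.0, 1.5]]
labels = ["cat", "dog"]

# In-memory buffers for illustration; use open("vectors.txt", "w") and
# open("labels.txt", "w") to produce real files.
matrix_buf, label_buf = io.StringIO(), io.StringIO()
write_text_matrix(vectors, matrix_buf)
write_labels(labels, label_buf)
print(matrix_buf.getvalue(), end="")
```

Written to disk, these files would be passed as graphvite visualize vectors.txt --label labels.txt --save plot.png.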

Contributing

We welcome all contributions, from bug fixes to new features. Please let us know if you have any suggestions for our library.

Development Team

GraphVite is developed by MilaGraph, led by Prof. Jian Tang.

Authors of this project are Zhaocheng Zhu, Shizhen Xu, Meng Qu and Jian Tang. Contributors include Kunpeng Wang and Zhijian Duan.

Citation

If you find GraphVite useful for your research or development, please cite the following paper.

@inproceedings{zhu2019graphvite,
    title={GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding},
    author={Zhu, Zhaocheng and Xu, Shizhen and Qu, Meng and Tang, Jian},
    booktitle={The World Wide Web Conference},
    pages={2494--2504},
    year={2019},
    organization={ACM}
}

Acknowledgements

We would like to thank Compute Canada for supporting GPU servers. We especially thank Wenbin Hou for useful discussions on C++ and GPU programming techniques.
