  • Stars
    119
  • Rank 297,930 (Top 6 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created over 6 years ago
  • Updated about 2 years ago

Repository Details

An implementation of "Community Preserving Network Embedding" (AAAI 2017)

M-NMF

Abstract

Network embedding, aiming to learn the low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and community structure, and then jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned representations of nodes to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with the correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over state-of-the-art methods.

The model is now also available in the package Karate Club.

This repository provides a TensorFlow implementation of M-NMF as described in:

Community Preserving Network Embedding. Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, Shiqiang Yang. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).

A reference MATLAB implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.19.5
numpy             1.13.3
pandas            0.20.3
tensorflow-gpu    1.12.0
jsonschema        2.6.0
texttable         1.2.1
python-louvain    0.11

Datasets

The code takes an input graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for the `Facebook Politicians` dataset is included in the `data/` directory.
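The input format described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's own loader: the function names, the header names `node_1,node_2`, and the contiguity check (stricter than the stated "starting with 0" requirement) are all assumptions.

```python
import csv
import io


def read_edge_list(handle):
    """Read edges from a CSV handle: one header row, then 'source,target' rows."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(int(row[0]), int(row[1])) for row in reader if row]


def check_zero_indexed(edges):
    """Return True if node IDs are exactly 0..n-1 (assumed stricter check)."""
    nodes = sorted({node for edge in edges for node in edge})
    return nodes == list(range(len(nodes)))


# Hypothetical sample file content; the header names are an assumption.
sample = io.StringIO("node_1,node_2\n0,1\n0,2\n1,2\n2,3\n")
edges = read_edge_list(sample)
print(edges)
print(check_zero_indexed(edges))
```

To read a file on disk instead, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.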

Logging

The models are defined so that parameter settings and cluster quality are logged in every epoch. Specifically, we log the following:

1. Hyperparameter settings.     We save each hyperparameter used in the experiment.
2. Cluster quality.             Measured by modularity. We calculate it in every epoch.
3. Runtime.                     We measure the time needed for optimization, in seconds.
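For readers unfamiliar with the cluster quality measure, the sketch below computes Newman modularity for a hard node-to-cluster assignment in plain Python. It is only an illustration of the quantity being logged, not the code this repository uses; it assumes a simple undirected graph with each edge listed once and no self-loops, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, assignment):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    edges:      list of (u, v) pairs, each undirected edge listed once.
    assignment: dict mapping node -> cluster label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in cluster c
    for u, v in edges:
        degree_sum[assignment[u]] += 1
        degree_sum[assignment[v]] += 1
        if assignment[u] == assignment[v]:
            internal[assignment[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (illustrative data).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {node: 0 for node in range(6)}
print(modularity(edges, two_clusters))  # 5/14, roughly 0.357
print(modularity(edges, one_cluster))   # 0.0
```

Putting every node in one cluster gives a modularity of zero, while the split along the bridge edge scores higher, which is why a rising value across epochs indicates better-separated clusters.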

Options

Learning of the embedding is handled by the src/main.py script, which provides the following command line arguments.

Input and output options

  --input                STR         Input graph path.                                 Default is `data/food_edges.csv`.
  --embedding-output     STR         Embeddings path.                                  Default is `output/embeddings/food_embedding.csv`.
  --cluster-mean-output  STR         Cluster centers path.                             Default is `output/cluster_means/food_means.csv`.
  --log-output           STR         Log path.                                         Default is `output/logs/food.log`.
  --assignment-output    STR         Node-cluster assignment dictionary path.          Default is `output/assignments/food.json`.
  --dump-matrices        BOOL        Whether the trained model should be saved.        Default is `True`.
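
As a rough sketch of how options like these could be declared with `argparse`, the block below mirrors the flag names and defaults from the table above. It is not the repository's actual parser: `build_io_parser` and `str_to_bool` are hypothetical names, and the explicit boolean converter is an assumption made so that a value such as `False` is not parsed as truthy (which plain `type=bool` would do for any non-empty string).

```python
import argparse


def str_to_bool(value):
    """Hypothetical helper: turn 'True'/'False' style strings into booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_io_parser():
    """Illustrative parser for the input and output options only."""
    parser = argparse.ArgumentParser(description="Illustrative I/O options.")
    parser.add_argument("--input", type=str,
                        default="data/food_edges.csv")
    parser.add_argument("--embedding-output", type=str,
                        default="output/embeddings/food_embedding.csv")
    parser.add_argument("--cluster-mean-output", type=str,
                        default="output/cluster_means/food_means.csv")
    parser.add_argument("--log-output", type=str,
                        default="output/logs/food.log")
    parser.add_argument("--assignment-output", type=str,
                        default="output/assignments/food.json")
    parser.add_argument("--dump-matrices", type=str_to_bool, default=True)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_io_parser().parse_args(["--dump-matrices", "False"])
print(args.dump_matrices)
print(args.input)
```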

Model options

  --dimensions        INT         Number of dimensions.                             Default is 16.
  --clusters          INT         Number of clusters.                               Default is 20.
  --lambd             FLOAT       KKT penalty.                                      Default is 0.2.
  --alpha             FLOAT       Clustering penalty.                               Default is 0.05.
  --beta              FLOAT       Modularity regularization penalty.                Default is 0.05.
  --eta               FLOAT       Similarity mixing parameter.                      Default is 5.0.
  --lower-control     FLOAT       Floating point overflow control.                  Default is 10**-15.
  --iteration-number  INT         Number of power iterations.                       Default is 200.
  --early-stopping    INT         Early stopping round number based on modularity.  Default is 3.
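
To show where the `alpha`, `beta` and `eta` options enter the model, the objective from the M-NMF paper can be written roughly as below. This is a recalled sketch, not a specification of this code: the symbols S, M, U, H, C, B and n are not defined in this README and are assumed from the paper (S a node similarity matrix, M and U the factor matrices with U holding the node embeddings, H the community indicator matrix, C the community representations, B the modularity matrix, n the number of nodes).

```latex
\begin{aligned}
\min_{M,\,U,\,H,\,C}\quad
  & \lVert S - M U^{\top} \rVert_F^2
    + \alpha \,\lVert H - U C^{\top} \rVert_F^2
    - \beta \,\operatorname{tr}\!\left(H^{\top} B H\right) \\
\text{s.t.}\quad
  & M \ge 0,\; U \ge 0,\; H \ge 0,\; C \ge 0,\;
    \operatorname{tr}\!\left(H^{\top} H\right) = n, \\
\text{with}\quad
  & S = S^{(1)} + \eta \, S^{(2)} .
\end{aligned}
```

Under this reading, `alpha` weights the agreement between embeddings and community assignments, `beta` weights the modularity term, and `eta` mixes the first-order and second-order proximity matrices.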

Examples

The following commands learn a graph embedding and cluster centers and write them to disk. The node representations are ordered by node ID.

Creating an M-NMF embedding of the default dataset with the default hyperparameter settings. Saving the embedding, cluster centres and the log file at the default paths.

$ python src/main.py

Turning off the model saving.

$ python src/main.py --dump-matrices False

Creating an embedding of another dataset, Facebook Companies. Saving the embedding and the cluster means in a custom location.

$ python src/main.py --input data/company_edges.csv  --embedding-output output/embeddings/company_embedding.csv --cluster-mean-output output/cluster_means/company_means.csv

Creating a clustered embedding of the default dataset with 128 dimensions and 10 cluster centers.

$ python src/main.py --dimensions 128 --clusters 10

License

