• Stars
    star
    143
  • Rank 256,935 (Top 6 %)
  • Language
    C++
  • License
    MIT License
  • Created over 4 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

In-Memory Subgraph Matching: An In-depth Study by Dr. Shixuan Sun and Prof. Qiong Luo

SubgraphMatching

Introduction

We study the performance of eight representative in-memory subgraph matching (subgraph query) algorithms. Specifically, we put QuickSI, GraphQL, CFL, CECI, DP-iso, RI and VF2++ in a common framework to compare them on the following four aspects: (1) method of filtering candidate vertices in the data graph; (2) method of ordering query vertices; (3) method of enumer- ating partial results; and (4) other optimization techniques. Then, we compare the overall performance of these algo- rithms with Glasgow, an algorithm based on the constraint programming. Through experiments, we find that (1) the fil- tering method of GraphQL is competitive to that of the latest algorithms CFL, CECI and DP-iso in terms of pruning power; (2) the ordering methods in GraphQL and RI are usually the most effective; (3) the set intersection based local candidate computation in CECI and DP-iso performs the best in the enumeration; and (4) the failing sets pruning in DP-iso can significantly improve the performance when queries become large. Based on these new results, we recommend users to adopt specific techniques depending on the data graph den- sity and query size.

For the details, please refer to our SIGMOD'2020 paper "In-Memory Subgraph Matching: an In-depth Study" by Dr. Shixuan Sun and Prof. Qiong Luo. If you have any further questions, please feel free to contact us.

Please cite our paper, if you use our source code.

  • "Shixuan Sun and Qiong Luo. In-Memory Subgraph Matching: an In-depth Study. SIGMOD 2020."

Compile

Under the root directory of the project, execute the following commands to compile the source code.

mkdir build
cd build
cmake ..
make

Test

Execute the following commands to test the correctness of the binary file.

cd test
python test.py ../build/matching/SubgraphMatching.out

Execute

After compiling the source code, you can find the binary file 'SubgraphMatching.out' under the 'build/matching' directory. Execute the binary with the following command ./SubgraphMatching.out -d data_graphs -q query_graphs -filter method_of_filtering_candidate_vertices -order method_of_ordering_query_vertices -engine method_of_enumerating_partial_results -num number_of_embeddings, in which -d specifies the input of the data graphs and -q specifies the input of the query graphs. The -filter parameter gives the filtering method, the -order specifies the ordering method, and the -engine sets the enumeration method. The -num parameter sets the maximum number of embeddings that you would like to find. If the number of embeddings enumerated reaches the limit or all the results have been found, then the program will terminate. Set -num as 'MAX' to find all results.

Example (Use the filtering and ordering methods of GraphQL to generate the candidate vertex sets and the matching order respectively. Enumerate results with the set-intersection based local candidate computation method):

./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter GQL -order GQL -engine LFTJ -num MAX

Input

Both the input query graph and data graph are vertex-labeled. Each graph starts with 't N M' where N is the number of vertices and M is the number of edges. A vertex and an edge are formatted as 'v VertexID LabelId Degree' and 'e VertexId VertexId' respectively. Note that we require that the vertex id is started from 0 and the range is [0,N - 1] where V is the vertex set. The following is an input sample. You can also find sample data sets and query sets under the test folder.

Example:

t 5 6
v 0 0 2
v 1 1 3
v 2 2 3
v 3 1 2
v 4 2 2
e 0 1
e 0 2
e 1 2
e 1 3
e 2 4
e 3 4

Techniques Supported

The filtering methods that generate candidate vertex sets. Note that (1) every filtering method enables LDF by default; and (2) We optimize the filtering method of TurboIso with the technique of CFL.

Parameter of Command Line (-filter) Description
LDF the label and degree filter
NLF the neighborhood label frequency filter
GQL the filtering method of GraphQL
TSO the filtering method of TurboIso
CFL the filtering method of CFL
DPiso the filtering method of DP-iso
CECI the filtering method of CECI

The ordering methods that generate matching order.

Parameter of Command Line (-order) Description
QSI the ordering method of QuickSI
GQL the ordering method of GraphQL
TSO the ordering method of TurboIso
CFL the ordering method of CFL
DPiso the ordering method of DP-iso
CECI the ordering method of CECI
RI the ordering method of RI
VF2++ the ordering method of VF2++

The enumeration methods that find all results.

Parameter of Command Line (-engine) Description
QSI the enumeration method of QuickSI and RI (Algorithm 2)
GQL the enumeration method of GraphQL (Algorithm 3)
EXPLORE the enumeration method of CFL (Algorithm 4)
DPiso the enumeration method of DP-iso (Algorithm 5)
CECI the enumeration method of CECI (Algorithm 5)
LFTJ the set-intersection based local candidates computation

We can execute a query by specifying the filtering method, the ordering method, and the enumeration method. For example, we evaluate the query with the GraphQL algorithm in the following command.

./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter GQL -order GQL -engine GQL -num MAX

Apart from evaluating queries with a specific algorithm, we can also execute queries by integrating techniques from different algorithms. For example, we generate the candidates with the label and degree filter, obtain the matching order with the ordering method of CFL, and finally enumerate the results with the set-intersection based local candidates computation method. Note that the source code cannot support the filtering/ordering/enumeration orders of CECI to integrate with other algorithms.

./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter LDF -order CFL -engine LFTJ -num MAX

In addition to setting the filtering, ordering and enumeration methods, you can also configure the set intersection algorithms and optimization techniques by defining macros in 'configuration/config.h'. Please see the comments in 'configuration/config.h' for the detail. We briefly introduce these methods in the following.

Macro Description
HYBRID 0 a hybrid method handling the cardinality skew by integrating the merge-based method with the galloping-based method
HYBRID 1 the merge-based set intersection
SI 0 Accelerate the set intersection with AVX2
SI 1 Accelerate the set intersection with AVX512
SI 2 Scalar set intersection
ENABLE_QFLITER the QFilter set intersection algorithm
ENABLE_FAILING_SET the failing set pruning technique in DP-iso

Furthermore, our source code can support the spectrum analysis on a query. Specifically, we evaluate a query by generate a number of matching orders by performing the permutation of query vertices. For each order, we set a time limit for evaluating the query. In the following command line, we generate 1000 matching orders and set the time limit for each order as 60 seconds. Note that you need to first define SPECTRUM in 'configuration/config.h' file before executing the command.

./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter LDF -order Spectrum -engine Spectrum -order_num 100 -time_limit 60 -num 100000

Recommendation

We recommend you to use GQL, CFL or DPiso as the filtering method and use GQL or RI as the ordering method. We always recommend you to use LFTJ as the enumeration method. For the large queries, we recommend you to enable the failing set pruning. For the dense data graphs, we recommend you to enable QFilter. Otherwise, use the hybrid set intersection method with AVX2.

In summary, you can execute your workloads with the following parameters as your baseline method in your experiments. First, add the definition "#define ENABLE_FAILING_SET" in configuration/config.h. Second, re-compile the source code. Finally, execute the workloads. Note that in our experiments, we set the target number of results as 100000 and terminate a query if it cannot be completed within 5 minutes (300 seconds). You can adjust these settings according to your requirement.

Execute the program with the "RI":

./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter GQL -order RI -engine LFTJ -num 100000

Execute the program with the "GQL":

./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter GQL -order GQL -engine LFTJ -num 100000

Experiment Datasets

The real world datasets and the corresponding query sets used in our paper can be downloaded here. As the synthetic datasets are large, we do not upload them. You can easily generate them by following the instruction in our paper.

More Repositories

1

CommunityDetectionCodes

Some overlapping community detection algorithms (Until 2016). by Yulin Che (https://github.com/CheYulin) for the PhD qualification exam (survey on community detection algorithms)
C++
203
star
2

EngineRaceRapids

Rapids团队 (https://github.com/CheYulin , https://github.com/shixuansun and https://github.com/WANG-lp), Engine Race (Key-Value Store on Intel Optane SSD, https://tianchi.aliyun.com/competition/entrance/231689/rankingList/1 ),线上成绩413.69s, 排名第1
C++
112
star
3

ContinuousSubgraphMatching

Source code and datasets of "An In-Depth Study of Continuous Subgraph Matching", accepted by VLDB'22 - By Xibo Sun, Dr. Shixuan Sun, Prof. Qiong Luo, and Prof. Bingsheng He
C++
40
star
4

RapidMatch

Source code and datasets of "RapidMatch: A Holistic Approach to Subgraph Query Processing", accepted by VLDB'21 - By Shixuan Sun, Xibo Sun, Yulin Che, Prof. Qiong Luo, and Prof. Bingsheng He
C++
34
star
5

ppSCAN

ppSCAN: Parallelizing Pruning-based Graph Structural Clustering (ICPP'18) - by Yulin Che, Shixuan Sun and Prof. Qiong Luo
C++
30
star
6

RapidEC

Source code of "GPU-accelerated elliptic curve digital signature algorithms", accepted by SC'22 - by Zonghao Feng, Qipeng Xie, Qiong Luo, etc.
Cuda
22
star
7

SubgraphContainment

Scaling Up Subgraph Query Processing with Efficient Subgraph Matching by Shixuan Sun and Dr. Qiong Luo
C++
17
star
8

EGSM

Source code and datasets of "Efficient GPU-Accelerated Subgraph Matching", accepted by SIGMOD'23 - By Xibo Sun and Prof. Qiong Luo
Cuda
14
star
9

AccTrussDecomposition

Source code of "Accelerating Truss Decomposition on Heterogeneous Processors", accepted by VLDB'20 - By Yulin Che, Zhuohang Lai, Shixuan Sun, Yue Wang, and Prof. Qiong Luo
C++
14
star
10

SimRank

1) SimRank (single pair query, parallel all pair computation / dynamic updates) - by Yue Wang (https://github.com/KeithYue) and Yulin Che (https://github.com/CheYulin). 2) top-k version (https://github.com/CheYulin/SimRank/tree/develop) by Yue Wang and Zonghao Feng (https://github.com/unisolate)
C++
14
star
11

Graph500KroneckerGraphGenerator

Kronecker Graph Generator (Forked from Graph500, supporting a binary edge list format)
C
10
star
12

PrimitivesAndGraphProcessing-GPU

Survey of Primitives and Graph Processing on GPUs (Until 2016) - by Yulin Che (https://github.com/CheYulin)
10
star
13

DBGC

Source code of "Density-Based Geometry Compression for LiDAR Point Clouds", accepted by EDBT'23 - By Xibo Sun and Prof. Qiong Luo
C++
7
star
14

UNIPAR

UNItig construction in PARallel with CPUs and GPUs
Cuda
6
star
15

hga

Heterogeneous Graph Aligner
Cuda
6
star
16

LIGHT

Source code of "Efficient Parallel Subgraph Enumeration on a Single Machine (ICDE2019)" by Shixuan Sun, Yulin Che, Lipeng Wang and Qiong Luo
C++
6
star
17

VIC-DDPM

source code for paper 'A Conditional Denoising Diffusion Probabilistic Model for Radio Interferometric Image Reconstruction'
Python
6
star
18

pow-bench

Source code of "Evaluating Memory-Hard Proof-of-Work Algorithms on Three Processors", accepted by VLDB'20 - By Zonghao Feng and Qiong Luo
C
5
star
19

AccMultiwayJoins

Source code of "Accelerating Multi-way Joins on the GPU", accepted by VLDBJ'22 - By Dr. Zhuohang Lai, Xibo Sun, Prof. Qiong Luo, and Xiaolong Xie
Cuda
5
star
20

GraphReorderAndConverter

Graph Reorder And Converter (e.g, Parallel OpenMP Implementation of Edge List to CSR with Sorted Neighbors) - by Yulin Che (https://github.com/CheYulin)
C++
5
star
21

cuGridder

Source code of "Efficient Radio Interferometric Imaging on the GPU", accepted by eScience'22 - By Honghao Liu, Qiong Luo, and Feng Wang
Cuda
4
star
22

PolarRec

Source code of 'PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates', CVPR'24 accepted. By Ruoqi Wang, Zhuoyang Chen, Jiayi Zhu, Qiong Luo, Feng Wang
Python
4
star
23

manymap

Accelerating Long Read Alignment on Three Processors
C++
3
star
24

KroneckerBinEdgeListToCSR

Parallel Kronecker Binary EdgeList (*.bin) To CSR (Lijun Chang's Format: b_adj.bin, b_degree.bin), Graph Statistics: Parallel TC/Core/DODG Analytics
C++
2
star
25

TriangleCounting

Triangle Counting (DataFoutain, 三角形图计算算法设计及性能优化, top 11)
C
2
star
26

AccTriCnt

Accelerating All-Edge Common Neighbor Counting on Three Processors (ICPP'19) - By Yulin Che, Zhuohang Lai, Shixuan Sun, Prof. Qiong Luo and Yue Wang
C++
2
star
27

mic_bc

Betweenness centrality on Intel Xeon Phi
C++
2
star
28

SyntheticGraphBenchmark

Synthetic Graph Generator (LFR Benchmark, 2009)
C++
1
star
29

Efficient-DPP

Data-parallel primitives implementations in OpenCL their native code versions
Cuda
1
star
30

EngineRace

Engine Race (Key-Value Store for Intel Optane SSD) - By Rapids团队 (https://github.com/CheYulin & https://github.com/shixuansun & https://github.com/WANG-lp)
C++
1
star