Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

Graph Analytics for Neo4j

This docker image adds high-performance graph analytics to a Neo4j graph database. This image deploys a container with Apache Spark and uses GraphX to perform ETL graph analysis on subgraphs exported from Neo4j. The results of the analysis are applied back to the data in the Neo4j database.

Supported Algorithms

PageRank

Closeness Centrality

Betweenness Centrality

Triangle Counting

Connected Components

Strongly Connected Components

Neo4j Mazerunner Service

The Neo4j Mazerunner service in this image is an unmanaged extension that adds a REST API endpoint to Neo4j for submitting graph analysis jobs to Apache Spark GraphX. The results of the analysis are applied back to the nodes in Neo4j as property values, so they can be queried with Cypher.

Installation/Deployment

Installation requires three Docker image deployments. Each one runs a separate linked component:

  • Hadoop HDFS (sequenceiq/hadoop-docker:2.4.1)
  • Neo4j Graph Database (kbastani/docker-neo4j:2.2.1)
  • Apache Spark Service (kbastani/neo4j-graph-analytics:1.1.0)

Pull the following docker images:

docker pull sequenceiq/hadoop-docker:2.4.1
docker pull kbastani/docker-neo4j:2.2.1
docker pull kbastani/neo4j-graph-analytics:1.1.0

After each image has been downloaded to your Docker server, run the following commands in order to create the linked containers.

# Create HDFS
docker run -i -t --name hdfs sequenceiq/hadoop-docker:2.4.1 /etc/bootstrap.sh -bash

# Create Mazerunner Apache Spark Service
docker run -i -t --name mazerunner --link hdfs:hdfs kbastani/neo4j-graph-analytics:1.1.0

# Create Neo4j database with links to HDFS and Mazerunner
# Replace <user> and <neo4j-path> with the location
# of your existing Neo4j database store directory
docker run -d -P -v /Users/<user>/<neo4j-path>/data:/opt/data --name graphdb --link mazerunner:mazerunner --link hdfs:hdfs kbastani/docker-neo4j:2.2.1

Use Existing Neo4j Database

To use an existing Neo4j database, make sure that the database store directory, typically data/graph.db, is available on your host OS. Read the setup guide for kbastani/docker-neo4j for additional details.

Note: The kbastani/docker-neo4j:2.2.1 image runs Neo4j 2.2.1. If you point it at a store created by an older version, the store may be upgraded and may then no longer open in that earlier version of Neo4j. Back up your store files before proceeding.

Use New Neo4j Database

To create a new Neo4j database, use any valid directory path in place of /Users/<user>/<neo4j-path> in the -v volume mapping.

Accessing the Neo4j Browser

The Neo4j browser is exposed on the graphdb container on port 7474. If you're using boot2docker on MacOSX, follow the directions here to access the Neo4j browser.

Usage Directions

Graph analysis jobs are started by accessing the following endpoint:

http://localhost:7474/service/mazerunner/analysis/{analysis}/{relationship_type}

Replace {analysis} in the endpoint with one of the following analysis algorithms:

  • pagerank
  • closeness_centrality
  • betweenness_centrality
  • triangle_count
  • connected_components
  • strongly_connected_components
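The following minimal Python sketch starts a job from a script. It assumes the server is reachable at localhost:7474 and that no authentication sits in front of the extension. The helper names analysis_url and start_analysis are hypothetical. Only the URL pattern comes from this README. The README does not describe the response body, so the sketch returns it as plain text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Algorithm names accepted in the {analysis} path segment (from the list above).
ANALYSES = {
    "pagerank",
    "closeness_centrality",
    "betweenness_centrality",
    "triangle_count",
    "connected_components",
    "strongly_connected_components",
}


def analysis_url(analysis, relationship_type, host="localhost", port=7474):
    """Build the Mazerunner job URL for one algorithm and relationship type."""
    if analysis not in ANALYSES:
        raise ValueError(f"unsupported analysis: {analysis}")
    return (f"http://{host}:{port}/service/mazerunner/analysis/"
            f"{quote(analysis)}/{quote(relationship_type)}")


def start_analysis(analysis, relationship_type, **kwargs):
    """Send the HTTP GET that starts the job and return the raw response body."""
    with urlopen(analysis_url(analysis, relationship_type, **kwargs)) as resp:
        return resp.read().decode("utf-8")
```

For example, start_analysis("pagerank", "FOLLOWS") requests the same URL as the endpoint template above, with {analysis} and {relationship_type} filled in. Checking the algorithm name on the client side catches typos before a request reaches the server.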

Replace {relationship_type} in the endpoint with the relationship type in your Neo4j database that you want to analyze. The nodes connected by relationships of that type form the graph that is analyzed. For example, for the FOLLOWS relationship type, the analyzed subgraph is equivalent to the result of the following Cypher query:

MATCH (a)-[:FOLLOWS]->(b)
RETURN id(a) as src, id(b) as dst

The analysis writes its result to (a) and (b) as a property whose key is the {analysis} name. For example, if you ran the pagerank analysis on the FOLLOWS relationship type, the following Cypher query displays the results:

MATCH (a)-[:FOLLOWS]-()
RETURN DISTINCT id(a) as id, a.pagerank as pagerank
ORDER BY pagerank DESC
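The results query can also be run from a script through Neo4j's transactional Cypher HTTP endpoint (/db/data/transaction/commit in Neo4j 2.x). The following Python sketch assumes that endpoint on localhost:7474. Neo4j 2.2 enables authentication by default, so the sketch optionally sends HTTP Basic credentials. Whether the kbastani/docker-neo4j image keeps authentication enabled is not stated here. The helper names build_request and run_query are hypothetical.

```python
import base64
import json
from urllib.request import Request, urlopen

RESULTS_QUERY = """
MATCH (a)-[:FOLLOWS]-()
RETURN DISTINCT id(a) AS id, a.pagerank AS pagerank
ORDER BY pagerank DESC
"""


def build_request(statement, host="localhost", port=7474, user=None, password=None):
    """Wrap one Cypher statement for Neo4j's transactional HTTP endpoint."""
    body = json.dumps({"statements": [{"statement": statement}]}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if user is not None:
        # Optional HTTP Basic auth, only needed if authentication is enabled.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = "Basic " + token
    return Request(f"http://{host}:{port}/db/data/transaction/commit",
                   data=body, headers=headers, method="POST")


def run_query(statement, **kwargs):
    """Execute the statement and return each result row as a list of values."""
    with urlopen(build_request(statement, **kwargs)) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return [entry["row"] for entry in payload["results"][0]["data"]]
```

Once a pagerank job has finished, run_query(RESULTS_QUERY) returns rows such as [node_id, pagerank], in descending order of score.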

Available Metrics

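To start a graph analysis job for a particular metric, send an HTTP GET request to one of the following Neo4j server endpoints. The examples use 172.17.0.21, an example IP address for the graphdb container on the Docker bridge network. Replace it with the address of your own Neo4j container.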

PageRank

http://172.17.0.21:7474/service/mazerunner/analysis/pagerank/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key pagerank.

  • The value of the pagerank property is a float data type, ex. pagerank: 3.14159265359.

  • PageRank is used to find the relative importance of a node within a set of connected nodes.
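The following sketch shows the iterative PageRank computation on a plain list of (src, dst) node IDs. It uses the unnormalized update rank = 0.15 + 0.85 × (sum of incoming contributions), which is the form used by GraphX's built-in PageRank. Because of this form, scores are not probabilities and can exceed 1, as in the 3.14159265359 example above. The sketch assumes that Mazerunner delegates to that built-in implementation. The reset probability and the fixed 20 iterations are illustrative values, not Mazerunner's configuration.

```python
def pagerank(edges, reset_prob=0.15, iterations=20):
    """Unnormalized PageRank over a directed edge list of (src, dst) pairs."""
    nodes = {n for edge in edges for n in edge}
    out_degree = {n: 0 for n in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    ranks = {n: 1.0 for n in nodes}  # every node starts with rank 1.0
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for src, dst in edges:
            # Each node splits its current rank evenly across its outgoing relationships.
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = {n: reset_prob + (1.0 - reset_prob) * incoming[n] for n in nodes}
    return ranks
```

Nodes with no outgoing FOLLOWS relationships pass no rank along. This sketch handles them the same simple way rather than redistributing their rank.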

Closeness Centrality

http://172.17.0.21:7474/service/mazerunner/analysis/closeness_centrality/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key closeness_centrality.

  • The value of the closeness_centrality property is a float data type, ex. closeness_centrality: 0.1337.

  • A key node centrality measure in networks is closeness centrality (Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994). It is defined as the inverse of farness, which in turn, is the sum of distances to all other nodes.
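The definition above (closeness = 1 / farness, where farness is the sum of shortest-path distances) can be sketched with a breadth-first search from every node. This sketch makes two assumptions. It treats each relationship as undirected, and it sums distances only to reachable nodes. The README does not say how Mazerunner handles relationship direction, disconnected graphs, or normalization, so its values may differ.

```python
from collections import deque


def closeness_centrality(edges):
    """Closeness = 1 / (sum of hop distances to every reachable node)."""
    adj = {}
    for a, b in edges:  # assumption: relationships are treated as undirected
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search yields unweighted shortest distances
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        farness = sum(dist.values())
        scores[source] = 1.0 / farness if farness > 0 else 0.0
    return scores
```

On the path 1-2-3, the middle node has farness 1 + 1 = 2 and closeness 0.5. Each end node has farness 1 + 2 = 3 and closeness 1/3.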

Betweenness Centrality

http://172.17.0.21:7474/service/mazerunner/analysis/betweenness_centrality/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key betweenness_centrality.

  • The value of the betweenness_centrality property is a float data type, ex. betweenness_centrality: 20.345.

  • Betweenness centrality is an indicator of a node's centrality in a network. For every pair of other nodes, it takes the fraction of shortest paths between them that pass through the node, and sums these fractions over all pairs. When every pair has a single shortest path, this is simply the number of shortest paths that pass through the node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that item transfer follows the shortest paths.
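The following sketch computes betweenness centrality with Brandes' algorithm, a standard exact method for unweighted graphs. It is not necessarily the method Mazerunner's GraphX job uses. The sketch assumes relationships are undirected and unweighted. Each pair is counted once from each endpoint, so the totals are halved at the end.

```python
from collections import deque


def betweenness_centrality(edges):
    """Brandes' algorithm on an undirected, unweighted graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s counts shortest paths (sigma) and records predecessors.
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2 for v, score in bc.items()}  # each pair was seen twice
```

On a star with one center and three leaves, the center lies on the single shortest path of each of the three leaf pairs, so its score is 3.0.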

Triangle Counting

http://172.17.0.21:7474/service/mazerunner/analysis/triangle_count/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key triangle_count.

  • The value of the triangle_count property is an integer data type, ex. triangle_count: 2.

  • The value of triangle_count is the number of triangles the node is part of.

  • A node is part of a triangle when two of its neighbors are also connected to each other. The triangle_count property provides a measure of clustering for each node.
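The following sketch applies the triangle definition directly. For each node, it checks every pair of neighbors for a connecting relationship. It assumes relationship direction is ignored, as in GraphX's triangle counting, and it collapses duplicate or reciprocal relationships into one edge. Checking every neighbor pair costs time quadratic in the node's degree, which is fine for an illustration but is not how a distributed implementation scales.

```python
from itertools import combinations


def triangle_count(edges):
    """Number of triangles each node participates in (undirected)."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # self-loops cannot form triangles
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return {
        node: sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        for node, neighbors in adj.items()
    }
```

For a triangle 1-2-3 with an extra node 4 attached to node 1, nodes 1, 2 and 3 each get triangle_count 1, and node 4 gets 0.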

Connected Components

http://172.17.0.21:7474/service/mazerunner/analysis/connected_components/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key connected_components.

  • The value of the connected_components property is an integer data type, ex. connected_components: 181.

  • The value of connected_components is the lowest Neo4j internal node ID among the nodes in the same connected component.

  • Connected components are used to find isolated clusters, that is, groups of nodes in which every node can reach every other node when relationship direction is ignored.
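The following sketch produces the same labeling (each node tagged with the lowest node ID in its component) using union-find. GraphX arrives at this labeling by propagating minimum vertex IDs instead, so the sketch swaps in a different technique that yields the same result. The integer node IDs stand in for Neo4j internal IDs.

```python
def connected_components(edges):
    """Label every node with the lowest node ID in its connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:  # direction is ignored: (a, b) joins both nodes
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as the root, so each root is its component's minimum.
            parent[max(ra, rb)] = min(ra, rb)
    return {node: find(node) for node in list(parent)}
```

With relationships 5-3, 3-9 and 10-12, nodes 3, 5 and 9 are labeled 3, and nodes 10 and 12 are labeled 10.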

Strongly Connected Components

http://172.17.0.21:7474/service/mazerunner/analysis/strongly_connected_components/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key strongly_connected_components.

  • The value of the strongly_connected_components property is an integer data type, ex. strongly_connected_components: 26.

  • The value of strongly_connected_components is the lowest Neo4j internal node ID among the nodes in the same strongly connected component.

  • Strongly connected components are used to find clusters, that is, groups of nodes in which every node can reach every other node by following relationships in their stored direction.
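The directed version can be sketched with Tarjan's algorithm, labeling each strongly connected component with its lowest node ID to match the property described above. Tarjan's algorithm is a swapped-in single-machine technique, not the GraphX implementation Mazerunner runs. Because this sketch is recursive, very deep graphs would hit Python's recursion limit.

```python
def strongly_connected_components(edges):
    """Label every node with the lowest node ID in its strongly connected component."""
    adj = {}
    for a, b in edges:  # relationships keep their direction here
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, [])
    index, low, on_stack, stack, labels = {}, {}, set(), [], {}
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            label = min(component)
            for w in component:
                labels[w] = label

    for v in adj:
        if v not in index:
            visit(v)
    return labels
```

For the cycle 1 → 2 → 3 → 1 plus a relationship 3 → 4, nodes 1, 2 and 3 share label 1. Node 4 cannot reach back into the cycle, so it forms its own component with label 4.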

Architecture

Mazerunner uses a message broker to distribute graph processing jobs to Apache Spark's GraphX module. When an agent job is dispatched, a subgraph is exported from Neo4j and written to Apache Hadoop HDFS.

After Neo4j exports a subgraph to HDFS, a separate Mazerunner service for Spark is notified to begin processing that data. The Mazerunner service will then start a distributed graph processing algorithm using Scala and Spark's GraphX module. The GraphX algorithm is serialized and dispatched to Apache Spark for processing.

Once the Apache Spark job completes, the results are written back to HDFS as a Key-Value list of property updates to be applied back to Neo4j.

Neo4j is then notified that a property update list is available from Apache Spark on HDFS. Neo4j batch imports the results and applies the updates back to the original graph.
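The data flow above is a file-based round trip: an exported edge list goes in, and a key-value list of property updates comes out. The README does not specify the file formats, so this sketch makes three assumptions. It uses a hypothetical whitespace-separated "src dst" edge list and a "node_id value" update list. It replaces HDFS with in-memory strings and the message broker with direct function calls. All function names are hypothetical and only illustrate the sequence of steps.

```python
def export_edges(edges):
    """Neo4j side (hypothetical format): serialize (src, dst) IDs as 'src dst' lines."""
    return "".join(f"{src} {dst}\n" for src, dst in edges)


def parse_edges(text):
    """Spark side: read the edge list back into (src, dst) integer pairs."""
    return [tuple(int(x) for x in line.split()) for line in text.splitlines() if line.strip()]


def export_updates(values):
    """Spark side (hypothetical format): write 'node_id value' property updates."""
    return "".join(f"{node} {value}\n" for node, value in sorted(values.items()))


def parse_updates(text):
    """Neo4j side: read the update list as {node_id: value} for a batch import."""
    updates = {}
    for line in text.splitlines():
        if line.strip():
            node, value = line.split()
            updates[int(node)] = float(value)
    return updates


def run_job(edges, algorithm):
    """Simulate one round trip: Neo4j -> HDFS -> Spark -> HDFS -> Neo4j."""
    edge_file = export_edges(edges)                # Neo4j exports the subgraph
    results = algorithm(parse_edges(edge_file))    # Spark computes per-node values
    update_file = export_updates(results)          # Spark writes property updates
    return parse_updates(update_file)              # Neo4j reads them for applying
```

Any function that maps an edge list to {node_id: value} can be plugged in as the algorithm, for example one of the metrics listed above.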

License

This library is licensed under the Apache License, Version 2.0.
