  • Stars: 378
  • Rank: 110,841 (Top 3%)
  • Language: Java
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated over 1 year ago

Repository Details

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

Graph Analytics for Neo4j

This Docker image adds high-performance graph analytics to a Neo4j graph database. It deploys a container with Apache Spark and uses GraphX to perform ETL graph analysis on subgraphs exported from Neo4j. The results of the analysis are applied back to the data in the Neo4j database.

Supported Algorithms

  • PageRank
  • Closeness Centrality
  • Betweenness Centrality
  • Triangle Counting
  • Connected Components
  • Strongly Connected Components

Neo4j Mazerunner Service

The Neo4j Mazerunner service in this image is an unmanaged extension that adds a REST API endpoint to Neo4j for submitting graph analysis jobs to Apache Spark GraphX. The results of the analysis are applied back to the nodes in Neo4j as property values, making the results queryable using Cypher.

Installation/Deployment

Installation requires three Docker image deployments, each containing a separate linked component.

  • Hadoop HDFS (sequenceiq/hadoop-docker:2.4.1)
  • Neo4j Graph Database (kbastani/docker-neo4j:2.2.1)
  • Apache Spark Service (kbastani/neo4j-graph-analytics:1.1.0)

Pull the following docker images:

docker pull sequenceiq/hadoop-docker:2.4.1
docker pull kbastani/docker-neo4j:2.2.1
docker pull kbastani/neo4j-graph-analytics:1.1.0

After each image has been downloaded to your Docker server, run the following commands, in the order shown, to create the linked containers.

# Create HDFS
docker run -i -t --name hdfs sequenceiq/hadoop-docker:2.4.1 /etc/bootstrap.sh -bash

# Create Mazerunner Apache Spark Service
docker run -i -t --name mazerunner --link hdfs:hdfs kbastani/neo4j-graph-analytics:1.1.0

# Create Neo4j database with links to HDFS and Mazerunner
# Replace <user> and <neo4j-path>
# with the location to your existing Neo4j database store directory
docker run -d -P -v /Users/<user>/<neo4j-path>/data:/opt/data --name graphdb --link mazerunner:mazerunner --link hdfs:hdfs kbastani/docker-neo4j:2.2.1

Use Existing Neo4j Database

To use an existing Neo4j database, make sure that the database store directory, typically data/graph.db, is available on your host OS. Read the setup guide for kbastani/docker-neo4j for additional details.

Note: The kbastani/docker-neo4j:2.2.1 image runs Neo4j 2.2.1. If you point it to a database store created by an older version, that store may afterward no longer open in the earlier version of Neo4j. Back up your store files before proceeding.

Use New Neo4j Database

To create a new Neo4j database, use any path to a valid directory.

Accessing the Neo4j Browser

The Neo4j browser is exposed on the graphdb container on port 7474. If you're using boot2docker on Mac OS X, follow the directions here to access the Neo4j browser.

Usage Directions

Graph analysis jobs are started by accessing the following endpoint:

http://localhost:7474/service/mazerunner/analysis/{analysis}/{relationship_type}

Replace {analysis} in the endpoint with one of the following analysis algorithms:

  • pagerank
  • closeness_centrality
  • betweenness_centrality
  • triangle_count
  • connected_components
  • strongly_connected_components

Replace {relationship_type} in the endpoint with the relationship type in your Neo4j database that you would like to analyze. The nodes connected by that relationship type form the graph that is analyzed. For example, for the FOLLOWS relationship type, the equivalent Cypher query would be the following:

MATCH (a)-[:FOLLOWS]->(b)
RETURN id(a) as src, id(b) as dst
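
As an illustration of filling in the two placeholders, here is a minimal Python sketch that builds the endpoint URL and a GET request for it. The function names `analysis_url` and `analysis_request` and the `base` parameter are hypothetical helpers, not part of Mazerunner; the default base address is the localhost URL shown above, and GET is the method named under Available Metrics.

```python
from urllib.parse import quote
from urllib.request import Request

# Analysis names listed in this README.
ANALYSES = (
    "pagerank",
    "closeness_centrality",
    "betweenness_centrality",
    "triangle_count",
    "connected_components",
    "strongly_connected_components",
)


def analysis_url(analysis, relationship_type, base="http://localhost:7474"):
    """Fill in the {analysis} and {relationship_type} placeholders."""
    if analysis not in ANALYSES:
        raise ValueError(f"unsupported analysis: {analysis!r}")
    # Percent-encode the relationship type in case it has unusual characters.
    rel = quote(relationship_type, safe="")
    return f"{base}/service/mazerunner/analysis/{analysis}/{rel}"


def analysis_request(analysis, relationship_type, base="http://localhost:7474"):
    """Build (but do not send) the HTTP GET request that starts a job."""
    return Request(analysis_url(analysis, relationship_type, base), method="GET")


if __name__ == "__main__":
    print(analysis_url("pagerank", "FOLLOWS"))
```

The request object can then be passed to `urllib.request.urlopen`, assuming port 7474 of the graphdb container is reachable at the chosen base address.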

The analysis writes its result to a property on both (a) and (b), using {analysis} as the property key. For example, if you ran the pagerank analysis on the FOLLOWS relationship type, the following Cypher query will display the results:

MATCH (a)-[:FOLLOWS]-()
RETURN DISTINCT id(a) as id, a.pagerank as pagerank
ORDER BY pagerank DESC
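
The same query shape should work for the other analyses, since each one writes to a property named after itself. A small Python sketch that generates the query text follows; `results_query` is a hypothetical helper, and it interpolates the names into the query, so it only accepts plain identifiers.

```python
import re

# Plain identifiers only, because the names are pasted into the query text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def results_query(analysis, relationship_type):
    """Return a Cypher query listing each node's value for one analysis."""
    for name in (analysis, relationship_type):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return (
        f"MATCH (a)-[:{relationship_type}]-()\n"
        f"RETURN DISTINCT id(a) as id, a.{analysis} as {analysis}\n"
        f"ORDER BY {analysis} DESC"
    )


if __name__ == "__main__":
    print(results_query("triangle_count", "FOLLOWS"))
```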

Available Metrics

To begin a graph analysis job for a particular metric, send an HTTP GET request to one of the following Neo4j server endpoints:

PageRank

http://172.17.0.21:7474/service/mazerunner/analysis/pagerank/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key pagerank.

  • The value of the pagerank property is a float data type, ex. pagerank: 3.14159265359.

  • PageRank is used to find the relative importance of a node within a set of connected nodes.
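
To make "relative importance" concrete, here is a small, self-contained power-iteration sketch of PageRank in Python. It is not the GraphX implementation that Mazerunner runs; the damping factor, the iteration count, and the handling of nodes without outgoing edges are assumptions of this sketch, and it normalizes scores to sum to 1, so the values Mazerunner writes may be scaled differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return {node: score} for a list of directed (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        # Every node passes an equal share of its rank along each out-edge.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


if __name__ == "__main__":
    follows = [(1, 2), (2, 3), (3, 1), (4, 1)]
    for node, score in sorted(pagerank(follows).items()):
        print(node, round(score, 4))
```

In this toy graph, node 4 is followed by nobody and ends up with the lowest score, while nodes 1, 2, and 3 reinforce each other through their cycle.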

Closeness Centrality

http://172.17.0.21:7474/service/mazerunner/analysis/closeness_centrality/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key closeness_centrality.

  • The value of the closeness_centrality property is a float data type, ex. closeness_centrality: 0.1337.

  • A key node centrality measure in networks is closeness centrality (Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994). It is defined as the inverse of farness, which in turn, is the sum of distances to all other nodes.
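
The definition above (inverse of farness) can be sketched in a few lines of Python. This is an illustration only, not Mazerunner's code: it assumes relationships are treated as undirected, counts every hop as distance 1, and sums distances over reachable nodes only.

```python
from collections import deque


def closeness_centrality(edges):
    """Return {node: 1 / farness} using breadth-first search distances."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    scores = {}
    for start in adjacency:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        # Farness is the sum of distances to all other (reachable) nodes.
        farness = sum(distance.values())
        scores[start] = 1.0 / farness if farness else 0.0
    return scores


if __name__ == "__main__":
    print(closeness_centrality([(1, 2), (2, 3)]))
```

On the path 1-2-3, the middle node has farness 2 (closeness 0.5) and each end node has farness 3 (closeness 1/3).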

Betweenness Centrality

http://172.17.0.21:7474/service/mazerunner/analysis/betweenness_centrality/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key betweenness_centrality.

  • The value of the betweenness_centrality property is a float data type, ex. betweenness_centrality: 20.345.

  • Betweenness centrality is an indicator of a node's centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that item transfer follows the shortest paths.
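
For readers who want to see the shortest-path counting spelled out, below is a compact Python sketch based on Brandes' algorithm for unweighted graphs. It is an illustration rather than Mazerunner's implementation: it treats relationships as undirected, counts each pair of endpoints once, and, when several shortest paths tie, credits each intermediate node with a fraction of the pair (the usual convention, which the description above does not spell out).

```python
from collections import deque


def betweenness_centrality(edges):
    """Return {node: betweenness} for an undirected, unweighted edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # BFS from source, counting shortest paths (sigma) to every node.
        sigma = {node: 0 for node in adjacency}
        sigma[source] = 1
        distance = {source: 0}
        predecessors = {node: [] for node in adjacency}
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    sigma[neighbor] += sigma[node]
                    predecessors[neighbor].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                delta[pred] += sigma[pred] / sigma[node] * (1.0 + delta[node])
            if node != source:
                score[node] += delta[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in score.items()}


if __name__ == "__main__":
    print(betweenness_centrality([(0, 1), (0, 2), (0, 3)]))
```

In the star graph of the usage example, the center lies on the only shortest path between each of the three leaf pairs, so it scores 3.0 and the leaves score 0.0.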

Triangle Counting

http://172.17.0.21:7474/service/mazerunner/analysis/triangle_count/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key triangle_count.

  • The value of the triangle_count property is an integer data type, ex. triangle_count: 2.

  • The value of triangle_count represents the count of the triangles that a node is connected to.

  • A node is part of a triangle when it has two adjacent nodes with a relationship between them. The triangle_count property provides a measure of clustering for each node.
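
The triangle definition above translates directly into a neighbor-set intersection. The following Python sketch is illustrative only and assumes relationships are treated as undirected, with self-loops and duplicate edges ignored.

```python
def triangle_count(edges):
    """Return {node: number of triangles the node belongs to}."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # a self-loop cannot be part of a triangle
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    counts = {}
    for node, neighbors in adjacency.items():
        # A triangle through `node` is a pair of its neighbors that are
        # themselves connected; each such pair is found twice below.
        linked = sum(len(neighbors & adjacency[n]) for n in neighbors)
        counts[node] = linked // 2
    return counts


if __name__ == "__main__":
    print(triangle_count([(1, 2), (2, 3), (3, 1), (3, 4)]))
```

In the usage example, nodes 1, 2, and 3 form one triangle, while node 4 hangs off node 3 and belongs to none.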

Connected Components

http://172.17.0.21:7474/service/mazerunner/analysis/connected_components/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key connected_components.

  • The value of the connected_components property is an integer data type, ex. connected_components: 181.

  • The value of connected_components represents the Neo4j internal node ID that has the lowest integer value for a set of connected nodes.

  • Connected components are used to find isolated clusters, that is, a group of nodes that can reach every other node in the group through a bidirectional traversal.
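
The labeling rule described above (every node gets the lowest node ID in its component) can be sketched with a union-find structure in Python. This is an illustration under the assumption that node IDs are integers and edges are traversed in both directions; it is not the GraphX code that Mazerunner runs.

```python
def connected_components(edges):
    """Return {node: lowest node ID in the node's connected component}."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the root so it becomes the label.
            low, high = sorted((root_a, root_b))
            parent[high] = low
    return {node: find(node) for node in list(parent)}


if __name__ == "__main__":
    print(connected_components([(5, 3), (3, 9), (7, 8)]))
```

In the usage example, nodes 3, 5, and 9 are labeled 3, and nodes 7 and 8 are labeled 7.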

Strongly Connected Components

http://172.17.0.21:7474/service/mazerunner/analysis/strongly_connected_components/FOLLOWS
  • Gets all nodes connected by the FOLLOWS relationship and updates each node with the property key strongly_connected_components.

  • The value of the strongly_connected_components property is an integer data type, ex. strongly_connected_components: 26.

  • The value of strongly_connected_components represents the Neo4j internal node ID that has the lowest integer value for a set of strongly connected nodes.

  • Strongly connected components are used to find clusters, that is, a group of nodes that can reach every other node in the group through a directed traversal.
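
To show how direction changes the grouping, here is a deliberately simple Python sketch that labels each node with the lowest ID in its strongly connected component. It intersects forward and backward reachability, which is easy to verify but slower than Tarjan's or Kosaraju's algorithm on large graphs; it assumes integer node IDs and is not the algorithm Mazerunner uses.

```python
def strongly_connected_components(edges):
    """Return {node: lowest node ID in the node's strongly connected component}."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        forward.setdefault(dst, set())
        backward.setdefault(dst, set()).add(src)
        backward.setdefault(src, set())

    def reachable(start, adjacency):
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    labels = {}
    # Visiting nodes in ascending order makes the first unlabeled node
    # the lowest ID of its component.
    for node in sorted(forward):
        if node in labels:
            continue
        # Members are nodes that `node` can reach and that can reach it back.
        component = reachable(node, forward) & reachable(node, backward)
        for member in component:
            labels[member] = node
    return labels


if __name__ == "__main__":
    print(strongly_connected_components([(1, 2), (2, 3), (3, 1), (3, 4)]))
```

In the usage example, nodes 1, 2, and 3 form a directed cycle and share the label 1. Node 4 can be reached from the cycle but cannot reach back, so it is its own component with label 4, whereas an undirected connected-components pass would place all four nodes in one group.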

Architecture

Mazerunner uses a message broker to distribute graph processing jobs to Apache Spark's GraphX module. When an agent job is dispatched, a subgraph is exported from Neo4j and written to Apache Hadoop HDFS.

After Neo4j exports a subgraph to HDFS, a separate Mazerunner service for Spark is notified to begin processing that data. The Mazerunner service will then start a distributed graph processing algorithm using Scala and Spark's GraphX module. The GraphX algorithm is serialized and dispatched to Apache Spark for processing.

Once the Apache Spark job completes, the results are written back to HDFS as a Key-Value list of property updates to be applied back to Neo4j.

Neo4j is then notified that a property update list is available from Apache Spark on HDFS. Neo4j batch imports the results and applies the updates back to the original graph.
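
The round trip described above can be mimicked in memory to show how the pieces fit together. The Python sketch below is purely illustrative: the function names, the tuple and dict data shapes, and the use of node degree as a stand-in for the Spark job are all assumptions, and the real system exchanges data through a message broker and HDFS rather than function calls.

```python
def export_subgraph(relationships, relationship_type):
    """Step 1: select the (src, dst) pairs for one relationship type."""
    return [
        (src, dst)
        for src, rel, dst in relationships
        if rel == relationship_type
    ]


def run_analysis(edge_list):
    """Steps 2-3: stand-in for the Spark job; returns key-value updates.

    Node degree is used here only because it is trivial to compute.
    """
    degree = {}
    for src, dst in edge_list:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    return sorted(degree.items())


def apply_updates(node_properties, key, updates):
    """Step 4: write each value back onto its node under `key`."""
    for node_id, value in updates:
        node_properties.setdefault(node_id, {})[key] = value
    return node_properties


if __name__ == "__main__":
    graph = [(1, "FOLLOWS", 2), (2, "FOLLOWS", 3), (1, "LIKES", 3)]
    updates = run_analysis(export_subgraph(graph, "FOLLOWS"))
    print(apply_updates({}, "degree", updates))
```

The point of the sketch is the separation of concerns: the database only exports edges and imports property updates, and the analysis step sees nothing but an edge list.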

License

This library is licensed under the Apache License, Version 2.0.
