• This repository has been archived on 06/May/2021
  • Language
    Java

GraphAware Framework Module for Integrating Neo4j with Elasticsearch

GraphAware Neo4j Elasticsearch Integration (Neo4j Module) - RETIRED

GraphAware Neo4j Elasticsearch Integration Has Been Retired

As of May 2021, this repository has been retired.

GraphAware Elasticsearch Integration is an enterprise-grade bi-directional integration between Neo4j and Elasticsearch. It consists of two independent modules plus a test suite. Both modules can be used independently or together to achieve full integration.

The first module (this project) is a plugin to Neo4j (more precisely, a GraphAware Transaction-Driven Runtime Module), which can be configured to transparently and asynchronously replicate data from Neo4j to Elasticsearch. This module is now production-ready and officially supported by GraphAware for GraphAware Enterprise subscribers.

The second module (a.k.a. Graph-Aided Search) is a plugin to Elasticsearch that can consult the Neo4j database during an Elasticsearch query to enrich the result (boost the score) by results that are more efficiently calculated in a graph database, e.g. recommendations.

Community vs Enterprise

This open-source (GPL) version of the module is compatible with GraphAware Framework Community (GPL), which in turn is compatible with Neo4j Community Edition (GPL) only. It will not work with Neo4j Enterprise Edition, which is a proprietary and commercial software product of Neo4j, Inc.

GraphAware offers an Enterprise version of the GraphAware Framework to licensed users of Neo4j Enterprise Edition. Please get in touch to receive access.

Neo4j -> Elasticsearch

Getting the Software

Server Mode

When using Neo4j in standalone server mode, you will need three (3) .jar files (all of which you can download here) dropped into the plugins directory of your Neo4j installation: the GraphAware Framework, the GraphAware UUID Module, and this Elasticsearch Integration module.

After changing a few lines of config (read on) and restarting Neo4j, the module will do its magic.

Embedded Mode / Java Development

Java developers who use Neo4j in embedded mode and those developing Neo4j server plugins, unmanaged extensions, GraphAware Runtime Modules, or Spring MVC Controllers can include the module as a dependency in their Java project.

Releases

Releases are synced to the Maven Central repository. When using Maven for dependency management, include the following dependency in your pom.xml and change the version to the correct one.

<dependencies>
    ...
    <dependency>
        <groupId>com.graphaware.integration.es</groupId>
        <!-- this will be com.graphaware.neo4j in the next release -->
        <artifactId>neo4j-to-elasticsearch</artifactId>
        <version>A.B.C.D.E</version>
    </dependency>
    ...
</dependencies>

Snapshots

To use the latest development version, just clone this repository, run mvn clean install and change the version in the dependency above to A.B.C.D.E-SNAPSHOT.

Note on Versioning Scheme

The version number has two parts. The first four numbers indicate compatibility with the GraphAware Neo4j Framework. The last number is the version of the Elasticsearch Integration library. For example, version 2.3.2.37.1 is version 1 of the Elasticsearch Integration library compatible with GraphAware Neo4j Framework 2.3.2.37 (and thus Neo4j 2.3.2).
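The scheme can be illustrated with a small parser. This is only a sketch: the class and field names are made up for this example, and reading the Neo4j version from the first three numbers is inferred from the 2.3.2.37.1 example above rather than from any published rule.

```java
import java.util.Arrays;

/** Illustrative only: splits a module version such as 2.3.2.37.1 into its parts. */
final class ModuleVersion {
    final String frameworkVersion; // first four numbers, e.g. 2.3.2.37
    final String neo4jVersion;     // first three numbers, e.g. 2.3.2 (inferred from the example)
    final int libraryVersion;      // last number, e.g. 1

    private ModuleVersion(String frameworkVersion, String neo4jVersion, int libraryVersion) {
        this.frameworkVersion = frameworkVersion;
        this.neo4jVersion = neo4jVersion;
        this.libraryVersion = libraryVersion;
    }

    static ModuleVersion parse(String version) {
        // Drop an optional -SNAPSHOT suffix, then split on the dots.
        String[] parts = version.replace("-SNAPSHOT", "").split("\\.");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Expected five dot-separated numbers: " + version);
        }
        String framework = String.join(".", Arrays.copyOfRange(parts, 0, 4));
        String neo4j = String.join(".", Arrays.copyOfRange(parts, 0, 3));
        return new ModuleVersion(framework, neo4j, Integer.parseInt(parts[4]));
    }

    public static void main(String[] args) {
        ModuleVersion v = parse("2.3.2.37.1");
        System.out.println("Framework " + v.frameworkVersion
                + ", Neo4j " + v.neo4jVersion
                + ", library " + v.libraryVersion);
    }
}
```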

Note on UUID

Exposing internal Neo4j node IDs to external systems is very bad practice: these IDs are not guaranteed to be stable and are reused when nodes are deleted. For this reason, unless you already have your own unique identifier for your nodes, we highly recommend using the GraphAware Neo4j UUID Module in conjunction with the Elasticsearch Integration Library. The rest of this manual will show you how to do that.

Configuring things as described below means all (or a selected subset of) your nodes will automatically be assigned an immutable uuid property, which will be indexed in Neo4j and used in Elasticsearch as the key for your indexed nodes (a.k.a. documents). When Elasticsearch returns a result, it will be the UUID that you will use to retrieve the Node from Neo4j.

Setup and Configuration

Server Mode

Edit neo4j.conf to register the required modules:


# This setting should only be set once for registering the framework and all the used submodules
dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware

com.graphaware.runtime.enabled=true

#UIDM becomes the module ID:
com.graphaware.module.UIDM.1=com.graphaware.module.uuid.UuidBootstrapper

#optional, default is "uuid". (only if using the UUID module)
com.graphaware.module.UIDM.uuidProperty=uuid

#optional, default is all nodes:
com.graphaware.module.UIDM.node=hasLabel('Label1') || hasLabel('Label2')

#optional, default is uuidIndex
com.graphaware.module.UIDM.uuidIndex=uuidIndex

#prevents the whole db from being assigned a new uuid if the uuid module is set up together with neo4j2es
com.graphaware.module.UIDM.initializeUntil=0

#ES becomes the module ID:
com.graphaware.module.ES.2=com.graphaware.module.es.ElasticSearchModuleBootstrapper

#URI of Elasticsearch
com.graphaware.module.ES.uri=localhost

#Port of Elasticsearch
com.graphaware.module.ES.port=9201

#optional, protocol of Elasticsearch connection, defaults to http
com.graphaware.module.ES.protocol=http

#optional, Elasticsearch index name, default is neo4j-index
com.graphaware.module.ES.index=neo4j-index

#optional, key of the node property that is used as the unique identifier of the node. Must be the same as com.graphaware.module.UIDM.uuidProperty (only if using UUID module), defaults to uuid
#use "ID()" to use native Neo4j IDs as Elasticsearch IDs (not recommended)
com.graphaware.module.ES.keyProperty=uuid

#optional, whether to retry if a replication fails, defaults to false
com.graphaware.module.ES.retryOnError=false

#optional, size of the in-memory queue that queues up operations to be synchronised to Elasticsearch, defaults to 10000
com.graphaware.module.ES.queueSize=10000

#optional, size of the batch size to use during re-initialization, defaults to 1000
com.graphaware.module.ES.reindexBatchSize=2000

#optional, specify which nodes to index in Elasticsearch, defaults to all nodes
com.graphaware.module.ES.node=hasLabel('Person')

#optional, specify which node properties to index in Elasticsearch, defaults to all properties
com.graphaware.module.ES.node.property=key != 'age'

#optional, specify whether to send updates to Elasticsearch in bulk, defaults to true (highly recommended)
com.graphaware.module.ES.bulk=true

#optional, read explanation below, defaults to 0
com.graphaware.module.ES.initializeUntil=0

#optional, whether the re-indexing process (at database startup) should run asynchronously
#default is "false", in which case the db will not be available until re-indexing has completed
#com.graphaware.module.ES.asyncIndexation=true

For explanation of the UUID configurations, please see the UUID Module docs.

For explanation of the syntax used in the configuration, refer to the Inclusion Policies.
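As a rough illustration of what the two example expressions above select (hasLabel('Person') for nodes, key != 'age' for properties), the same logic can be written as plain Java predicates. This is not the module's code or API. The class and method names are hypothetical, and modelling the replicated document as just the surviving property map is a simplifying assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for the example inclusion policies; not part of the module. */
final class InclusionPolicySketch {
    // Plain-Java equivalent of com.graphaware.module.ES.node=hasLabel('Person')
    static boolean nodeIncluded(Set<String> labels) {
        return labels.contains("Person");
    }

    // Plain-Java equivalent of com.graphaware.module.ES.node.property=key != 'age'
    static boolean propertyIncluded(String key) {
        return !"age".equals(key);
    }

    // Assumption: the document is modelled as the properties that pass the filter.
    static Optional<Map<String, Object>> toDocument(Set<String> labels, Map<String, Object> properties) {
        if (!nodeIncluded(labels)) {
            return Optional.empty(); // node is not replicated at all
        }
        Map<String, Object> document = new TreeMap<>();
        properties.forEach((key, value) -> {
            if (propertyIncluded(key)) {
                document.put(key, value);
            }
        });
        return Optional.of(document);
    }

    public static void main(String[] args) {
        System.out.println(toDocument(Set.of("Person"), Map.of("name", "alessandro", "age", 30)));
        System.out.println(toDocument(Set.of("Company"), Map.of("name", "GraphAware")));
    }
}
```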

The Elasticsearch Integration configuration is described in the inline comments above. The only property that needs a little more explanation is com.graphaware.module.ES.initializeUntil:

Every GraphAware Framework Module has methods (initialize() and reinitialize()) that provide a mechanism to get the world into a state equivalent to a situation in which the module has been running since the database was empty. These methods kick in in one of the following scenarios:

  • The database is not empty when the module has been registered for the first time (GraphAware Framework used on an existing database)
  • The configuration of the module has changed since the last time it was run
  • Some failure occurred that causes the Framework to think it should fix things.

We've decided that we should not shoot the whole database at Elasticsearch in one of these scenarios automatically, because it could well be quite large. Therefore, in order to trigger (re-)indexing, i.e. sending every node that should be indexed to Elasticsearch upon Neo4j restart, you have to manually intervene.

The way to intervene is to set com.graphaware.module.ES.initializeUntil to a number slightly higher than a Java call to System.currentTimeMillis() would return when the module is starting. This way, the database will be (re-)indexed once, not with every following restart. In other words, re-indexing will happen iff System.currentTimeMillis() < com.graphaware.module.ES.initializeUntil. If you're not sure what all of this means or don't know how to find the right number to set this value to, you're probably best off leaving it alone or getting in touch for some (paid) support.
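One way to obtain a suitable value is to take the current time from the JDK's System.currentTimeMillis() and add a margin long enough to cover editing the configuration and restarting Neo4j. The sketch below assumes a five-minute margin, which is an arbitrary choice. The class and method names are hypothetical, and shouldReindex only mirrors the comparison described above, not the module's actual code.

```java
import java.time.Duration;

/** Illustrative helper for choosing an initializeUntil value. */
final class InitializeUntilExample {
    /** Returns a timestamp (epoch millis) that lies the given margin after nowMillis. */
    static long initializeUntil(long nowMillis, Duration margin) {
        return nowMillis + margin.toMillis();
    }

    /** Mirrors the documented condition: re-index only while the clock is before the configured value. */
    static boolean shouldReindex(long nowMillis, long initializeUntil) {
        return nowMillis < initializeUntil;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Assumption: five minutes is enough time to edit neo4j.conf and restart the database.
        long until = initializeUntil(now, Duration.ofMinutes(5));
        System.out.println("com.graphaware.module.ES.initializeUntil=" + until);
    }
}
```

Once the margin has passed, later restarts see a clock value above the configured number and skip re-indexing.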

ElasticSearch Shield Support

If the Shield plugin is installed and enabled on the Elasticsearch node, authentication parameters can be added to the configuration. Here is an example:

#optional, specify the Shield user
com.graphaware.module.ES.authUser=neo4j_user

#optional, specify the Shield password
com.graphaware.module.ES.authPassword=123456

Both of them MUST be specified to enable authentication. The user must be able to perform writes on the Elasticsearch instance.
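Because a half-filled pair is easy to miss, a small pre-flight check on your own configuration can help. The sketch below works on a java.util.Properties object. The class and method names are hypothetical, and treating a blank value as "not set" is an assumption about how you want to validate your file, not a statement about the module's behaviour.

```java
import java.util.Properties;

/** Hypothetical pre-flight check for the two Shield settings. */
final class ShieldAuthCheck {
    static final String USER_KEY = "com.graphaware.module.ES.authUser";
    static final String PASSWORD_KEY = "com.graphaware.module.ES.authPassword";

    // Assumption: a missing or blank value counts as "not set".
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /** True only when both the user and the password are present. */
    static boolean authenticationEnabled(Properties config) {
        return isSet(config.getProperty(USER_KEY)) && isSet(config.getProperty(PASSWORD_KEY));
    }

    /** True when exactly one of the two settings is present, which is likely a mistake. */
    static boolean halfConfigured(Properties config) {
        return isSet(config.getProperty(USER_KEY)) != isSet(config.getProperty(PASSWORD_KEY));
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(USER_KEY, "neo4j_user");
        config.setProperty(PASSWORD_KEY, "123456");
        System.out.println("authentication enabled: " + authenticationEnabled(config));
        System.out.println("half configured: " + halfConfigured(config));
    }
}
```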

Embedded Mode / Java Development

To use the ElasticSearch Integration Module programmatically, register the module like this:

GraphAwareRuntime runtime = GraphAwareRuntimeFactory.createRuntime(database); //where database is an instance of GraphDatabaseService
runtime.registerModule(new UuidModule("UUID", UuidConfiguration.defaultConfiguration(), database));

ElasticSearchConfiguration configuration = ElasticSearchConfiguration.defaultConfiguration(HOST, PORT); //HOST and PORT are the host and port of your Elasticsearch instance
runtime.registerModule(new ElasticSearchModule("ES", new ElasticSearchWriter(configuration), configuration));

runtime.start();

Alternatively:

 GraphDatabaseService database = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(pathToDb)
    .loadPropertiesFromFile(this.getClass().getClassLoader().getResource("neo4j.properties").getPath())
    .newGraphDatabase();

 //make sure neo4j.properties contains the lines mentioned in the previous section

Usage

Apart from the configuration described above, the GraphAware ElasticSearch Integration Module requires nothing else to function. It will replicate transactions asynchronously to ElasticSearch.

Cypher Procedures

This module provides a set of Cypher procedures that allow you to communicate with Elasticsearch using the Cypher query language. These are the available procedures:

Searching for nodes or relationships

These procedures perform search queries on indexed nodes or relationships and return the results for further use in the Cypher query. Example of usage:

CALL ga.es.queryNode('{"query":{"match":{"name":"alessandro"}}}') YIELD node, score RETURN node, score

The related score is returned together with each node.

Any search query can be submitted through the procedure. It will be performed on the Elasticsearch index configured for replication.

The queryNodeRaw and queryRelationshipRaw procedures are similar to queryNode and queryRelationship (they accept the same parameters), but they return a JSON-encoded value of the node or relationship as returned by Elasticsearch. Example:

CALL ga.es.queryRelationshipRaw('{"query":{"match":{"city":"paris"}}}') YIELD json, score RETURN json, score
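When the search body is built in application code, the Cypher statement has to wrap it in a single-quoted string literal. The following is a minimal sketch of that step in Java. The class and method names are hypothetical. The escaping follows Cypher's string literal rules (backslash and single quote are escaped with a backslash). Passing the JSON as a query parameter through your driver, where available, avoids the escaping altogether.

```java
/** Hypothetical helper that assembles CALL statements for the search procedures. */
final class EsProcedureCalls {
    /** Wraps text in a single-quoted Cypher string literal, escaping backslashes and single quotes. */
    static String cypherLiteral(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds the queryNode call shown above for an arbitrary JSON search body. */
    static String queryNode(String jsonQuery) {
        return "CALL ga.es.queryNode(" + cypherLiteral(jsonQuery) + ") YIELD node, score RETURN node, score";
    }

    /** Builds the queryRelationshipRaw call shown above for an arbitrary JSON search body. */
    static String queryRelationshipRaw(String jsonQuery) {
        return "CALL ga.es.queryRelationshipRaw(" + cypherLiteral(jsonQuery) + ") YIELD json, score RETURN json, score";
    }

    public static void main(String[] args) {
        String json = "{\"query\":{\"match\":{\"name\":\"alessandro\"}}}";
        System.out.println(queryNode(json));
    }
}
```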

Monitoring the status of the reindexing process

Depending on your configuration, the module can be in initialization mode when starting, performing a complete reindexing of the Neo4j graph database content (in accordance with your configuration settings).

You can monitor the status of the init mode:

CALL ga.es.initialized() YIELD status RETURN status

Returns true or false.
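A client that has to wait for re-indexing to finish can poll this procedure. The sketch below shows only the polling loop. The BooleanSupplier is a hypothetical stand-in for whatever code runs CALL ga.es.initialized() through your driver and reads the status column. The attempt count and pause are arbitrary assumptions.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling loop around a status check such as CALL ga.es.initialized(). */
final class InitializationPoller {
    /**
     * Calls the supplier up to maxAttempts times, pausing between unsuccessful attempts.
     * Returns true as soon as the supplier reports true, false if it never does
     * or if the waiting thread is interrupted.
     */
    static boolean awaitInitialized(BooleanSupplier initialized, int maxAttempts, long pauseMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (initialized.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in status check that reports true on the third call.
        int[] calls = {0};
        boolean ready = awaitInitialized(() -> ++calls[0] >= 3, 10, 100L);
        System.out.println("initialized: " + ready + " after " + calls[0] + " checks");
    }
}
```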

Getting the current node or relationship mapping

You can retrieve the current node or relationship mapping from Elasticsearch using the following procedure:

CALL ga.es.nodeMapping() YIELD json as mapping RETURN mapping

or

CALL ga.es.relationshipMapping() YIELD json as mapping RETURN mapping

This will return a JSON string containing the mapping returned by Elasticsearch's Get Mapping API. The returned JSON string needs to be decoded using a JSON parsing library.

Getting Elasticsearch information

CALL ga.es.info() YIELD json as info RETURN info

This will return a JSON string containing Elasticsearch server information as returned by the Basic Status API. The returned JSON string needs to be decoded using a JSON parsing library. An example of a parsed result:

{
  "name" : "Sharon Friedlander",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.4.0",
    "build_hash" : "ce9f0c7394dee074091dd1bc4e9469251181fc55",
    "build_timestamp" : "2016-08-29T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}

Version of ElasticSearch

This module has been tested with ElasticSearch 2.3.0+.

License

Copyright (c) 2013-2020 GraphAware

GraphAware is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
