• This repository has been archived on 02/Aug/2022
  • Stars
    star
    276
  • Rank 149,319 (Top 3 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created over 5 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.

Testing Workflow codecov Documentation Chat PRs welcome!

Open Distro for Elasticsearch KNN

Open Distro for Elasticsearch enables you to run nearest neighbor search on billions of documents across thousands of dimensions with the same ease as running any regular Elasticsearch query. You can use aggregations and filter clauses to further refine your similarity search operations. K-NN similarity search powers use cases such as product recommendations, fraud detection, image and video search, related document search, and more.

Documentation

The README provides information for development of the k-NN plugin. To learn more about plugin usage, please see our documentation. Do not hesitate to create an issue if something is missing from the documentation!

Setup

  1. Check out the package from version control.
  2. Launch Intellij IDEA, choose Import Project, and select the settings.gradle file in the root of this package.
  3. To build from the command line, set JAVA_HOME to point to a JDK 14 before running ./gradlew build.

Build

The package uses the Gradle build system.

  1. Checkout this package from version control.
  2. To build from command line set JAVA_HOME to point to a JDK >=14
  3. Run ./gradlew build

JNI Library

The plugin relies on a JNI library to perform approximate k-NN search. For plugin installations from archive(.zip), it is necessary to ensure .so file for Linux and .jnilib file for Mac OS are present in the Java library path. This can be possible by copying .so/.jnilib to either $ES_HOME or by adding manually -Djava.library.path=<path_to_lib_files> in jvm.options file

To build the JNI Library, follow these steps:

cd jni
cmake .
make

The library will be placed in the jni/release directory.

To build an RPM or DEB of the JNI library, follow these steps:

cd jni
cmake .
make package

The artifacts will be placed in the jni/packages directory.

JNI Library Artifacts

We build and distribute binary library artifacts with Opendistro for Elasticsearch. We build the library binary, RPM and DEB in this GitHub action. We use Centos 7 with g++ 4.8.5 to build the DEB, RPM and ZIP. Additionally, in order to provide as much general compatibility as possible, we compile the library without optimized instruction sets enabled. For users that want to get the most out of the library, they should follow this section and build the library from source in their production environment, so that if their environment has optimized instruction sets, they take advantage of them.

Running Multi-node Cluster Locally

It can be useful to test and debug on a multi-node cluster. In order to launch a 3 node cluster with the KNN plugin installed, run the following command:

./gradlew run -PnumNodes=3

In order to run the integration tests with a 3 node cluster, run this command:

./gradlew :integTest -PnumNodes=3

Debugging

Sometimes it is useful to attach a debugger to either the Elasticsearch cluster or the integration test runner to see what's going on. For running unit tests, hit Debug from the IDE's gutter to debug the tests. For the Elasticsearch cluster, first, make sure that the debugger is listening on port 5005. Then, to debug the cluster code, run:

./gradlew :integTest -Dcluster.debug=1 # to start a cluster with debugger and run integ tests

OR

./gradlew run --debug-jvm # to just start a cluster that can be debugged

The Elasticsearch server JVM will connect to a debugger attached to localhost:5005 before starting. If there are multiple nodes, the servers will connect to debuggers listening on ports 5005, 5006, ...

To debug code running in an integration test (which exercises the server from a separate JVM), first, setup a remote debugger listening on port 8000, and then run:

./gradlew :integTest -Dtest.debug=1

The test runner JVM will connect to a debugger attached to localhost:8000 before running the tests.

Additionally, it is possible to attach one debugger to the cluster JVM and another debugger to the test runner. First, make sure one debugger is listening on port 5005 and the other is listening on port 8000. Then, run:

./gradlew :integTest -Dtest.debug=1 -Dcluster.debug=1

Contributions

We appreciate and encourage contributions from the community. If you experience a bug or have a feature request, please create an issue for it. If you decide to make a contribution, please fill out the Pull Request template with as much detail as possible. Also, when creating a title for your Pull Request, please do not include a prefix such as Bug Fix:. Instead, please use the corresponding tag to label the purpose of the Pull Request.

REQUEST FOR COMMENT (RFC)

We'd like to get your comments! Please read the plugin RFC document and raise an issue to add your comments and questions.

Credits and Acknowledgments

This project uses the Apache 2.0-licensed Non-Metric Space Library. Thank you to Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak and all those who have contributed to that project!

Code of Conduct

This project has adopted an Open Source Code of Conduct.

Security issue notifications

If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our vulnerability reporting page. Please do not create a public GitHub issue.

Licensing

See the LICENSE file for our project's licensing. We will ask you to confirm the licensing of your contribution.

Copyright

Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.

More Repositories

1

sql

🔍 Open Distro SQL Plugin
Java
620
star
2

opendistro-build

🧰 Open Distro Build Scripts
Shell
343
star
3

alerting

📟 Open Distro Alerting Plugin
Kotlin
279
star
4

sample-code

👋 Welcome to the Open Distro sample-code area. Share your great ideas and code samples with the Open Distro Community.
Python
278
star
5

performance-analyzer

📈 OpenDistro Performance Analyzer
Java
146
star
6

alerting-kibana-plugin

📟 Open Distro Kibana Alerting Plugin
JavaScript
140
star
7

index-management

🗃 Open Distro Index Management
Kotlin
115
star
8

perftop

📈 PerfTop: A client for the Open Distro Performance Analyzer
JavaScript
94
star
9

anomaly-detection

A machine learning plugin in Open Distro for real time anomaly detection on streaming data.
Java
78
star
10

job-scheduler

🕓 Open Distro Job Scheduler
Java
47
star
11

deprecated-security-advanced-modules

[DO NOT USE - DEPRECATED as of v1.4.0] Advanced modules for the Open Distro security plugin; Merged into security repo.
Java
47
star
12

anomaly-detection-kibana-plugin

A Kibana plugin providing visualizations for anomaly detection in Open Distro.
TypeScript
44
star
13

index-management-kibana-plugin

🗃 Open Distro Index Management Kibana UI plugin
TypeScript
42
star
14

kibana-reports

Kibana Reports
TypeScript
39
star
15

performance-analyzer-rca

The Performance Analyzer RCA is a framework that builds on the Performance Analyzer engine to support root cause analysis (RCA) of performance and reliability problems for Elasticsearch instances.
Java
38
star
16

data-prepper

This repository is archived. Please migrate to the active project: https://github.com/opensearch-project/data-prepper
Java
37
star
17

deprecated-security-ssl

[DO NOT USE - DEPRECATED AS OF v1.0.0] SSL module for Open Distro security plugin
Java
31
star
18

odfe-cli

A full-featured command line interface (CLI) for Open Distro.
Go
24
star
19

asynchronous-search

▶️ Asynchronous search makes it possible for users to run queries in the background, allowing users to track the progress, and retrieve partial results as they become available.
Java
23
star
20

kibana-notebooks

Open Distro Kibana Notebooks
TypeScript
21
star
21

deprecated-security-parent

[DO NOT USE - DEPRECATED as of v1.4.0] Parent repo for Open Distro Security plugin; Merged into security repo.
19
star
22

security

Java
17
star
23

cross-cluster-replication

Kotlin
15
star
24

trace-analytics

TypeScript
7
star
25

kibana-visualizations

TypeScript
4
star
26

common-utils

Open Distro Common-Utils
Java
4
star
27

security-kibana-plugin

TypeScript
3
star
28

notifications

Notifications plugin for Open Distro enables other plugins to send notifications via Email, Slack, Amazon Chime, Custom web-hook etc channels
Kotlin
3
star
29

pipe-processing-language

Piped Processing Language (PPL) for Elasticsearch
3
star