  • Stars: 139
  • Rank: 253,486 (Top 6%)
  • Language: C++
  • License: Other
  • Created about 6 years ago
  • Updated about 1 year ago

Repository Details

Machine learning C++ code

Machine Learning for the Elastic Stack

https://www.elastic.co/what-is/elasticsearch-machine-learning

The ml-cpp repo is a part of Machine Learning for the Elastic Stack, which is available with either a trial or platinum license for the Elastic Stack.

This repo only contains the C++ code that implements the core analytics for machine learning.

Code for integrating into Elasticsearch and source for its documentation can be found in the main elasticsearch repo.

Elastic License Functionality

Usage in production requires that you have a license key that permits use of machine learning features. See LICENSE.txt for full information.

Getting Started

To get started with Machine Learning please have a look at https://www.elastic.co/guide/en/machine-learning/current/ml-getting-started.html.

Full documentation of Machine Learning can be found at https://www.elastic.co/guide/en/machine-learning/current/index.html.

Questions/Bug Reports/Help

We are happy to help. To make sure your questions are answered by the right people, please follow the guidelines below:

  • If you have a general question about functionality please use our discuss forums.
  • If you have a support contract please use your dedicated support channel.
  • For questions regarding subscriptions please contact us.
  • For bug reports, pull requests and feature requests specifically for machine learning analytics, please use this GitHub repository.

Contributing

Please have a look at our contributor guidelines.

Setting up a build environment

You don't need to specifically build the C++ components for machine learning as, by default, the elasticsearch build will download pre-compiled C++ artifacts.

Setting up a build environment for ml-cpp native code is complex. If you are specifically interested in working with the ml-cpp code, then information regarding setting up a build environment can be found in the build-setup directory.

To use CLion with the project, please refer to the "Using CLion" tutorial.

Building

If you do choose to build the project from the command line yourself, for all platforms, the following instructions apply:

  • From the top level of the project, source the file set_env.sh, e.g.

    . ./set_env.sh

    When building on Windows from the native command shell, that command becomes

    .\set_env.bat

  • Run cmake -B cmake-build-relwithdebinfo to generate the build system under the cmake-build-relwithdebinfo directory.
  • Run cmake --build cmake-build-relwithdebinfo --config RelWithDebInfo to build the libraries and the executables for the project (the --config RelWithDebInfo option may be omitted on Linux and Mac). This may take some time. To speed up the build, you can tell cmake to perform a parallel build using the -j (jobs) option, e.g.

    cmake --build cmake-build-relwithdebinfo -j 7

  • To build and run the unit tests, run cmake --build cmake-build-relwithdebinfo -t test. Again, this can be sped up somewhat by using the -j option, e.g.

    cmake --build cmake-build-relwithdebinfo -t test -j 7
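
The steps above can be chained in a small wrapper. The following is a minimal sketch for a POSIX shell on Linux or macOS, not a script shipped with the project. The names ml_build, run, BUILD_DIR, JOBS and DRY_RUN are illustrative assumptions. With DRY_RUN set, it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: wraps the documented build steps in one function.
# ml_build, run, BUILD_DIR, JOBS and DRY_RUN are illustrative names,
# not part of ml-cpp.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Source the environment, configure, build, then build and run the unit tests.
# Each step runs only if the previous one succeeded.
ml_build() {
    build_dir="${BUILD_DIR:-cmake-build-relwithdebinfo}"
    jobs="${JOBS:-7}"
    run . ./set_env.sh &&
    run cmake -B "$build_dir" &&
    run cmake --build "$build_dir" --config RelWithDebInfo -j "$jobs" &&
    run cmake --build "$build_dir" -t test -j "$jobs"
}
```

For example, DRY_RUN=1 ml_build lists the four commands without executing them, and JOBS=3 ml_build runs the real build from the top level of the project with three parallel jobs.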

Running

Although the executables are designed to be run from Elasticsearch, it is possible to run them from the command line. This is particularly useful when debugging an issue for which you have an input data set sufficient to replicate the error.

The location of the executables differs depending on the platform.

  • MacOS: build/distribution/platform/darwin-x86_64/controller.app/Contents/MacOS/
  • Linux: build/distribution/platform/linux-x86_64/bin/
  • Windows: build/distribution/platform/windows-x86_64/bin/
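
As a rough illustration, the lookup above could be scripted. This sketch assumes a POSIX shell and covers only the x86_64 paths listed above. The helper name ml_bin_dir is hypothetical, and matching Windows through MINGW, MSYS or Cygwin uname strings is an assumption about the shell in use.

```shell
#!/bin/sh
# Sketch only: map an OS name (as printed by `uname -s`) to the executable
# directory listed above. ml_bin_dir is an illustrative helper, not part of
# ml-cpp, and only the x86_64 paths from the list are covered.
ml_bin_dir() {
    # Use the first argument if given, otherwise ask the system.
    os="${1:-$(uname -s)}"
    case "$os" in
        Darwin)
            echo "build/distribution/platform/darwin-x86_64/controller.app/Contents/MacOS/" ;;
        Linux)
            echo "build/distribution/platform/linux-x86_64/bin/" ;;
        MINGW*|MSYS*|CYGWIN*)
            # Assumption: a Unix-like shell on Windows reports one of these.
            echo "build/distribution/platform/windows-x86_64/bin/" ;;
        *)
            echo "unsupported platform: $os" >&2
            return 1 ;;
    esac
}
```

Because each path ends in a slash, an executable name can be appended directly, for instance "$(ml_bin_dir)autodetect" --help.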

The command line arguments differ depending on which executable is being run, but each has the --help option, e.g.

./build/distribution/platform/linux-x86_64/bin/autodetect --help
Usage: autodetect [options] [<fieldname>+ [by <fieldname>]]
Options::
  --help                      Display this information and exit
  --version                   Display version information and exit
  --limitconfig arg           Optional limit config file
  --modelconfig arg           Optional model config file
  --fieldconfig arg           Optional field config file
  --modelplotconfig arg       Optional model plot config file
  --jobid arg                 ID of the job this process is associated with
  --logProperties arg         Optional logger properties file
  --logPipe arg               Optional log to named pipe
  --bucketspan arg            Optional aggregation bucket span (in seconds) - 
                              default is 300
  --latency arg               Optional maximum delay for out-of-order records 
                              (in seconds) - default is 0
  --summarycountfield arg     Optional field to that contains counts for 
                              pre-summarized input - default is none
  --delimiter arg             Optional delimiter character for delimited data 
                              formats - default is '' (tab separated)
  --lengthEncodedInput        Take input in length encoded binary format - 
                              default is delimited
  --timefield arg             Optional name of the field containing the 
                              timestamp - default is 'time'
  --timeformat arg            Optional format of the date in the time field in 
                              strptime code - default is the epoch time in 
                              seconds
  --quantilesState arg        Optional file to quantiles for normalization
  --deleteStateFiles          If the 'quantilesState' option is used and this 
                              flag is set then delete the model state files 
                              once they have been read
  --input arg                 Optional file to read input from - not present 
                              means read from STDIN
  --inputIsPipe               Specified input file is a named pipe
  --output arg                Optional file to write output to - not present 
                              means write to STDOUT
  --outputIsPipe              Specified output file is a named pipe
  --restore arg               Optional file to restore state from - not present
                              means no state restoration
  --restoreIsPipe             Specified restore file is a named pipe
  --persist arg               Optional file to persist state to - not present 
                              means no state persistence
  --persistIsPipe             Specified persist file is a named pipe
  --persistInterval arg       Optional time interval at which to periodically 
                              persist model state (Mutually exclusive with 
                              bucketPersistInterval)
  --persistInForeground       Persistence occurs in the foreground. Defaults to
                              background persistence.
  --bucketPersistInterval arg Optional number of buckets after which to 
                              periodically persist model state (Mutually 
                              exclusive with persistInterval)
  --maxQuantileInterval arg   Optional interval at which to periodically output
                              quantiles if they have not been output due to an 
                              anomaly - if not specified then quantiles will 
                              only be output following a big anomaly
  --maxAnomalyRecords arg     The maximum number of records to be outputted for
                              each bucket. Defaults to 100, a value 0 removes 
                              the limit.
  --memoryUsage               Log the model memory usage at the end of the job
  --multivariateByFields      Optional flag to enable multi-variate analysis of
                              correlated by fields

Other executables exist under the devbin directory. These are not built by default. To build one of them, explicitly specify its target, e.g.

cmake --build cmake-build-relwithdebinfo -j 7 -t model_extractor

The executable is created under the cmake-build-relwithdebinfo hierarchy, so to run it use

./cmake-build-relwithdebinfo/devbin/model_extractor/model_extractor --help
