Elasticsearch Operator

This is an operator for running Elasticsearch in Kubernetes with a focus on operational aspects, like safe draining and auto-scaling capabilities for Elasticsearch data nodes, rather than just abstracting manifest definitions.

License

Starting with v0.1.3, the ES-Operator is dual-licensed under the MIT and Apache-2.0 licenses. You may use this work under either license.

SPDX-License-Identifier: MIT OR Apache-2.0

Compatibility

The ES-Operator has been tested with Elasticsearch 7.x and 8.x. We previously also tested it with Elasticsearch 6.x; while that may still work, please consider its support dropped.

How it works

The operator works by managing custom resources called ElasticsearchDataSets (EDS). They are basically a thin wrapper around StatefulSets. One EDS represents a common group of Elasticsearch data nodes. When an EDS manifest is applied, the operator creates and manages a corresponding StatefulSet.

Do not operate manually on the StatefulSet. The operator is supposed to own this resource on your behalf.

Key features

  • It can scale in two dimensions: shards per node and the number of index replicas for the indices on that data set.
  • It works within scaling dimensions that teams at Zalando know well and have tested over a long period.
  • Target CPU ratio is a safe and well-known metric to scale on in order to avoid latency spikes caused by Garbage Collection.
  • In case of emergency, manual scaling is possible by disabling the auto-scaling feature.

Getting Started

For a quick tutorial on how to deploy the ES Operator, take a look at our Getting Started Guide.

Custom Resource

Full Example

apiVersion: zalando.org/v1
kind: ElasticsearchDataSet
spec:
  replicas: 2
  skipDraining: false
  scaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 99
    minIndexReplicas: 2
    maxIndexReplicas: 3
    minShardsPerNode: 2
    maxShardsPerNode: 6
    scaleUpCPUBoundary: 50
    scaleUpThresholdDurationSeconds: 900
    scaleUpCooldownSeconds: 3600
    scaleDownCPUBoundary: 25
    scaleDownThresholdDurationSeconds: 1800
    scaleDownCooldownSeconds: 3600
    diskUsagePercentScaledownWatermark: 80

Custom resource properties

spec.replicas (Int): Initial size of the StatefulSet. If auto-scaling is disabled, this is your desired cluster size.
spec.excludeSystemIndices (Boolean): Enable or disable inclusion of system indices like '.kibana' when calculating the shard-per-node ratio and scaling index replica counts. Those are usually managed by Elasticsearch internally. Default is false for backwards compatibility.
spec.skipDraining (Boolean): Allows the ES Operator to terminate an Elasticsearch node without re-allocating its data. This is useful for persistent disk setups, like EBS volumes. Beware that the ES Operator does not verify that you have more than one copy of your indices and therefore cannot protect you from potential data loss. Default is false.
spec.scaling.enabled (Boolean): Enable or disable auto-scaling. May be necessary to enforce manual scaling.
spec.scaling.minReplicas (Int): Minimum Pod replicas. Lower bound (inclusive) when scaling down.
spec.scaling.maxReplicas (Int): Maximum Pod replicas. Upper bound (inclusive) when scaling up.
spec.scaling.minIndexReplicas (Int): Minimum index replicas. Lower bound (inclusive) when reducing index copies. (Reminder: the total number of copies is replicas+1 in Elasticsearch.)
spec.scaling.maxIndexReplicas (Int): Maximum index replicas. Upper bound (inclusive) when increasing index copies.
spec.scaling.minShardsPerNode (Int): Minimum shard-per-node ratio. When reached, scaling up also requires adding more index replicas.
spec.scaling.maxShardsPerNode (Int): Maximum shard-per-node ratio. Boundary for scaling down.
spec.scaling.scaleUpCPUBoundary (Int): (Median) CPU consumption/request ratio to consistently exceed in order to trigger a scale-up.
spec.scaling.scaleUpThresholdDurationSeconds (Int): Duration in seconds for which the scale-up criteria must be met before scaling.
spec.scaling.scaleUpCooldownSeconds (Int): Minimum duration in seconds between two scale-up operations.
spec.scaling.scaleDownCPUBoundary (Int): (Median) CPU consumption/request ratio to consistently fall below in order to trigger a scale-down.
spec.scaling.scaleDownThresholdDurationSeconds (Int): Duration in seconds for which the scale-down criteria must be met before scaling.
spec.scaling.scaleDownCooldownSeconds (Int): Minimum duration in seconds between two scale-down operations.
spec.scaling.diskUsagePercentScaledownWatermark (Float): If disk usage on one of the nodes exceeds this threshold, scaling down is prevented.
status.lastScaleUpStarted (Timestamp): Timestamp of the start of the last scale-up activity.
status.lastScaleUpEnded (Timestamp): Timestamp of the end of the last scale-up activity.
status.lastScaleDownStarted (Timestamp): Timestamp of the start of the last scale-down activity.
status.lastScaleDownEnded (Timestamp): Timestamp of the end of the last scale-down activity.

How it scales

The operator collects the median CPU consumption of all Pods of the EDS every 60 seconds. Based on this data it decides whether a scale-up or scale-down is necessary. For this to happen, all samples within the configured threshold duration need to meet the configured scaling boundary.
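
As an illustration, the following Go sketch shows how such a check could look: a median over the per-Pod CPU ratios of one sampling round, and a scale-up decision that only fires if every sample inside the threshold window exceeds the boundary. The names (cpuSample, medianOfPods, shouldScaleUp) are illustrative and not the operator's actual identifiers.

package scaling

import (
	"sort"
	"time"
)

// cpuSample records one sampling round: the median CPU consumption/request
// ratio (in percent) across all Pods of the EDS at a given time.
type cpuSample struct {
	takenAt time.Time
	percent float64
}

// medianOfPods computes the median over the per-Pod CPU ratios of one round.
func medianOfPods(podCPUPercents []float64) float64 {
	if len(podCPUPercents) == 0 {
		return 0
	}
	vals := append([]float64(nil), podCPUPercents...)
	sort.Float64s(vals)
	n := len(vals)
	if n%2 == 1 {
		return vals[n/2]
	}
	return (vals[n/2-1] + vals[n/2]) / 2
}

// shouldScaleUp returns true only if every sample within the threshold
// duration exceeds the scale-up CPU boundary, mirroring the rule that all
// samples in the window must meet the criterion.
func shouldScaleUp(samples []cpuSample, boundaryPercent float64, threshold time.Duration, now time.Time) bool {
	windowStart := now.Add(-threshold)
	seen := false
	for _, s := range samples {
		if s.takenAt.Before(windowStart) {
			continue
		}
		seen = true
		if s.percent <= boundaryPercent {
			return false
		}
	}
	return seen
}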

The actual calculation of how many resources to allocate is based on the idea of managing the shard-per-node ratio inside the cluster. Scaling out decreases the shard-to-node ratio, increasing the available resources per index, while scaling in increases it. We rely on Elasticsearch's auto-rebalancing to ensure shards are distributed equally among the nodes.

At a certain point it's not feasible to only add more nodes. This can be the case if you have already reached the lower bound of one shard per node, or if you want to increase the concurrent capacity for an index. Consequently the operator is able to add index replicas when scaling out, and to remove them before scaling in again. All you need to do is define the upper and lower bounds of shards per node.
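
The arithmetic behind this can be sketched in a few lines of Go; totalShards and desiredNodes are illustrative helpers, not the operator's actual code:

package scaling

// totalShards counts all shard copies of one index: primaries times the
// number of copies, where the number of copies is indexReplicas+1.
func totalShards(primaryShards, indexReplicas int) int {
	return primaryShards * (indexReplicas + 1)
}

// desiredNodes derives the Pod count from the total shard count and the
// targeted shards-per-node ratio, rounding up so that every shard fits.
func desiredNodes(total, shardsPerNode int) int {
	if shardsPerNode < 1 {
		shardsPerNode = 1
	}
	return (total + shardsPerNode - 1) / shardsPerNode
}

With the numbers from Example 1 below, totalShards(6, 2) = 18 and desiredNodes(18, 3) = 6, which is the initial deployment of 6 nodes; lowering the ratio to 2 and then 1 yields 9 and 18 nodes.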

Example 1

  • One index with 6 shards. minReplicas=2, maxReplicas=4, minShardsPerNode=1, maxShardsPerNode=3, targetCPU: 40%
  • Initial, minimal deployment: 3 copies of the index x 6 shards = 18 shards / 3 per node => 6 nodes
  • Median CPU utilization exceeds 40% for more than 20 minutes => scale up by decreasing the shards-per-node ratio to 2: 18 shards / 2 per node => 9 nodes
  • Median CPU utilization exceeds 40% for more than 20 minutes => scale up by decreasing the shards-per-node ratio to 1: 18 shards / 1 per node => 18 nodes
  • Median CPU utilization exceeds 40% for more than 20 minutes => scale up by increasing the replica count to 3: 24 shards / 1 per node => 24 nodes
  • Median CPU utilization exceeds 40% for more than 20 minutes => scale up by increasing the replica count to 4: 30 shards / 1 per node => 30 nodes
  • No further scale-up (safety net to avoid a cost explosion)
  • Scale-down happens in reverse order: if the expected CPU utilization after scaling down stays below 40%, e.g. current=20%, expected=20% * 30 / 24 = 25% => decrease the replica count to 3: 24 shards / 1 per node => 24 nodes
  • and so on

Example 2

  • Four indices with 6 shards each. minReplicas=2, maxReplicas=3, minShardsPerNode=2, maxShardsPerNode=4, targetCPU: 40%
  • Initial, minimal deployment: 3 copies x 4 indices x 6 shards = 72 shards / 4 per node => 18 nodes
  • Median CPU utilization exceeds 40% for more than 20 minutes => scale up by decreasing the shards-per-node ratio to 3: 72 shards / 3 per node => 24 nodes
  • Median CPU utilization exceeds 40% for more than 20 minutes => scale up by decreasing the shards-per-node ratio to 2: 72 shards / 2 per node => 36 nodes
  • Median CPU utilization exceeds 40% for more than 20 minutes => scale up by increasing replicas: 4 copies x 4 indices x 6 shards = 96 shards / 2 per node => 48 nodes

Scale-up operation

  • If the scale-up requires an increase of index replicas, disable shard rebalancing.
  • Calculate the required Pod count from the current indices, their shard counts, and the current vs. desired index replica counts.
  • Scale up by updating spec.replicas and starting the resource reconciliation process.
  • If the scale-up requires an increase of index replicas, wait for the StatefulSet to stabilize before updating index.number_of_replicas in Elasticsearch (see the sketch below).
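
The Elasticsearch calls behind these steps are the standard cluster settings and index settings APIs. Below is a minimal Go sketch, assuming plain HTTP access to the cluster (e.g. via the configured elasticsearch-endpoint); putJSON, disableRebalancing and setIndexReplicas are illustrative helpers rather than the operator's actual code, and the same setIndexReplicas call applies to the scale-down steps in the next section.

package scaling

import (
	"bytes"
	"fmt"
	"net/http"
)

// putJSON issues a PUT request with a JSON body against the Elasticsearch endpoint.
func putJSON(esEndpoint, path, body string) error {
	req, err := http.NewRequest(http.MethodPut, esEndpoint+path, bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status %s for %s", resp.Status, path)
	}
	return nil
}

// disableRebalancing turns off shard rebalancing via the cluster settings API
// before a scale-up that increases index replicas.
func disableRebalancing(esEndpoint string) error {
	return putJSON(esEndpoint, "/_cluster/settings",
		`{"transient":{"cluster.routing.rebalance.enable":"none"}}`)
}

// setIndexReplicas updates index.number_of_replicas for one index: after the
// StatefulSet stabilized on scale-up, or before Pods are removed on scale-down.
func setIndexReplicas(esEndpoint, index string, replicas int) error {
	return putJSON(esEndpoint, fmt.Sprintf("/%s/_settings", index),
		fmt.Sprintf(`{"index":{"number_of_replicas":%d}}`, replicas))
}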

Scale-down operation

  • Calculate the required Pod count from the current indices, their shard counts, and the current vs. desired index replica counts.
  • If the scale-down requires a decrease of index replicas, update index.number_of_replicas on each index.
  • Scale down.

Draining and rolling restarts

The operator polls all managed Pods and determines whether any of them needs to be drained or updated, based on the following logic and priority:

  1. Pods already marked draining should be drained to completion and be deleted.
  2. Pods on a priority node (e.g. a node about to be terminated) should be drained.
  3. Pods not up to date with the StatefulSet revision should be drained.

If multiple Pods need to be updated, they are handled according to the above priority, where '1' is the highest.
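
To make the ordering concrete, here is a small Go sketch of the selection; podInfo and the priority constants are a simplified, illustrative view of the state the operator derives from Pods and nodes, not its actual types:

package scaling

// drainPriority ranks the reasons listed above; lower values are handled first.
type drainPriority int

const (
	priorityAlreadyDraining drainPriority = iota + 1 // 1: finish Pods already marked draining
	priorityPriorityNode                             // 2: Pods on a priority node (e.g. about to be terminated)
	priorityStaleRevision                            // 3: Pods behind the current StatefulSet revision
	priorityNone                                     // no reason to drain this Pod
)

// podInfo is a simplified view of a managed Pod and its node.
type podInfo struct {
	name             string
	markedDraining   bool
	onPriorityNode   bool
	upToDateRevision bool
}

func classify(p podInfo) drainPriority {
	switch {
	case p.markedDraining:
		return priorityAlreadyDraining
	case p.onPriorityNode:
		return priorityPriorityNode
	case !p.upToDateRevision:
		return priorityStaleRevision
	default:
		return priorityNone
	}
}

// nextToDrain picks the Pod with the highest-priority reason to be drained,
// or reports false if no Pod currently needs draining.
func nextToDrain(pods []podInfo) (podInfo, bool) {
	best := priorityNone
	var chosen podInfo
	for _, p := range pods {
		if prio := classify(p); prio < best {
			best = prio
			chosen = p
		}
	}
	return chosen, best != priorityNone
}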

What it does not do

The operator does not manage Elasticsearch master nodes. You can create them on your own, most likely using a standard Deployment or StatefulSet manifest.

Building

This project uses Go modules as introduced in Go 1.11, therefore you need Go >= 1.11 installed in order to build. If you are using Go 1.11 itself, you also need to activate module support.

Assuming Go has been set up with module support, it can be built simply by running:

$ export GO111MODULE=on # needed if the project is checked out in your $GOPATH
$ make

Running

The es-operator can be run as a deployment in the cluster. See es-operator.yaml for an example.

By default the operator manages all ElasticsearchDataSets in the cluster, but you can limit it to certain resources by setting the --operator-id and/or --namespace options.

When the operator is run with --operator-id=my-operator it will only manage ElasticsearchDataSets which have the following annotation set:

metadata:
  annotations:
    es-operator.zalando.org/operator: my-operator

Operators which don't run with the --operator-id flag will only operate on resources which don't have the annotation.

When it's run with --namespace=my-namespace it will only manage resources in the my-namespace namespace.

The operator can be deployed by running:

$ kubectl apply -f docs/es-operator.yaml
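
The combined effect of --operator-id and --namespace can be summarized in a short Go sketch; the manages function and its parameters are illustrative, only the annotation key is taken from the manifest snippet above:

package scaling

// esOperatorAnnotation is the annotation shown in the manifest snippet above.
const esOperatorAnnotation = "es-operator.zalando.org/operator"

// manages reports whether an operator instance should handle a given
// ElasticsearchDataSet: with --operator-id set it only manages resources
// annotated with that ID, without the flag it only manages unannotated
// resources, and --namespace restricts it to a single namespace.
func manages(operatorID, namespaceFilter, resourceNamespace string, annotations map[string]string) bool {
	if namespaceFilter != "" && namespaceFilter != resourceNamespace {
		return false
	}
	annotated := annotations[esOperatorAnnotation]
	if operatorID == "" {
		return annotated == ""
	}
	return annotated == operatorID
}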

Running locally

The operator can be run locally and operate on a remote cluster making it simpler to iterate during development.

To run it locally you need to run kubectl proxy in one shell, and then you can start the operator with the following flags:

$ ./build/es-operator \
  --priority-node-selector=lifecycle-status=ready \
  --apiserver=http://127.0.0.1:8001 \
  --operator-id=my-operator \
  --elasticsearch-endpoint=http://127.0.0.1:8001/api/v1/namespaces/default/services/elasticsearch:9200/proxy

This assumes that the elasticsearch-endpoint is exposed via a service running in the default namespace. This uses the kube-apiserver proxy functionality to proxy requests to the Elasticsearch cluster.

Other alternatives

We are not the only ones providing an Elasticsearch operator for Kubernetes. Here are some alternatives you might want to look at.

  • upmc-enterprises/elasticsearch-operator - offers a higher-level abstraction of the custom resource definition of an Elasticsearch cluster and snapshotting support, but to our knowledge no scaling support and no draining of nodes.
  • jetstack/navigator - operator that can handle both Cassandra and Elasticsearch clusters, but doesn't offer auto-scaling or draining of nodes.
