  • Language
    Go
  • License
    MIT License
  • Created about 6 years ago
  • Updated about 2 months ago


Cluster Lifecycle Manager (CLM) to provision and update multiple Kubernetes clusters

Cluster Lifecycle Manager (CLM)


The Cluster Lifecycle Manager (CLM) is a component responsible for operating (creating, updating, and deleting) Kubernetes clusters. It interacts with a Cluster Registry and a configuration source, from which it reads information about the clusters, and keeps the clusters up to date with the latest configuration.


The CLM is designed to run either as a CLI tool for launching clusters directly from your development machine, or as a controller running as a single instance operating many clusters.

It is designed to be reentrant, meaning it can be killed at any point in time and will simply continue any cluster updates from where it left off. All state is stored in the Cluster Registry and the git configuration repository.
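One way to get this property is to keep every bit of progress in external storage and re-derive the next step from it on each run, so a killed process loses nothing. The following is a minimal sketch of that idea, not the CLM's actual mechanism: `Registry`, `memRegistry`, `update` and the step names are all hypothetical, and an in-memory map stands in for the Cluster Registry.

```go
package main

import "fmt"

// Hypothetical steps of a cluster update, applied in order.
var steps = []string{"apply-infrastructure", "apply-manifests", "roll-nodes"}

// Registry is a hypothetical stand-in for the Cluster Registry: it persists
// the last completed step per cluster, so no progress lives only in memory.
type Registry interface {
	LastCompleted(clusterID string) int // index of last completed step, -1 if none
	MarkCompleted(clusterID string, step int)
}

// memRegistry is an in-memory Registry used only for this illustration.
type memRegistry struct{ done map[string]int }

func (r *memRegistry) LastCompleted(id string) int {
	if v, ok := r.done[id]; ok {
		return v
	}
	return -1
}

func (r *memRegistry) MarkCompleted(id string, step int) { r.done[id] = step }

// update resumes from the step after the last completed one. stopAfter
// simulates the process being killed after that many steps (-1 = never).
func update(reg Registry, clusterID string, stopAfter int) []string {
	var ran []string
	for i := reg.LastCompleted(clusterID) + 1; i < len(steps); i++ {
		if stopAfter >= 0 && len(ran) == stopAfter {
			return ran // "killed": progress so far is already recorded in reg
		}
		ran = append(ran, steps[i])
		reg.MarkCompleted(clusterID, i)
	}
	return ran
}

func main() {
	reg := &memRegistry{done: map[string]int{}}
	fmt.Println(update(reg, "cluster-id", 1))  // [apply-infrastructure]
	fmt.Println(update(reg, "cluster-id", -1)) // [apply-manifests roll-nodes]
}
```

The second call picks up exactly where the first one stopped because it reads its starting point from the registry rather than from process memory.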

For a better understanding of how we use the CLM within Zalando, see the 2018 KubeCon EU talk.

Current state

The CLM has been developed internally at Zalando since January 2017. It is currently used to operate 80+ clusters on AWS, where the oldest clusters have been continuously updated by the CLM all the way from Kubernetes v1.4 to Kubernetes v1.9.

It is currently tightly coupled with our production cluster configuration, but by making it open source and developing it in the open going forward, we aim to make the CLM useful as a generic solution for operating Kubernetes clusters at scale.

Features

  • Automatically trigger cluster updates based on changes to a Cluster Registry, defined either as an HTTP REST API or a YAML file.
  • Automatically trigger cluster updates based on configuration changes, where the configuration is stored in a remote git repository or a local directory.
  • Perform non-disruptive rolling updates of the nodes in a cluster, with special care for stateful applications.
  • Declarative deletion of decommissioned cluster resources.

How to build it

This project uses Go modules, introduced in Go 1.11, so you need Go >= 1.11 installed to build it. With Go 1.11 you also need to activate module support explicitly.

Assuming Go has been set up with module support, it can be built by running:

$ export GO111MODULE=on # needed if the project is checked out in your $GOPATH
$ make

How to run it

To run CLM you need to provide at least the following information:

  • A registry URI (--registry): either a file path or a URL to a cluster registry.
  • A $TOKEN (--token) used for authenticating with the target Kubernetes cluster once it has been provisioned (the $TOKEN is an assumption of the Zalando setup; we should support a generic kubeconfig in the future).
  • The URL of the repository containing the configuration (--git-repository-url) or, alternatively, a local directory (--directory).

Run CLM locally

To run CLM locally you can use the following command. This assumes valid AWS credentials on your machine e.g. in ~/.aws/credentials.

$ ./build/clm provision \
  --registry=clusters.yaml \
  --token=$TOKEN \
  --directory=/path/to/configuration-folder \
  --debug

The provision command creates or updates a cluster, depending on whether it already exists. The other command, decommission, terminates the cluster.

The clusters.yaml file has the following format:

clusters:
- id: cluster-id
  alias: alias-for-cluster-id # human readable alias
  local_id: local-cluster-id  # used for separating clusters in the same AWS account
  api_server_url: https://kube-api.example.org
  config_items:
    custom_config_item: value # custom key/value config items
  criticality_level: 1
  environment: test
  infrastructure_account: "aws:12345678910" # AWS account ID
  region: eu-central-1
  provider: zalando-aws
  node_pools:
  - name: master-default
    profile: master-default
    min_size: 2
    max_size: 2
    instance_type: m5.large
    discount_strategy: none
  - name: worker-default
    profile: worker-default
    min_size: 3
    max_size: 20
    instance_type: m5.large
    discount_strategy: none

Deletions

By default, the Cluster Lifecycle Manager simply applies any manifest defined in the manifests folder. To support deletion of deprecated resources, the CLM reads a deletions.yaml file of the following format:

pre_apply: # everything defined under here will be deleted before applying the manifests
- name: mate
  namespace: kube-system
  kind: deployment
- name: with-options
  namespace: kube-system
  kind: deployment
  propagation_policy: Orphan
  grace_period_seconds: 10
- name: orphan-replicasets
  namespace: kube-system
  kind: ReplicaSet
  labels:
    foo: bar
  has_owner: false
- namespace: kube-system
  kind: deployment
  selector: version != v1
post_apply: # everything defined under here will be deleted after applying the manifests
- namespace: kube-system
  kind: deployment
  labels:
    application: external-dns
    version: "v1.0"

Whatever is defined in this file will be deleted before or after applying the other manifest files, if the resource exists. If the resource has already been deleted, the deletion is treated as a no-op.

A resource can be identified by name, selector, or labels; only one of them should be defined.

namespace can be left out, in which case it will default to kube-system.

kind must be one of the resource kinds supported by kubectl get.

An optional boolean has_owner may be specified to narrow down resources identified by the labels:

  • has_owner: true selects resources with non-empty metadata.ownerReferences
  • has_owner: false selects resources with empty metadata.ownerReferences

Deletion options can be specified via the following optional fields:

  • propagation_policy - one of "Orphan", "Background" or "Foreground" - corresponds to kubectl delete --cascade flag
  • grace_period_seconds - corresponds to kubectl delete --grace-period flag
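The identification rules above can be sketched as a small validation-and-filter step. This is a hypothetical illustration, not the CLM's code: `DeletionSpec`, `normalize`, `Resource` and `filterByOwner` are invented names, and the sketch chooses to reject entries that set more or fewer than one of name, selector or labels, which is stricter than the "should" in the text.

```go
package main

import (
	"errors"
	"fmt"
)

// DeletionSpec mirrors one deletions.yaml entry (field meanings from the
// format above; the Go type itself is hypothetical).
type DeletionSpec struct {
	Name      string
	Namespace string
	Kind      string
	Selector  string
	Labels    map[string]string
	HasOwner  *bool // nil means "do not filter on owner references"
}

// normalize enforces exactly one of name, selector or labels and defaults
// the namespace to kube-system.
func normalize(d DeletionSpec) (DeletionSpec, error) {
	set := 0
	if d.Name != "" {
		set++
	}
	if d.Selector != "" {
		set++
	}
	if len(d.Labels) > 0 {
		set++
	}
	if set != 1 {
		return d, errors.New("exactly one of name, selector or labels must be set")
	}
	if d.Namespace == "" {
		d.Namespace = "kube-system"
	}
	return d, nil
}

// Resource is a minimal stand-in for an object returned by the API server.
type Resource struct {
	Name            string
	OwnerReferences []string
}

// filterByOwner applies has_owner: true keeps resources with non-empty
// ownerReferences, false keeps those with empty ones.
func filterByOwner(rs []Resource, hasOwner *bool) []Resource {
	if hasOwner == nil {
		return rs
	}
	var out []Resource
	for _, r := range rs {
		if (len(r.OwnerReferences) > 0) == *hasOwner {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	d, err := normalize(DeletionSpec{Name: "mate", Kind: "deployment"})
	fmt.Println(d.Namespace, err) // kube-system <nil>

	_, err = normalize(DeletionSpec{Name: "x", Selector: "version != v1", Kind: "deployment"})
	fmt.Println(err != nil) // true

	no := false
	rs := []Resource{{Name: "rs-a"}, {Name: "rs-b", OwnerReferences: []string{"deploy-b"}}}
	for _, r := range filterByOwner(rs, &no) {
		fmt.Println(r.Name) // rs-a
	}
}
```

Using a `*bool` for has_owner keeps "not specified" distinct from an explicit `false`, which matters because the two mean different things.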

Configuration defaults

CLM will look for a config-defaults.yaml file in the cluster configuration directory. If the file exists, it will be evaluated as a Go template with all the usual CLM variables and functions available, and the resulting output will be parsed as a simple key-value map. CLM will use the contents of the file to populate the cluster's configuration items, taking care not to overwrite the existing ones.

For example, you can use the defaults file to have different settings for production and test clusters, while keeping the manifests readable:

  • config-defaults.yaml:

    {{ if eq .Environment "production"}}
    autoscaling_buffer_pods: "3"
    {{else}}
    autoscaling_buffer_pods: "0"
    {{end}}
  • manifests/example/example.yaml:

    …
    spec:
      replicas: {{.ConfigItems.autoscaling_buffer_pods}}
    …
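The defaulting behaviour (render as a Go template, then fill in only the config items that are not already set) can be sketched with the standard library's text/template. This is a hypothetical illustration: `Cluster` models only the two fields used in the example, and a naive `key: value` line parser stands in for the YAML parsing the CLM performs.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Cluster carries only the fields used in the example; the real CLM
// exposes more template variables.
type Cluster struct {
	Environment string
	ConfigItems map[string]string
}

const defaultsTmpl = `{{ if eq .Environment "production"}}
autoscaling_buffer_pods: "3"
{{else}}
autoscaling_buffer_pods: "0"
{{end}}`

// applyDefaults renders the defaults template and adds any key the cluster
// does not already define. The line parser is a stand-in for real YAML.
func applyDefaults(c *Cluster, tmpl string) error {
	t, err := template.New("config-defaults").Parse(tmpl)
	if err != nil {
		return err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, c); err != nil {
		return err
	}
	for _, line := range strings.Split(buf.String(), "\n") {
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank lines left over from the template actions
		}
		key = strings.TrimSpace(key)
		value = strings.Trim(strings.TrimSpace(value), `"`)
		if _, exists := c.ConfigItems[key]; !exists {
			c.ConfigItems[key] = value // never overwrite an explicit item
		}
	}
	return nil
}

func main() {
	prod := &Cluster{Environment: "production", ConfigItems: map[string]string{}}
	test := &Cluster{Environment: "test", ConfigItems: map[string]string{"autoscaling_buffer_pods": "1"}}
	for _, c := range []*Cluster{prod, test} {
		if err := applyDefaults(c, defaultsTmpl); err != nil {
			panic(err)
		}
		fmt.Println(c.Environment, c.ConfigItems["autoscaling_buffer_pods"])
	}
	// Prints "production 3", then "test 1": the explicit item wins over "0".

	m := template.Must(template.New("manifest").Parse("replicas: {{.ConfigItems.autoscaling_buffer_pods}}\n"))
	if err := m.Execute(os.Stdout, prod); err != nil { // replicas: 3
		panic(err)
	}
}
```

Applying defaults before rendering the manifests is what lets the manifest refer to `.ConfigItems.autoscaling_buffer_pods` unconditionally.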

A Note on Using Multiple Config Sources

The CLM supports specifying multiple config sources on the command line. The order in which they are provided matters: if the same config item exists in multiple sources, the value from the source specified later overrides the one from the earlier source. For instance, consider a CLM deployment that contains the following arguments:

...
          - --config-source=source1:git:example.domain-1
          - --config-source=source2:example.domain-2
          - --config-source=source3:git:example.domain-3
...

For any config item, the value from source3 is the final one: if the item also exists in source1 or source2, their values are overridden by the one in source3. This is the intended behavior of the CLI flags: when a flag is specified multiple times, the values are appended, in order, to a slice. When the CLM later merges the sources, it respects the order of the source names in that slice.
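The ordering rule can be sketched in a few lines. The CLM's own flag parsing is not shown here. This sketch uses the standard library flag package as a stand-in, assumes the source name is the part before the first colon, and uses made-up config items per source. `sourceList` and `merge` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// sourceList is a flag.Value that appends every occurrence of the flag, in order.
type sourceList []string

func (s *sourceList) String() string { return strings.Join(*s, ",") }

func (s *sourceList) Set(v string) error {
	*s = append(*s, v)
	return nil
}

// merge combines per-source config items in the given order; for the same
// key, a later source overrides an earlier one.
func merge(order []string, items map[string]map[string]string) map[string]string {
	out := map[string]string{}
	for _, src := range order {
		for k, v := range items[src] {
			out[k] = v
		}
	}
	return out
}

func main() {
	var sources sourceList
	fs := flag.NewFlagSet("clm", flag.ContinueOnError)
	fs.Var(&sources, "config-source", "config source; may be repeated")
	if err := fs.Parse([]string{
		"--config-source=source1:git:example.domain-1",
		"--config-source=source2:example.domain-2",
		"--config-source=source3:git:example.domain-3",
	}); err != nil {
		panic(err)
	}

	// Assumption: the source name is the part before the first ':'.
	var names []string
	for _, s := range sources {
		name, _, _ := strings.Cut(s, ":")
		names = append(names, name)
	}

	// Hypothetical config items per source.
	items := map[string]map[string]string{
		"source1": {"a": "1", "b": "1"},
		"source2": {"b": "2"},
		"source3": {"a": "3"},
	}
	fmt.Println(merge(names, items)) // map[a:3 b:2]
}
```

Key `b` ends up as `2` because source3 does not define it, so the last source that does (source2) wins.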

Non-disruptive rolling updates

One of the main features of the CLM is its update strategy, which performs rolling node updates that are non-disruptive for the workloads running in the target cluster. Special care is taken to support stateful applications.
