• This repository has been archived on 17/Nov/2022

Tries to move K8s Pods from on-demand to spot instances

K8s Spot Rescheduler

NOTE: this repository is currently UNMAINTAINED and is looking for new owner(s). See #74 for more information.

Introduction

K8s Spot Rescheduler is a tool that tries to reduce load on a set of Kubernetes nodes. It was designed to move Pods scheduled on AWS on-demand instances to AWS spot instances, so that the on-demand instances can be safely scaled down by the Cluster Autoscaler.

In reality the rescheduler can be used to remove load from any group of nodes onto a different group of nodes. They just need to be labelled appropriately.

For example, it could also be used to allow controller nodes to take up slack while new nodes are being scaled up, and then to reschedule those pods when the new capacity becomes available, thus reducing the load on the controllers once again.

Attribution

This project was inspired by the Critical Pod Rescheduler and takes portions of code from both the Critical Pod Rescheduler and the Cluster Autoscaler.

Motivation

AWS spot instances are a great way to reduce your infrastructure running costs. They do, however, come with a significant drawback: at any point, the spot price for the instances you are using could rise above your bid, and your instances will be terminated. To solve this problem, you can use an AutoScaling group backed by on-demand instances and managed by the Cluster Autoscaler to take up the slack when spot instances are removed from your cluster.

The problem, however, comes when the spot price drops and you are given new spot instances back into your cluster. At this point you are left with empty spot instances and full, expensive on-demand instances.

By tainting the on-demand instances with the Kubernetes PreferNoSchedule taint, we can ensure that, if at any point the scheduler needs to choose between spot and on-demand instances, it will choose the preferred spot instances to schedule the new Pods onto.

However, the scheduler won't reschedule Pods that are already running on on-demand instances, blocking them from being scaled down. At this point, the K8s Spot Rescheduler is required to start the process of moving Pods from the on-demand instances back onto the spot instances.

Usage

Deploy to Kubernetes

A Docker image is available at quay.io/pusher/k8s-spot-rescheduler. These images are currently built on pushes to master. Tagged images will be published as and when releases are made.

Sample Kubernetes manifests are available in the deploy folder.

To deploy in clusters using RBAC, apply all of the manifests (Deployment, ClusterRole, ClusterRoleBinding and ServiceAccount) in the deploy folder, and uncomment the serviceAccountName in the Deployment.

Requirements

For the K8s Spot Rescheduler to process nodes as expected, you will need identifying labels that can be passed to the program so that it can distinguish which nodes it should consider as on-demand and which it should consider as spot instances.

For instance, you could add labels node-role.kubernetes.io/worker and node-role.kubernetes.io/spot-worker to your on-demand and spot instances respectively.

You should also add the PreferNoSchedule taint to your on-demand instances to ensure that the scheduler prefers spot instances when making its scheduling decisions.

For example, you could add the following flags to your Kubelet:

--register-with-taints="node-role.kubernetes.io/worker=true:PreferNoSchedule"
--node-labels="node-role.kubernetes.io/worker=true"
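
To make the role of the labels concrete, here is a minimal Go sketch of how nodes might be split into the two groups. It is an illustration only: the Node type and the classify function are hypothetical, and it assumes that the mere presence of the label key is enough to identify a node, which may differ from what the rescheduler actually does.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for a Kubernetes node.
// The real rescheduler works with the Kubernetes API types instead.
type Node struct {
	Name   string
	Labels map[string]string
}

// Label keys taken from the example above.
const (
	onDemandLabel = "node-role.kubernetes.io/worker"
	spotLabel     = "node-role.kubernetes.io/spot-worker"
)

// classify splits nodes into on-demand and spot groups by label key.
// Assumption: only the presence of the key matters, not its value.
// Nodes carrying neither label are ignored.
func classify(nodes []Node) (onDemand, spot []Node) {
	for _, n := range nodes {
		if _, ok := n.Labels[onDemandLabel]; ok {
			onDemand = append(onDemand, n)
		} else if _, ok := n.Labels[spotLabel]; ok {
			spot = append(spot, n)
		}
	}
	return onDemand, spot
}

func main() {
	nodes := []Node{
		{Name: "node-a", Labels: map[string]string{onDemandLabel: "true"}},
		{Name: "node-b", Labels: map[string]string{spotLabel: "true"}},
		{Name: "node-c", Labels: map[string]string{"example.com/other": "true"}},
	}
	onDemand, spot := classify(nodes)
	fmt.Println("on-demand:", len(onDemand), "spot:", len(spot))
}
```

Nodes without either label, such as node-c here, are left alone by this sketch.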

Building

If you wish to build the binary yourself, first make sure you have Go installed and set up. Then clone this repo into your $GOPATH and download the dependencies using dep.

cd $GOPATH/src/github.com # Create this directory if it doesn't exist
git clone git@github.com:pusher/k8s-spot-rescheduler pusher/k8s-spot-rescheduler
cd pusher/k8s-spot-rescheduler # dep must be run from the project directory
dep ensure -v # Installs dependencies to vendor folder.

Then build the code using go build, which produces the binary in a file named k8s-spot-rescheduler.

Flags

-v (default: 0): Log verbosity level. Currently numeric, with values between 2 and 4; -v=2 is recommended.

--running-in-cluster (default: true): Optional. If this controller is running in a Kubernetes cluster, use the pod secrets to create a Kubernetes client.

--namespace (default: kube-system): Namespace in which k8s-spot-rescheduler is run.

--kube-api-content-type (default: application/vnd.kubernetes.protobuf): Content type of requests sent to apiserver.

--housekeeping-interval (default: 10s): How often the rescheduler takes actions.

--node-drain-delay (default: 10m): How long the rescheduler should wait between draining nodes.

--pod-eviction-timeout (default: 2m): How long the rescheduler should keep attempting pod evictions before giving up.

--max-graceful-termination (default: 2m): How long the rescheduler should wait for pods to shut down gracefully before failing the node drain attempt.

--listen-address (default: localhost:9235): Address to listen on for serving Prometheus metrics.

--on-demand-node-label (default: node-role.kubernetes.io/worker): Name of label on nodes to be considered for draining.

--spot-node-label (default: node-role.kubernetes.io/spot-worker): Name of label on nodes to be considered as targets for pods.

--delete-non-replicated-pods (default: false): Delete non-replicated pods running on on-demand instances. Note that some non-replicated pods will not be rescheduled.
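
As a rough illustration of how the defaults and durations above fit together, the following Go sketch declares a subset of the flags using the standard library flag package. This is not the project's actual flag-parsing code: the Config type and parseFlags function are hypothetical, and only the flag names and default values are taken from the list above.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Config is a hypothetical container for a subset of the flags listed above.
type Config struct {
	HousekeepingInterval    time.Duration
	NodeDrainDelay          time.Duration
	PodEvictionTimeout      time.Duration
	MaxGracefulTermination  time.Duration
	ListenAddress           string
	OnDemandNodeLabel       string
	SpotNodeLabel           string
	DeleteNonReplicatedPods bool
}

// parseFlags parses args into a Config, using the documented defaults.
func parseFlags(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("k8s-spot-rescheduler", flag.ContinueOnError)
	fs.DurationVar(&c.HousekeepingInterval, "housekeeping-interval", 10*time.Second, "how often the rescheduler takes actions")
	fs.DurationVar(&c.NodeDrainDelay, "node-drain-delay", 10*time.Minute, "how long to wait between draining nodes")
	fs.DurationVar(&c.PodEvictionTimeout, "pod-eviction-timeout", 2*time.Minute, "how long to keep attempting pod evictions")
	fs.DurationVar(&c.MaxGracefulTermination, "max-graceful-termination", 2*time.Minute, "how long to wait for pods to shut down gracefully")
	fs.StringVar(&c.ListenAddress, "listen-address", "localhost:9235", "address for serving Prometheus metrics")
	fs.StringVar(&c.OnDemandNodeLabel, "on-demand-node-label", "node-role.kubernetes.io/worker", "label on nodes to be considered for draining")
	fs.StringVar(&c.SpotNodeLabel, "spot-node-label", "node-role.kubernetes.io/spot-worker", "label on nodes to be considered as targets")
	fs.BoolVar(&c.DeleteNonReplicatedPods, "delete-non-replicated-pods", false, "delete non-replicated pods on on-demand instances")
	err := fs.Parse(args)
	return c, err
}

func main() {
	// Override two defaults; everything else keeps its documented value.
	c, err := parseFlags([]string{"--node-drain-delay=5m", "--delete-non-replicated-pods"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```

Duration flags accept values such as 10s, 5m or 1h30m, which is the same notation used for the defaults above.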

Scope of the project

Does

  • Looks for Pods on on-demand instances
  • Looks for space for Pods on spot instances
  • Checks the following predicates when determining whether a pod can be moved:
    • CheckNodeMemoryPressure
    • CheckNodeDiskPressure
    • GeneralPredicates
    • MaxAzureDiskVolumeCount
    • MaxGCEPDVolumeCount
    • NoDiskConflict
    • MatchInterPodAffinity
    • PodToleratesNodeTaints
    • MaxEBSVolumeCount
    • NoVolumeZoneConflict
    • ready
  • Checks whether there is enough capacity to move all pods on the on-demand node to spot nodes
  • Evicts all pods on the node if the previous check passes
  • Leaves the node in a schedulable state, in case its capacity is required again

Does not

  • Schedule pods (The default scheduler handles this)
  • Scale down empty nodes on your cloud provider (Try the Cluster Autoscaler)

Operating logic

The rescheduler logic roughly follows the steps below:

  1. Get a list of on-demand and spot nodes and their respective Pods
    • Build a map of nodeInfo structs
      • Add node to struct
      • Add pods for that node to struct
      • Add requested and free CPU fields to struct
    • Map these structs based on whether they are on-demand or spot instances
    • Sort on-demand instances by least requested CPU
    • Sort spot instances by most free CPU
  2. Iterate through each on-demand node and try to drain it
    • Iterate through each pod
      • Determine if a spot node has space for the pod
      • Add the pod to the prospective spot node
      • Move on to the next node if no spot node space is available
    • Drain the node
      • Iterate through pods and evict them in turn
        • Evict pod
        • Wait for deletion and reschedule
      • Cancel all further processing

This process is repeated once every housekeeping-interval.

The effect of this algorithm should be that the emptiest nodes are drained before busier ones, resulting in the highest number of 'empty' nodes that can be removed from the cluster.
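
The planning part of this logic can be sketched in a few lines of Go. This is a simplified illustration rather than the rescheduler's actual code: the Pod and NodeInfo types and the planDrain function are hypothetical, only CPU is considered (the real tool also checks the scheduler predicates listed under Scope of the project), and skipping on-demand nodes that have no pods is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod and NodeInfo are hypothetical, simplified types for this sketch.
type Pod struct {
	Name     string
	CPUMilli int64 // requested CPU in millicores
}

type NodeInfo struct {
	Name         string
	Pods         []Pod
	RequestedCPU int64 // sum of CPU requested by Pods
	FreeCPU      int64 // CPU still available on the node
}

// planDrain returns the first on-demand node (least requested CPU first)
// whose pods all fit onto spot nodes, plus a pod -> spot node assignment.
func planDrain(onDemand, spot []NodeInfo) (string, map[string]string, bool) {
	// Copy before sorting so the caller's slice is left untouched.
	candidates := append([]NodeInfo(nil), onDemand...)
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].RequestedCPU < candidates[j].RequestedCPU
	})
	for _, node := range candidates {
		if len(node.Pods) == 0 {
			continue // nothing to move (assumption of this sketch)
		}
		// Fresh copy per candidate so a failed attempt reserves nothing.
		targets := append([]NodeInfo(nil), spot...)
		sort.SliceStable(targets, func(i, j int) bool {
			return targets[i].FreeCPU > targets[j].FreeCPU
		})
		plan := make(map[string]string, len(node.Pods))
		fits := true
		for _, pod := range node.Pods {
			placed := false
			for i := range targets {
				if targets[i].FreeCPU >= pod.CPUMilli {
					targets[i].FreeCPU -= pod.CPUMilli
					plan[pod.Name] = targets[i].Name
					placed = true
					break
				}
			}
			if !placed {
				fits = false
				break // move on to the next on-demand node
			}
		}
		if fits {
			return node.Name, plan, true
		}
	}
	return "", nil, false
}

func main() {
	onDemand := []NodeInfo{
		{Name: "od-busy", RequestedCPU: 3000, Pods: []Pod{{"a", 2000}, {"b", 1000}}},
		{Name: "od-light", RequestedCPU: 500, Pods: []Pod{{"c", 500}}},
	}
	spot := []NodeInfo{
		{Name: "spot-1", FreeCPU: 400},
		{Name: "spot-2", FreeCPU: 1500},
	}
	name, plan, ok := planDrain(onDemand, spot)
	fmt.Println(name, plan, ok)
}
```

In this example od-light is considered first because it has the least requested CPU, and its single pod fits on spot-2, so od-light is the node selected for draining. Returning after the first drainable node mirrors the "Cancel all further processing" step above.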

Communication

  • Found a bug? Please open an issue.
  • Have a feature request? Please open an issue.
  • If you want to contribute, please submit a pull request.

Contributing

Please see our Contributing guidelines.

License

This project is licensed under Apache 2.0 and a copy of the license is available here.
