Simple Proof-of-Concept for Scaling Application running on Marathon based on Utilization

marathon-autoscale

Dockerized auto-scaler application that can be run under Marathon management to dynamically scale a service running on DC/OS.

Prerequisites

  1. A running DC/OS cluster
  2. DC/OS CLI installed on your local machine

If running on a DC/OS cluster in Permissive or Strict mode, you also need a user or service account with the appropriate permissions to modify Marathon jobs. An example script for setting up a service account can be found in create-service-account.sh.

Installation/Configuration

Building the Docker container

How to build the container:

image_name="mesosphere/marathon-autoscale" make
image_name="mesosphere/marathon-autoscale" make push

There is also an image available on Docker Hub: mesosphere/marathon-autoscaler

(Optional) Creating a service account

The create-service-account.sh script takes two parameters:

Service-Account-Name #the name of the service account you want to create
Namespace-Path #the path to launch this service under marathon management.  e.g. / or /dev

$ ./create-service-account.sh [service-account-name] [namespace-path]

Install from the DC/OS Catalog

The marathon-autoscaler can be installed as a service from the DC/OS catalog with dcos package install marathon-autoscaler. There is no default installation for this service: the autoscaler needs to know a number of things in order to scale an application, so the DC/OS package install process requires configuration options during installation. Assume a simple /sleepy application running in Marathon and the following config.json file:

{
  "autoscaler": {
    "marathon-app" : "/sleepy",
    "userid": "agent-99",
    "password": "secret"
  }
}

All the configurations listed below can be changed via the config.json file. The default name of this service is ${marathon-app}-autoscaler; in this case, the service is /sleepy-autoscaler.
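As a rough illustration, the options file above could be generated from a short Python script. The helper names `build_options` and `default_service_name` are hypothetical, and the derived name only mirrors the ${marathon-app}-autoscaler convention described above.

```python
import json
import os
import tempfile


def build_options(marathon_app, userid, password):
    """Return the options structure shown in the config.json example."""
    return {
        "autoscaler": {
            "marathon-app": marathon_app,
            "userid": userid,
            "password": password,
        }
    }


def default_service_name(marathon_app):
    """Mirror the ${marathon-app}-autoscaler naming convention."""
    return marathon_app + "-autoscaler"


options = build_options("/sleepy", "agent-99", "secret")

# Write the file to a temporary directory (location chosen for illustration).
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as handle:
    json.dump(options, handle, indent=2)

print(default_service_name(options["autoscaler"]["marathon-app"]))
```

The resulting file is the kind of input typically handed to dcos package install through its --options flag.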

Marathon examples

Autoscale examples

Update one of the definitions in the Marathon definitions folder to match your specific configuration. Marathon application names must include the leading forward slash. This requirement exists so that applications within service groups (e.g. /group/hello-dcos) can be handled.

Core environment variables available to the application:

AS_DCOS_MASTER # hostname of dcos master
AS_MARATHON_APP # app to autoscale

AS_TRIGGER_MODE # scaling mode (cpu | mem | sqs | and | or)

AS_AUTOSCALE_MULTIPLIER # The number by which current instances will be multiplied (scale-out) or divided (scale-in). This determines how many instances to add during scale-out, or remove during scale-in.
AS_MIN_INSTANCES # min number of instances, don't make less than 2
AS_MAX_INSTANCES # max number of instances, must be greater than AS_MIN_INSTANCES

AS_COOL_DOWN_FACTOR # how many times should we poll before scaling down
AS_SCALE_UP_FACTOR # how many times should we poll before scaling up
AS_INTERVAL #how often should we poll in seconds
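How these variables interact may be easier to see in code. The following is a minimal sketch, not the project's implementation: `PollCounter` and `target_instances` are hypothetical names, and both the rounding rules and the counter reset on an in-range poll are assumptions.

```python
import math


class PollCounter:
    """Count consecutive out-of-range polls before allowing a scale action."""

    def __init__(self, scale_up_factor, cool_down_factor):
        self.scale_up_factor = scale_up_factor    # cf. AS_SCALE_UP_FACTOR
        self.cool_down_factor = cool_down_factor  # cf. AS_COOL_DOWN_FACTOR
        self.up_count = 0
        self.down_count = 0

    def observe(self, direction):
        """Record one poll result (1, 0 or -1) and return the action to take."""
        if direction == 1:
            self.up_count += 1
            self.down_count = 0
        elif direction == -1:
            self.down_count += 1
            self.up_count = 0
        else:
            # Back in range: reset both counters (assumed behaviour).
            self.up_count = self.down_count = 0

        if self.up_count >= self.scale_up_factor:
            self.up_count = 0
            return 1
        if self.down_count >= self.cool_down_factor:
            self.down_count = 0
            return -1
        return 0


def target_instances(current, multiplier, action, min_instances, max_instances):
    """Apply the multiplier for a scale-out (1) or scale-in (-1), then clamp."""
    if action == 1:
        wanted = math.ceil(current * multiplier)   # rounding up is an assumption
    elif action == -1:
        wanted = math.floor(current / multiplier)  # rounding down is an assumption
    else:
        wanted = current
    # Keep the result inside [min_instances, max_instances].
    return max(min_instances, min(max_instances, wanted))


counter = PollCounter(scale_up_factor=3, cool_down_factor=4)
actions = [counter.observe(d) for d in [1, 1, 0, 1, 1, 1]]
print(actions)                             # [0, 0, 0, 0, 0, 1]
print(target_instances(2, 1.5, 1, 2, 5))   # 3
print(target_instances(4, 1.5, 1, 2, 5))   # 5 (clamped to the maximum)
print(target_instances(3, 1.5, -1, 2, 5))  # 2
```

With a scale-up factor of 3, the single in-range poll in the middle of the sequence delays the scale-out until three further consecutive high readings have been seen.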

Notes

If you are using authentication:

AS_USERID # username of the user or service account with access to scale the service
--and either--
AS_PASSWORD: secret0 # password of the userid above ideally from the secret store
AS_SECRET: secret0 # private key of the userid above ideally from the secret store

If you are using CPU as your scaling mode:

AS_MAX_RANGE # max average cpu time as float, e.g. 80 or 80.5
AS_MIN_RANGE # min average cpu time as float, e.g. 55 or 55.5

If you are using Memory as your scaling mode:

AS_MAX_RANGE # max avg mem utilization percent as float, e.g. 75 or 75.0
AS_MIN_RANGE # min avg mem utilization percent as float, e.g. 55 or 55.0

If you are using AND (CPU and Memory) as your scaling mode:

AS_MAX_RANGE # [max average cpu time, max avg mem utilization percent], e.g. 75.0,80.0
AS_MIN_RANGE # [min average cpu time, min avg mem utilization percent], e.g. 55.0,55.0

If you are using OR (CPU or Memory) as your scaling mode:

AS_MAX_RANGE # [max average cpu time, max avg mem utilization percent], e.g. 75.0,80.0
AS_MIN_RANGE # [min average cpu time, min avg mem utilization percent], e.g. 55.0,55.0
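For the AND and OR modes, the two range variables each carry a CPU value followed by a memory value. A small sketch of how such strings might be split is shown here; `parse_range` and `cpu_and_mem_ranges` are hypothetical helpers, not functions from the project.

```python
def parse_range(raw):
    """Split a comma-delimited range string into a list of floats."""
    return [float(part) for part in raw.split(",") if part.strip()]


def cpu_and_mem_ranges(min_raw, max_raw):
    """Return ((cpu_min, cpu_max), (mem_min, mem_max)) for AND/OR modes."""
    mins = parse_range(min_raw)
    maxs = parse_range(max_raw)
    if len(mins) != 2 or len(maxs) != 2:
        raise ValueError("AND/OR modes need two comma-delimited values")
    # The first value is the CPU bound, the second is the memory bound.
    return (mins[0], maxs[0]), (mins[1], maxs[1])


print(cpu_and_mem_ranges("55.0,55.0", "75.0,80.0"))
# ((55.0, 75.0), (55.0, 80.0))
```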

If you are using SQS as your scaling mode:

AS_QUEUE_URL # full URL of the SQS queue
AWS_ACCESS_KEY_ID # aws access key
AWS_SECRET_ACCESS_KEY # aws secret key
AWS_DEFAULT_REGION # aws region
AS_MIN_RANGE # min number of available messages in the queue
AS_MAX_RANGE # max number of available messages in the queue

Target application examples

In order to create artificial stress for an application, use one of the examples located in the Marathon Target Application folder.

Program Execution / Usage

Add your application to Marathon using the DC/OS Marathon CLI.

$ dcos marathon app add marathon_defs/marathon.json

Where the marathon.json has been built from one of the samples:

autoscale-cpu-noauth-marathon.json #security disabled or OSS DC/OS
autoscale-mem-noauth-marathon.json #security disabled or OSS DC/OS
autoscale-sqs-noauth-marathon.json #security disabled or OSS DC/OS
autoscale-cpu-svcacct-marathon.json #security permissive or strict on Enterprise DC/OS, using service account and private key (private key stored as a secret)

Verify the app is added with the command $ dcos marathon app list

Scaling Modes

CPU

In this mode, the system will scale the service up or down when the CPU has been out of range for the number of cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For AS_MIN_RANGE and AS_MAX_RANGE on multicore containers, calculate the value as number of CPUs * desired CPU utilization percentage = CPU time (e.g. 2 CPUs * 80 percent = 160 CPU time).
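The multicore calculation can be written as a one-line helper. The function name `cpu_time_threshold` is made up for this sketch.

```python
def cpu_time_threshold(cpus, utilization_percent):
    """Number of CPUs * desired utilization percentage = CPU time."""
    return cpus * utilization_percent


# A 2-CPU container with targets of 55% and 80% utilization.
print(cpu_time_threshold(2, 55))  # 110 -> candidate AS_MIN_RANGE
print(cpu_time_threshold(2, 80))  # 160 -> candidate AS_MAX_RANGE
```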

MEM

In this mode, the system will scale the service up or down when the Memory has been out of range for the number of cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For AS_MIN_RANGE and AS_MAX_RANGE on very small containers, remember that Mesos adds 32MB to the container spec for container overhead (namespace and cgroup), so your target percentages should take that into account. Alternatively, consider using the CPU only scaling mode for containers with very small memory footprints.
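To see why the 32MB overhead matters for small containers, compare the same memory usage measured against the container spec and against the spec plus overhead. This is only an arithmetic sketch; which denominator the autoscaler actually uses is not stated here, and the helper names are hypothetical.

```python
MESOS_OVERHEAD_MB = 32  # overhead figure quoted in the text above


def percent_of_spec(used_mb, spec_mb):
    """Memory use as a percentage of the container spec alone."""
    return 100.0 * used_mb / spec_mb


def percent_of_limit(used_mb, spec_mb, overhead_mb=MESOS_OVERHEAD_MB):
    """Memory use as a percentage of the spec plus container overhead."""
    return 100.0 * used_mb / (spec_mb + overhead_mb)


# A 64 MB container that is using 48 MB.
print(percent_of_spec(48, 64))   # 75.0
print(percent_of_limit(48, 64))  # 50.0
```

For a 2048 MB container the same overhead shifts the percentage by well under two points, which is why the caveat mainly concerns very small containers.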

SQS

In this mode, the system will scale the service up or down when the number of available messages in the queue has been out of range for the number of cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For the Amazon Web Services (AWS) Simple Queue Service (SQS) scaling mode, the queue length is taken from the ApproximateNumberOfMessages attribute, which returns the approximate number of visible messages in a queue.
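Reading that attribute with an SQS client might look like the sketch below. The function `queue_length` is a hypothetical helper; `get_queue_attributes` is the boto3 SQS client call, and a fake client with a made-up queue URL stands in for `boto3.client("sqs")` so the example runs without AWS access.

```python
def queue_length(sqs_client, queue_url):
    """Return the approximate number of visible messages in the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


class FakeSQSClient:
    """Offline stand-in for boto3.client("sqs"); returns a fixed count."""

    def get_queue_attributes(self, QueueUrl, AttributeNames):
        return {"Attributes": {"ApproximateNumberOfMessages": "7"}}


# The queue URL is a placeholder for illustration only.
print(queue_length(FakeSQSClient(), "https://example.invalid/queue"))  # 7
```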

AND

In this mode, the system will only scale the service up or down when both CPU and Memory have been out of range for the number of cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For the MIN_RANGE and MAX_RANGE arguments/env vars, you must pass in a comma-delimited list of values. Values at index[0] will be used for CPU range and values at index[1] will be used for Memory range.

OR

In this mode, the system will scale the service up or down when either CPU or Memory has been out of range for the number of cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For the MIN_RANGE and MAX_RANGE arguments/env vars, you must pass in a comma-delimited list of values. Values at index[0] will be used for CPU range and values at index[1] will be used for Memory range.
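The difference between the AND and OR modes can be sketched as two ways of combining per-metric directions (1 above range, 0 within, -1 below). Both functions are hypothetical, and letting scale-up win when the two metrics disagree in OR mode is an assumption of this sketch, not documented behaviour.

```python
def combine_and(cpu_direction, mem_direction):
    """Act only when CPU and memory agree on being out of range."""
    if cpu_direction == 1 and mem_direction == 1:
        return 1
    if cpu_direction == -1 and mem_direction == -1:
        return -1
    return 0


def combine_or(cpu_direction, mem_direction):
    """Act when either metric is out of range."""
    # Assumption: a scale-up signal takes precedence over a scale-down one.
    if cpu_direction == 1 or mem_direction == 1:
        return 1
    if cpu_direction == -1 or mem_direction == -1:
        return -1
    return 0


# CPU is above its range while memory is still within range.
print(combine_and(1, 0))  # 0
print(combine_or(1, 0))   # 1
```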

Extending the autoscaler (adding a new scaling mode)

In order to create a new scaling mode, you must create a new subclass in the modes directory/module and implement all abstract methods (e.g. scale_direction) of the abstract class AbstractMode.

Please note: the scale_direction function MUST return one of three values:

  • Scaling mode above thresholds MUST return 1
  • Scaling mode within thresholds MUST return 0
  • Scaling mode below thresholds MUST return -1
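The return-value contract above amounts to a three-way comparison. The class below is a simplified stand-in written for this illustration, not the project's AbstractMode; treating a value equal to a threshold as within range is an assumption.

```python
class ThresholdMode:
    """Simplified stand-in for a scaling mode with a min/max range."""

    def __init__(self, min_range, max_range):
        self.min_range = min_range
        self.max_range = max_range

    def scale_direction(self, value):
        """Return 1 above the thresholds, -1 below them, 0 within them."""
        if value > self.max_range:
            return 1
        if value < self.min_range:
            return -1
        return 0


mode = ThresholdMode(min_range=55.0, max_range=80.0)
print([mode.scale_direction(v) for v in (90.0, 60.0, 40.0)])  # [1, 0, -1]
```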

An example skeleton is below:

class ScaleByExample(AbstractMode):

    def __init__(self, api_client=None, app=None, dimension=None):
        super().__init__(api_client, app, dimension)

    def scale_direction(self):
        try:
            # Fetch the current metric value for this mode
            value = self.get_value()
            # Delegate to AbstractMode; the result must be 1, 0 or -1
            return super().scale_direction(value)
        except ValueError:
            raise

Once the new subclass is created, add the new mode to the MODES dictionary in Marathon AutoScaler.

# Dict defines the different scaling modes available to autoscaler
MODES = {
    'sqs': ScaleBySQS,
    'cpu': ScaleByCPU,
    'mem': ScaleByMemory,
    'and': ScaleByCPUAndMemory,
    'or': ScaleByCPUOrMemory,
    'exp': ScaleByExample
}
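Looking up a mode by its trigger name could then be done along these lines. This is a sketch with empty placeholder classes standing in for the real mode classes, and `resolve_mode` is a hypothetical helper.

```python
class ScaleByCPU:
    """Placeholder for the real CPU mode class."""


class ScaleByExample:
    """Placeholder for the newly added mode class."""


# Reduced copy of the dictionary, using the placeholders above.
MODES = {
    "cpu": ScaleByCPU,
    "exp": ScaleByExample,
}


def resolve_mode(name):
    """Return the mode class registered under the given trigger name."""
    try:
        return MODES[name.lower()]
    except KeyError:
        known = ", ".join(sorted(MODES))
        raise ValueError(
            "Unknown trigger mode %r (expected one of: %s)" % (name, known)
        ) from None


print(resolve_mode("exp").__name__)  # ScaleByExample
```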

Examples

The following examples run the Python application from the command line.

(Optional) Only if using username/password or a service account

export AS_USERID=some-user-id
export AS_PASSWORD=some-password
-or-
export AS_SECRET=dc-os-secret-formatted-json

SQS message queue length as autoscale trigger

export AS_QUEUE_URL=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=us-east-1

python marathon_autoscaler.py --dcos-master https://leader.mesos --trigger_mode sqs --autoscale_multiplier 1.5 --max_instances 5 --marathon-app /test/stress-sqs --min_instances 1 --cool_down_factor 4 --scale_up_factor 3 --interval 10 --min_range 2.0 --max_range 10.0

CPU as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos --trigger_mode cpu --autoscale_multiplier 1.5 --max_instances 5 --marathon-app /test/stress-cpu --min_instances 1 --cool_down_factor 4 --scale_up_factor 3 --interval 10 --min_range 55.0 --max_range 80.0

Memory as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos --trigger_mode mem --autoscale_multiplier 1.5 --max_instances 5 --marathon-app /test/stress-memory --min_instances 1 --cool_down_factor 4 --scale_up_factor 3 --interval 10 --min_range 55.0 --max_range 75.0

AND (CPU and Memory) as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos --trigger_mode and --autoscale_multiplier 1.5 --max_instances 5 --marathon-app /test/stress-cpu --min_instances 1 --cool_down_factor 4 --scale_up_factor 3 --interval 10 --min_range 55.0,2.0 --max_range 75.0,8.0

OR (CPU or Memory) as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos --trigger_mode or --autoscale_multiplier 1.5 --max_instances 5 --marathon-app /test/stress-cpu --min_instances 1 --cool_down_factor 4 --scale_up_factor 3 --interval 10 --min_range 55.0,10.0 --max_range 75.0,20.0
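For readers who want to see how the flags used in these commands map to values, here is an argparse sketch that accepts the same flag names. It is not the project's actual parser: the defaults and the choice of required flags are assumptions based on the examples above.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags used in the examples."""
    parser = argparse.ArgumentParser(description="Sketch of the autoscaler CLI")
    parser.add_argument("--dcos-master", required=True)
    parser.add_argument("--marathon-app", required=True)
    parser.add_argument("--trigger_mode", required=True,
                        choices=["cpu", "mem", "sqs", "and", "or"])
    # Defaults below are assumptions copied from the example commands.
    parser.add_argument("--autoscale_multiplier", type=float, default=1.5)
    parser.add_argument("--min_instances", type=int, default=1)
    parser.add_argument("--max_instances", type=int, required=True)
    parser.add_argument("--cool_down_factor", type=int, default=4)
    parser.add_argument("--scale_up_factor", type=int, default=3)
    parser.add_argument("--interval", type=int, default=10)
    # Ranges stay strings: AND/OR modes pass two comma-delimited values.
    parser.add_argument("--min_range", required=True)
    parser.add_argument("--max_range", required=True)
    return parser


args = build_parser().parse_args([
    "--dcos-master", "https://leader.mesos",
    "--trigger_mode", "or",
    "--marathon-app", "/test/stress-cpu",
    "--max_instances", "5",
    "--min_range", "55.0,10.0",
    "--max_range", "75.0,20.0",
])
print(args.dcos_master, args.trigger_mode, args.max_instances)
print(args.min_range.split(","), args.max_range.split(","))
```

Note that argparse exposes --dcos-master as args.dcos_master and --marathon-app as args.marathon_app, since hyphens in flag names become underscores in attribute names.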
