Real-time monitoring of critical metrics & KPIs via elegant dashboards, Grafana3 visualizations & more

ZMON source code on GitHub is no longer in active development. Zalando will no longer actively review issues or merge pull-requests.

ZMON is still being used at Zalando and serves us well for many purposes. We are now deeper into our observability journey and better understand that we need other telemetry sources and tools to elevate our understanding of the systems we operate. We support the OpenTelemetry initiative and recommend that others starting their journey begin there.

If members of the community are interested in continuing to develop ZMON, consider forking it. Please review the licence before you do.

ZMON

ZMON is Zalando's open-source platform monitoring tool, used in production since early 2014. It supports our many engineering teams in observing their services and metrics on various layers, from low-level system metrics to teams' business KPIs.

Demo

Head over to demo.zmon.io to take a quick peek into the UI including Grafana3 (login first).

Introduction

To get familiar with the ideas behind ZMON and how things work, you can take a quick dive in: Intro

Talks / Blog

Take a look at the slides from our talk at the DevOps Ireland meetup for background information on ZMON.

First post about ZMON: Monitoring the platform

Features

  • Define checks as data sources executed on self-defined entities
  • Define alerts on checks and entities, with thresholds, as suits your and your team's needs
  • Define custom dashboards with widgets and alert filters based on teams and tags
  • Check commands and alert conditions are arbitrary Python expressions, giving you a lot of power
  • All metric/check data is stored as time series in KairosDB for later use
  • Grafana3 is included, enabling you to build rich, data-driven dashboards
  • Powerful REST API to integrate nicely into other tools: e.g. cmdb/deploy tools
  • Entity service to store entities of any kind describing your environment
  • Trial run in the UI to develop your checks/alerts with quick feedback
  • Auto discovery of AWS services using ZMON's aws agent and entity service, great for AWS deployments
  • Authentication via OAuth 2 e.g. GitHub
  • The frontend, including Grafana 3, requires full authentication, so no VPN is needed; one-time tokens are available for office TV displays
  • Command line client for easy automation and interaction with the REST API
  • ZMON data service allows you to connect DCs/Regions via HTTP for federated monitoring
  • Supports SQL for PostgreSQL incl. sharded deployments, MySQL, Redis, Scalyr, ...
  • Supports desktop and mobile notifications via Firebase Cloud Messaging
  • More on connectivity here: Check commands
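
The idea that check commands and alert conditions are Python expressions can be illustrated with a toy evaluator. This is a minimal sketch and not ZMON's implementation: `run_check`, `alert_is_active`, the `entity` dictionary, and the sample expressions are all hypothetical, and plain `eval` with a reduced namespace is used only for illustration (it is not a security sandbox).

```python
# Toy model: a check command is a Python expression evaluated per entity,
# and an alert condition is a Python expression evaluated on the check value.
# All names here are hypothetical; this is not ZMON's actual worker code.

SAFE_BUILTINS = {"len": len, "max": max, "min": min, "sum": sum}


def run_check(command, entity):
    """Evaluate a check command expression with the entity in scope."""
    scope = {"__builtins__": SAFE_BUILTINS, "entity": entity}
    return eval(command, scope)


def alert_is_active(condition, value):
    """Evaluate an alert condition expression against a check result."""
    scope = {"__builtins__": SAFE_BUILTINS, "value": value}
    return bool(eval(condition, scope))


# Hypothetical entity and expressions, for illustration only.
entity = {"id": "demo-host", "type": "host", "queue_sizes": [3, 12, 7]}
value = run_check("max(entity['queue_sizes'])", entity)
active = alert_is_active("value > 10", value)
print(value, active)
```

Keeping the check and the alert as two separate expressions mirrors the split in the feature list: the check produces a value per entity, and the alert only decides whether that value crosses a threshold.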

Local demo and single host deployment

We suggest using Docker Compose to deploy ZMON locally or on a single host:

More here: compose

Docker Compose is also the most convenient way to set up a development environment.

If Docker Compose is not an option, continue on (or fall back to the obsolete Vagrant box).

Manual Deployment

Your best bet now is the documentation: Component overview

Requirements

ZMON relies on a few great open source products to run, which you will need to operate.

  • Redis
  • PostgreSQL
  • Cassandra + KairosDB

This seems to be a lot, but we provide both a Vagrant box and the deployment scripts for our demo host, lowering the bar to get started :)

Components

Frontend / Controller: UI and REST API

Scheduler: Schedules check/alert execution

Worker: Executes check/alert commands and data acquisition

Optional components

Data service: Used for distributed monitoring where sites don't share network connectivity other than the Internet.

Metric cache: Fast special purpose cache for REST API metric data for ZMON's REST metrics/cloud UI
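
The division of labour between scheduler and worker can be pictured with a toy model. This is only a sketch under stated assumptions: the in-process `queue.Queue` stands in for whatever transport ZMON actually uses, and `checks`, `schedule_due_checks`, and `work_off_queue` are hypothetical names, not ZMON code.

```python
# Toy model of the scheduler/worker split described above.
# An in-process queue stands in for the real transport, which is an assumption.
import queue

task_queue = queue.Queue()

# Hypothetical check definitions: id, interval in seconds, and a callable.
checks = [
    {"id": 1, "interval": 30, "command": lambda: 42},
    {"id": 2, "interval": 60, "command": lambda: "ok"},
]


def schedule_due_checks(now, last_run):
    """Scheduler: enqueue every check whose interval has elapsed."""
    for check in checks:
        if now - last_run.get(check["id"], float("-inf")) >= check["interval"]:
            task_queue.put(check)
            last_run[check["id"]] = now


def work_off_queue():
    """Worker: execute queued check commands and collect the results."""
    results = {}
    while not task_queue.empty():
        check = task_queue.get()
        results[check["id"]] = check["command"]()
    return results


last_run = {}
schedule_due_checks(now=0, last_run=last_run)
first = work_off_queue()   # both checks are due on the first pass
schedule_due_checks(now=30, last_run=last_run)
second = work_off_queue()  # only the 30-second check is due again
print(first, second)
```

Separating the two roles means the component that decides when a check is due never runs the check itself, so workers can be added independently of the scheduler.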

Vagrant Box (deprecated)

Install a recent Vagrant version (at least 1.7.4) and simply do:

$ vagrant up

Please note that the provisioning process will take some time (~15min) while it downloads the Docker images.

Frontend

https://localhost:8443/

Login with your own GitHub credentials (OAuth redirect).

Grafana

https://localhost:8443/grafana/

You will be able to create/save dashboards.

KairosDB

KairosDB frontend, e.g. for querying metrics manually:

http://localhost:38083/
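
Besides the web frontend, KairosDB can be queried through its REST API. The sketch below builds such a query without sending it. It assumes the REST API is reachable on the same port as the frontend, and the metric name `example.metric` is a placeholder, not a name ZMON is known to write.

```python
# Build (but do not send) a KairosDB datapoints query.
# The metric name is a placeholder, and reaching the REST API on the
# frontend's port is an assumption about the local setup.
import json
import urllib.request

KAIROSDB_URL = "http://localhost:38083/api/v1/datapoints/query"


def build_query(metric_name, hours=1):
    """Return a POST request for the last `hours` hours of one metric."""
    body = {
        "start_relative": {"value": hours, "unit": "hours"},
        "metrics": [{"name": metric_name}],
    }
    return urllib.request.Request(
        KAIROSDB_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query("example.metric")  # hypothetical metric name
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running instance:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```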

Issues

  • If individual containers do not start, SSH into the Vagrant box and run the start.sh script again manually, or use the start-services.sh script to restart individual components. The latter takes parameters such as controller or worker.

Install the Command Line Interface

Use pip to install the zmon executable from PyPI.

$ pip3 install --upgrade zmon-cli

Use the ZMON CLI to push/create/update entities (hosts, databases, etc.) and check definitions, and optionally to create alerts (also possible via the UI).

$ zmon entities push examples/entities/local-postgresql.yaml

$ zmon entities push examples/entities/local-scheduler-instance.json

Push your first check definition:

$ zmon check-definitions update examples/check-definitions/zmon-scheduler-rates.yaml

Modify the alert definition to point to the right check id before doing:

$ zmon alert-definitions update examples/alert-definitions/scheduler-rate-too-low.yaml

Build Environment

If you want to compile everything from source, you can do so with our separate "build-env" Vagrant box:

$ cd build-env
$ vagrant up

Thanks

We use slightly modified versions of the following Docker images/scripts:

  • abh1nav/cassandra:latest
  • wangdrew/kairosdb
  • official Redis and PostgreSQL

Thanks to the original authors!

License

Copyright 2013-2016 Zalando SE

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
