License: MIT License

Prometheus-Grafana with Docker-compose

Prometheus-Grafana

A monitoring solution for Docker hosts and containers with Prometheus, Grafana, cAdvisor, NodeExporter and alerting with AlertManager.

This repository is a fork. The original repo is stefanprodan/dockprom.

Additional info: Docker - Prometheus and Grafana

Install

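Create a .env file next to docker-compose.yml (that is, in the repository directory once it is cloned):

ADMIN_USER=admin
ADMIN_PASSWORD=admin

Clone this repository on your Docker host, cd into the repository directory and run compose up: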

git clone https://github.com/Einsteinish/Docker-Compose-Prometheus-and-Grafana.git
cd Docker-Compose-Prometheus-and-Grafana
docker-compose up -d

Prerequisites:

  • Docker Engine >= 1.13
  • Docker Compose >= 1.11

Containers:

  • Prometheus (metrics database) http://<host-ip>:9090
  • Prometheus-Pushgateway (push acceptor for ephemeral and batch jobs) http://<host-ip>:9091
  • AlertManager (alerts management) http://<host-ip>:9093
  • Grafana (visualize metrics) http://<host-ip>:3000
  • NodeExporter (host metrics collector)
  • cAdvisor (containers metrics collector)
  • Caddy (reverse proxy and basic auth provider for prometheus and alertmanager)

Setup Grafana

Navigate to http://<host-ip>:3000 and log in with user admin and password admin. You can change the credentials in the compose file or by supplying the ADMIN_USER and ADMIN_PASSWORD environment variables via the .env file on compose up. Alternatively, reference a config file directly in the grafana section of the compose file:

grafana:
  image: grafana/grafana:5.2.4
  env_file:
    - config

The config file should contain:

GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=changeme
GF_USERS_ALLOW_SIGN_UP=false

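If you want to change the password after the first start, you have to remove the following volume entry from the compose file. Otherwise the change will not take effect, because the credentials already stored in the persisted Grafana data are used: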

- grafana_data:/var/lib/grafana

Grafana is preconfigured with dashboards and Prometheus as the default data source:

Docker Host Dashboard

The Docker Host Dashboard shows key metrics for monitoring the resource usage of your server:

  • Server uptime, CPU idle percent, number of CPU cores, available memory, swap and storage
  • System load average graph, running and blocked by IO processes graph, interrupts graph
  • CPU usage graph by mode (guest, idle, iowait, irq, nice, softirq, steal, system, user)
  • Memory usage graph by distribution (used, free, buffers, cached)
  • IO usage graph (read Bps, write Bps and IO time)
  • Network usage graph by device (inbound Bps, outbound Bps)
  • Swap usage and activity graphs

For storage, and particularly the Free Storage graph, you have to specify the fstype in the Grafana graph request. You can find it in grafana/dashboards/docker_host.json, at line 480:

  "expr": "sum(node_filesystem_free_bytes{fstype=\"btrfs\"})",

I work on BTRFS, so I needed to change aufs to btrfs.

You can find the right value for your system by running this query in Prometheus at http://<host-ip>:9090:

  node_filesystem_free_bytes
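
Once the fstype is known, the dashboard expression can be generated rather than edited by hand. The following is a minimal sketch: `free_storage_expr` is a hypothetical helper name, and only the metric and label names come from the snippet above.

```shell
#!/bin/sh
# Hypothetical helper: print the Free Storage expression for a given fstype.
# The metric name and the fstype label are taken from the dashboard snippet above.
free_storage_expr() {
  fstype="$1"
  printf 'sum(node_filesystem_free_bytes{fstype="%s"})\n' "$fstype"
}

free_storage_expr btrfs
# prints sum(node_filesystem_free_bytes{fstype="btrfs"})
```

To list only the distinct fstype values instead of every series, a query such as `count by (fstype) (node_filesystem_free_bytes)` should work in the Prometheus UI. When pasting the generated expression into docker_host.json, the inner double quotes have to be escaped as `\"`, as in the original line.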

Docker Containers Dashboard

The Docker Containers Dashboard shows key metrics for monitoring running containers:

  • Total containers CPU load, memory and storage usage
  • Running containers graph, system load graph, IO usage graph
  • Container CPU usage graph
  • Container memory usage graph
  • Container cached memory usage graph
  • Container network inbound usage graph
  • Container network outbound usage graph

Note that this dashboard doesn't show the containers that are part of the monitoring stack.

Monitor Services Dashboard

The Monitor Services Dashboard shows key metrics for monitoring the containers that make up the monitoring stack:

  • Prometheus container uptime, monitoring stack total memory usage, Prometheus local storage memory chunks and series
  • Container CPU usage graph
  • Container memory usage graph
  • Prometheus chunks to persist and persistence urgency graphs
  • Prometheus chunks ops and checkpoint duration graphs
  • Prometheus samples ingested rate, target scrapes and scrape duration graphs
  • Prometheus HTTP requests graph
  • Prometheus alerts graph

Define alerts

Three alert groups are set up within the alert.rules configuration file:

  • Monitoring services alerts
  • Docker Host alerts
  • Docker Containers alerts

You can modify the alert rules and reload them by making an HTTP POST call to Prometheus:

curl -X POST http://admin:admin@<host-ip>:9090/-/reload

Monitoring services alerts

Trigger an alert if any of the monitoring targets (node-exporter and cAdvisor) are down for more than 30 seconds:

- alert: monitor_service_down
  expr: up == 0
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "Monitor service non-operational"
    description: "Service {{ $labels.instance }} is down."

Docker Host alerts

Trigger an alert if the Docker host CPU is under high load for more than 30 seconds:

- alert: high_cpu_load
  expr: node_load1 > 1.5
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Server under high load"
    description: "Docker host is under high load, the avg load 1m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."

Modify the load threshold based on your CPU cores.
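
One way to pick the threshold is to scale a per-core factor by the number of cores. This is only a sketch: `load_threshold` is a hypothetical helper, and treating the 1.5 from the rule above as a per-core factor is an assumption, not something the rule file prescribes.

```shell
#!/bin/sh
# Hypothetical helper: scale a per-core load factor by the number of cores.
# Usage: load_threshold CORES [PER_CORE_FACTOR]
load_threshold() {
  cores="$1"
  factor="${2:-1.5}"   # assumption: reuse 1.5 from the rule above as the per-core factor
  awk -v c="$cores" -v f="$factor" 'BEGIN { printf "%.1f\n", c * f }'
}

# Example: a 4-core host with the default factor
load_threshold 4   # prints 6.0
```

On a Linux host, `load_threshold "$(nproc)"` should give a value for the machine at hand, which can then replace 1.5 in the `node_load1 > 1.5` expression.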

Trigger an alert if the Docker host memory is almost full:

- alert: high_memory_load
  expr: (sum(node_memory_MemTotal_bytes) - sum(node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / sum(node_memory_MemTotal_bytes) * 100 > 85
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Server memory is almost full"
    description: "Docker host memory usage is {{ humanize $value }}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
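
To get a feel for when this rule fires, the same arithmetic can be reproduced outside Prometheus. The sketch below uses a hypothetical helper, `mem_used_percent`, and made-up numbers; it mirrors the expression above and is not part of the stack.

```shell
#!/bin/sh
# Hypothetical helper mirroring the alert expression:
#   (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100
# Usage: mem_used_percent TOTAL FREE BUFFERS CACHED   (all in the same unit)
mem_used_percent() {
  awk -v t="$1" -v f="$2" -v b="$3" -v c="$4" \
    'BEGIN { printf "%.1f\n", (t - (f + b + c)) / t * 100 }'
}

# Made-up example (MiB): 16000 total, 800 free, 300 buffers, 500 cached
mem_used_percent 16000 800 300 500   # prints 90.0
```

With these numbers the usage is 90%, above the 85% threshold, so the alert would fire once the condition has held for 30 seconds.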

Trigger an alert if the Docker host storage is almost full:

- alert: high_storage_load
  expr: (node_filesystem_size_bytes{fstype="aufs"} - node_filesystem_free_bytes{fstype="aufs"}) / node_filesystem_size_bytes{fstype="aufs"} * 100 > 85
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Server storage is almost full"
    description: "Docker host storage usage is {{ humanize $value }}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."

Docker Containers alerts

Trigger an alert if a container is down for more than 30 seconds:

- alert: jenkins_down
  expr: absent(container_memory_usage_bytes{name="jenkins"})
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "Jenkins down"
    description: "Jenkins container is down for more than 30 seconds."

Trigger an alert if a container is using more than 10% of total CPU cores for more than 30 seconds:

- alert: jenkins_high_cpu
  expr: sum(rate(container_cpu_usage_seconds_total{name="jenkins"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 10
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Jenkins high CPU usage"
    description: "Jenkins CPU usage is {{ humanize $value }}%."

Trigger an alert if a container is using more than 1.2GB of RAM for more than 30 seconds:

- alert: jenkins_high_memory
  expr: sum(container_memory_usage_bytes{name="jenkins"}) > 1200000000
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Jenkins high memory usage"
    description: "Jenkins memory consumption is at {{ humanize $value }}."

Setup alerting

The AlertManager service is responsible for handling alerts sent by Prometheus server. AlertManager can send notifications via email, Pushover, Slack, HipChat or any other system that exposes a webhook interface. A complete list of integrations can be found here.

You can view and silence notifications by accessing http://<host-ip>:9093.

The notification receivers can be configured in alertmanager/config.yml file.

To receive alerts via Slack, you need to create a custom integration by choosing incoming webhooks in your Slack team app page. You can find more details on setting up Slack integration here.

Copy the Slack Webhook URL into the api_url field and specify a Slack channel.

route:
    receiver: 'slack'

receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            text: "{{ .CommonAnnotations.description }}"
            username: 'Prometheus'
            channel: '#<channel>'
            api_url: 'https://hooks.slack.com/services/<webhook-id>'

Sending metrics to the Pushgateway

The pushgateway is used to collect data from batch jobs or from services.

To push data, simply execute:

echo "some_metric 3.14" | curl --data-binary @- http://user:password@localhost:9091/metrics/job/some_job

Please replace the user:password part with your user and password set in the initial configuration (default: admin:admin).
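
The Pushgateway also accepts extra grouping labels in the URL path after the job name. As a sketch, the hypothetical helper below (`pushgateway_url` is not part of this repository) builds such a URL with an optional instance label. It assumes that the job and instance values contain no slashes.

```shell
#!/bin/sh
# Hypothetical helper: build a Pushgateway URL with a job and an optional instance label.
# Usage: pushgateway_url BASE_URL JOB [INSTANCE]
pushgateway_url() {
  base="${1%/}"                  # drop a trailing slash, if any
  url="$base/metrics/job/$2"
  if [ -n "${3:-}" ]; then
    url="$url/instance/$3"       # extra grouping label appended as /<label>/<value>
  fi
  printf '%s\n' "$url"
}

pushgateway_url http://localhost:9091 some_job batch01
# prints http://localhost:9091/metrics/job/some_job/instance/batch01
```

The printed URL can replace the one in the curl command above, with the user:password part added in front of the host as before.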

Updating Grafana to v5.2.2

In Grafana versions >= 5.1 the id of the grafana user has been changed. Unfortunately this means that files created prior to 5.1 won’t have the correct permissions for later versions.

Version   User      User ID
< 5.1     grafana   104
>= 5.1    grafana   472
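
The version-to-ID mapping above can be expressed as a small lookup, which may be handy in upgrade scripts. This is a sketch only: `grafana_uid` is a hypothetical helper, and it assumes version strings of the form MAJOR.MINOR or MAJOR.MINOR.PATCH.

```shell
#!/bin/sh
# Hypothetical helper: print the grafana user ID for a given Grafana version,
# following the mapping above (< 5.1 -> 104, >= 5.1 -> 472).
# Assumes a version string like MAJOR.MINOR or MAJOR.MINOR.PATCH.
grafana_uid() {
  major="${1%%.*}"        # text before the first dot
  rest="${1#*.}"          # text after the first dot
  minor="${rest%%.*}"     # text up to the next dot
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }; then
    echo 472
  else
    echo 104
  fi
}

grafana_uid 5.2.2   # prints 472
grafana_uid 4.6.3   # prints 104
```

Comparing this value with the numeric owner of the existing Grafana data files shows whether the ownership has to change before the upgrade.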

There are two possible solutions to this problem.

  • Change ownership from 104 to 472
  • Start the upgraded container as user 104

Specifying a user in docker-compose.yml

To change ownership of the files run your grafana container as root and modify the permissions.

First perform a docker-compose down, then modify your docker-compose.yml to include the user: root option:

  grafana:
    image: grafana/grafana:5.2.2
    container_name: grafana
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/datasources:/etc/grafana/datasources
      - ./grafana/dashboards:/etc/grafana/dashboards
      - ./grafana/setup.sh:/setup.sh
    entrypoint: /setup.sh
    user: root
    environment:
      - GF_SECURITY_ADMIN_USER=${ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped
    expose:
      - 3000
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

Perform a docker-compose up -d and then issue the following commands:

docker exec -it --user root grafana bash

# in the container you just started:
chown -R root:root /etc/grafana && \
chmod -R a+r /etc/grafana && \
chown -R grafana:grafana /var/lib/grafana && \
chown -R grafana:grafana /usr/share/grafana

To run the grafana container as user 104 instead, change your docker-compose.yml as follows:

  grafana:
    image: grafana/grafana:5.2.2
    container_name: grafana
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/datasources:/etc/grafana/datasources
      - ./grafana/dashboards:/etc/grafana/dashboards
      - ./grafana/setup.sh:/setup.sh
    entrypoint: /setup.sh
    user: "104"
    environment:
      - GF_SECURITY_ADMIN_USER=${ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped
    expose:
      - 3000
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"
