PostgreSQL replication monitoring and failover daemon

pglookout

pglookout is a PostgreSQL® replication monitoring and failover daemon. pglookout monitors PG database nodes and their replication status and acts according to that status, for example calling a predefined failover command to promote a new primary in case the previous one goes missing.

pglookout supports two different node types: ones that are installed on the DB nodes themselves, and observer nodes that can be installed anywhere. The purpose of pglookout on the PostgreSQL DB nodes is to monitor the replication status of the cluster and act accordingly. The observers have a more limited remit: they just observe the cluster and provide another viewpoint on its state.

A single observer can observe any number of PostgreSQL replication clusters simultaneously. This makes it possible to share an observer between multiple replication clusters. In general it is recommended that you run with at least one external observer giving an additional viewpoint on the health of the cluster.

Requirements

pglookout can monitor PostgreSQL versions 10 and above.

pglookout has been developed and tested on modern Linux x86-64 systems, but should work on other platforms that provide the required modules. pglookout is implemented in Python and works with CPython versions 3.9 or newer. pglookout depends on the Requests and Psycopg2 Python modules.

Building

To build an installation package for your distribution, go to the root directory of a pglookout Git checkout and then run:

Debian:

make deb

This will produce a .deb package into the parent directory of the Git checkout.

Fedora:

make rpm

This will produce a .rpm package into rpm/RPMS/noarch/.

Python/Other:

python setup.py bdist_egg

This will produce an egg file into a dist directory within the same folder.

Installation

To install it run as root:

Debian:

dpkg -i ../pglookout*.deb

Fedora:

dnf install rpm/RPMS/noarch/*

On Linux systems it is recommended to simply run pglookout under systemd:

systemctl enable pglookout.service

Later, once you have completed the Setup section below, start it with:

systemctl start pglookout.service

Python/Other:

easy_install dist/pglookout-1.4.0-py3.8.egg

On systems without systemd it is recommended that you run pglookout under Supervisor or other similar process control system.

Setup

After this you need to create a suitable JSON configuration file for your installation.

  1. Create a suitable PostgreSQL user account for pglookout:

    CREATE USER pglookout PASSWORD 'putyourpasswordhere';
    
  2. Edit the local pg_hba.conf to allow the newly created account to access the postgres database (or another suitable database of your choice) from the primary, replica and possible observer nodes. While pglookout only needs to run a few built-in functions within the database, it is still recommended to set up a separate empty database for this use. Remember to reload the configuration with either:

    SELECT pg_reload_conf();
    

    or by sending a SIGHUP directly to the PostgreSQL postmaster process.

  3. Fill in the created user account and primary/replica/observer addresses into the configuration file pglookout.json to the section remote_conns.

  4. Create a failover script and add its path to the configuration key failover_command. As an example failover script, a shell script that uses IP aliasing is provided in the examples. It is recommended that the script include some form of STONITH (Shoot The Other Node In The Head) capability. Other common methods of doing the failover and diverting DB traffic to the newly promoted primary are switching PgBouncer (or other pooler) traffic, or changing the PL/Proxy configuration.

    You should try to run the failover script you provide with pglookout's user privileges to see that it does indeed work.

  5. Now copy the same pglookout.json configuration to the replica and possible observer nodes. On each of those nodes, edit the configuration so that the own_db configuration variable matches that node's key in remote_conns. For observer nodes, you can leave it as an empty '' value, since they don't have a DB of their own.

Other possible configuration settings are covered in more detail under the Configuration keys section of this README.

  6. If all has been set up correctly up to this point, pglookout should now be ready to be started.
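The configuration described in steps 3 to 5 can be sketched as a small Python script that generates pglookout.json for each node. This is a minimal sketch, not a complete configuration: only the keys remote_conns, failover_command and own_db come from this README, while the IP addresses, the password and the failover script path are hypothetical placeholders.

```python
import json


def build_config(own_db: str) -> dict:
    """Build a pglookout configuration for the node identified by own_db.

    Every address, password and path here is a hypothetical placeholder.
    """
    conn = "host={host} dbname=postgres user=pglookout password=putyourpasswordhere"
    return {
        # Step 3: one connection string per database node.
        "remote_conns": {
            "192.0.2.10": conn.format(host="192.0.2.10"),
            "192.0.2.11": conn.format(host="192.0.2.11"),
        },
        # Step 4: script that promotes this node (hypothetical path).
        "failover_command": "/usr/local/bin/pglookout_failover.sh",
        # Step 5: must match a key of remote_conns, or "" on an observer.
        "own_db": own_db,
    }


def write_config(path: str, own_db: str) -> None:
    """Write the configuration for one node as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as config_file:
        json.dump(build_config(own_db), config_file, indent=4)


if __name__ == "__main__":
    # Print the configuration a replica at 192.0.2.11 would use.
    print(json.dumps(build_config("192.0.2.11"), indent=4))
```

Generating the file from one function keeps remote_conns identical on every node, so only own_db differs between the primary, the replicas and the observers.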

Alert files

Alert files are created whenever an error condition occurs that requires human intervention to solve. It is recommended that you add checks for the existence of these files to your alerting system.

authentication_error

There has been a problem in the authentication of at least one of the PostgreSQL connections. This usually denotes either a wrong username/password or incorrect pg_hba.conf settings.

multiple_master_warning

This alert file is created when multiple primaries are detected in the same cluster.

replication_delay_warning

This alert file is created when replication delay goes over the set warning limit. This warning is an exception to the rule that human intervention is required: it is only meant as an informative heads-up that a failover may be imminent. If the replication delay drops below the warning threshold again, the alert file is removed.

failover_has_happened

This alert file is created whenever the failover command has been issued.
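A check for these files could look roughly like the following. This is a minimal sketch for an alerting hook: the four file names are the ones listed above, while the function name find_alerts and the way the directory is passed in are assumptions made for illustration.

```python
from pathlib import Path

# Alert file names as documented in this README.
ALERT_FILES = (
    "authentication_error",
    "multiple_master_warning",
    "replication_delay_warning",
    "failover_has_happened",
)


def find_alerts(alert_file_dir: str) -> list[str]:
    """Return the names of the alert files present in alert_file_dir."""
    directory = Path(alert_file_dir)
    return [name for name in ALERT_FILES if (directory / name).exists()]


if __name__ == "__main__":
    # alert_file_dir defaults to pglookout's working directory, so pass
    # the directory that matches your own configuration here.
    for alert in find_alerts("."):
        print(f"pglookout alert present: {alert}")
```

An alerting system would typically treat replication_delay_warning as informational and the other three as requiring attention.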

General notes

If correctly installed, pglookout comes with two executables, pglookout and pglookout_current_master, both of which take the path to the node's JSON configuration file as their argument.

pglookout is the main process that should be run under systemd or supervisord.

pglookout_current_master is a helper that will simply parse the state file and return which node is the current primary.

While pglookout is running it may be useful to read the JSON state file that exists where json_state_file_path points. The JSON state file is human readable and should give an understandable description of the current state of the cluster which is under monitoring.
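Reading the state file from a script might look like the sketch below. The layout of the state file is not described in this README, so the code deliberately assumes nothing about it beyond it being a JSON object; the helper names load_state and summarize are hypothetical.

```python
import json


def load_state(json_state_file_path: str = "/tmp/pglookout_state.json") -> dict:
    """Load the JSON state file written by a running pglookout."""
    with open(json_state_file_path, encoding="utf-8") as state_file:
        return json.load(state_file)


def summarize(state: dict) -> list[str]:
    """Describe each top-level key without assuming a particular schema."""
    return [f"{key}: {type(value).__name__}" for key, value in sorted(state.items())]


if __name__ == "__main__":
    try:
        for line in summarize(load_state()):
            print(line)
    except FileNotFoundError:
        print("No state file found; is pglookout running?")
```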

Configuration keys

autofollow (default false)

Whether pglookout should try to start following the new primary after a failover. This is useful when you have a primary and two replicas: if the primary dies and one replica is promoted, the remaining replica can start following the new primary. Requires the pg_data_directory, pg_start_command and pg_stop_command configuration keys to be set.

db_poll_interval (default 5.0)

How often, in seconds, the connections defined in remote_conns are polled for information on DB replication state.

remote_conns (default {})

PG database connection strings that the pglookout process should monitor. Keys of the object should be names of the remotes and values must be valid PostgreSQL connection strings or connection info objects.

primary_conninfo_template

Connection string or connection info object template to use when setting a new primary_conninfo value for recovery.conf after a failover has happened. Any hostname and database name provided in the template are ignored and replaced with a replication connection to the new primary node.

Required when autofollow is true.

observers (default {})

This object contains key value pairs like {"1.2.3.4": "http://2.3.4.5:15000"}. They are used to determine the location of pglookout observer processes. Observers are processes that don't take any actions, but simply give a third party viewpoint on the state of the cluster. Useful especially during net splits.

poll_observers_on_warning_only (default false)

When set, observers are polled only when replication lag is over warning_replication_time_lag.

http_address (default "")

HTTP webserver address. By default pglookout binds to all interfaces.

http_port (default 15000)

HTTP webserver port.

replication_state_check_interval (default 10.0)

How often, in seconds, pglookout checks the replication state in order to decide whether the node should be promoted.

failover_sleep_time (default 0.0)

Time to sleep after a failover command has been issued.

maintenance_mode_file (default "/tmp/pglookout_maintenance_mode_file")

If a file exists in this location, this node will not be considered for promotion to primary.

missing_master_from_config_timeout (default 15)

Time in seconds to wait before making a failover decision when a previously existing primary has been removed from the config file and pglookout has received a SIGHUP.

alert_file_dir (default os.getcwd())

Directory in which alert files for replication warning and failover are created.

json_state_file_path (default "/tmp/pglookout_state.json")

Location of a JSON state file which describes the state of the pglookout process.

max_failover_replication_time_lag (default 120.0)

Replication time lag after which failover_command will be executed and a failover_has_happened file will be created.

warning_replication_time_lag (default 30.0)

Replication time lag at which point to execute over_warning_limit_command and to create a warning file.
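How the two lag thresholds relate can be illustrated with a small function. This is only a sketch of the threshold comparison described above, not pglookout's actual decision logic, which also takes observers and connectivity into account. The function name and the use of a strict "greater than" comparison are assumptions.

```python
def classify_replication_lag(
    lag_seconds: float,
    warning_replication_time_lag: float = 30.0,
    max_failover_replication_time_lag: float = 120.0,
) -> str:
    """Map a replication time lag to the action its threshold implies."""
    if lag_seconds > max_failover_replication_time_lag:
        # failover_command would run and failover_has_happened be created.
        return "failover"
    if lag_seconds > warning_replication_time_lag:
        # over_warning_limit_command would run and a warning file be created.
        return "warning"
    return "ok"


if __name__ == "__main__":
    for lag in (5.0, 45.0, 121.0):
        print(lag, classify_replication_lag(lag))
```

With the default values, a replica that is 45 seconds behind triggers only the warning, and failover is considered once the lag passes 120 seconds.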

failover_command (default "")

Shell command to execute when the node has deemed itself in need of promotion.

known_gone_nodes (default [])

Lists nodes that are explicitly known to have left the cluster. If the old primary is removed in a controlled manner, it should be added to this list to ensure there is no extra delay when making a promotion decision.

never_promote_these_nodes (default [])

Lists the nodes that will never be considered valid for promotion. For example, if primary p fails and you have replicas a and b, then a will be promoted even if b is ahead, as long as b is listed in never_promote_these_nodes.
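The example with replicas a and b can be sketched as follows. This illustrates the filtering rule only and is not pglookout's implementation: the function name is hypothetical, and representing each replica's progress as a single integer position is an assumption.

```python
from typing import Optional


def pick_promotion_candidate(
    replica_positions: dict[str, int],
    never_promote_these_nodes: list[str],
) -> Optional[str]:
    """Return the most advanced replica that is allowed to be promoted.

    replica_positions maps a node name to a hypothetical integer that
    grows as the replica replays more of the primary's changes.
    """
    eligible = {
        node: position
        for node, position in replica_positions.items()
        if node not in never_promote_these_nodes
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


if __name__ == "__main__":
    # b is ahead of a, but b is excluded, so a is chosen.
    print(pick_promotion_candidate({"a": 100, "b": 200}, ["b"]))
```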

over_warning_limit_command (default null)

Shell command to be executed once replication lag goes over warning_replication_time_lag.

own_db

The key of the entry in remote_conns that matches this node.

log_level (default "INFO")

Determines log level of pglookout.

pg_data_directory (default "/var/lib/pgsql/data")

PG data directory that needs to be set when autofollow has been turned on. Note that pglookout needs permission to write there (specifically to recovery.conf).

pg_start_command (default "")

Command to start a PostgreSQL process on a node which has autofollow set to true. Usually something like "sudo systemctl start postgresql".

pg_stop_command (default "")

Command to stop a PostgreSQL process on a node which has autofollow set to true. Usually something like "sudo systemctl stop postgresql".

syslog (default false)

Determines whether syslog logging should be turned on or not.

syslog_address (default "/dev/log")

Determines the syslog address to use in logging (requires syslog to be true as well).

syslog_facility (default "local2")

Determines the syslog log facility (requires syslog to be true as well).

statsd (default: disabled)

Enables metrics sending to a statsd daemon that supports the StatsD / Telegraf syntax with tags.

The value is a JSON object:

{
    "host": "<statsd address>",
    "port": "<statsd port>",
    "tags": {
        "<tag>": "<value>"
    }
}

The tags setting can be used to enter optional tag values for the metrics.

Metrics sending follows the Telegraf spec.
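To show what a tagged metric looks like on the wire, here is a minimal sketch that formats and sends one counter in the Telegraf StatsD tag syntax (name,tag=value:amount|type). The helper names are hypothetical, and the exact metric names and tags that pglookout itself emits are not specified here, so treat the values as placeholders.

```python
import socket
from typing import Optional


def format_metric(
    name: str,
    value: int,
    metric_type: str = "c",
    tags: Optional[dict[str, str]] = None,
) -> bytes:
    """Format one metric line using Telegraf-style StatsD tags."""
    tag_part = "".join(f",{key}={val}" for key, val in sorted((tags or {}).items()))
    return f"{name}{tag_part}:{value}|{metric_type}".encode("utf-8")


def send_metric(host: str, port: int, payload: bytes) -> None:
    """Send a single metric datagram to a statsd daemon over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical tag; 8125 is the conventional statsd UDP port.
    line = format_metric("cluster_monitor_health_timeout", 1, tags={"cluster": "main"})
    print(line.decode("utf-8"))
    send_metric("127.0.0.1", 8125, line)
```

Sending a line like this by hand is a quick way to confirm that the statsd daemon named in the configuration accepts tagged metrics.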

cluster_monitor_health_timeout_seconds (default: 2 * replication_state_check_interval)

If set, pglookout increases the statsd counter cluster_monitor_health_timeout whenever the cluster_monitor thread has not successfully completed a check within the last cluster_monitor_health_timeout_seconds seconds.

failover_on_disconnect (default true)

Determines whether pglookout makes a failover decision when it is no longer connected to the primary.

License

pglookout is licensed under the Apache License, Version 2.0. Full license text is available in the LICENSE file and at http://www.apache.org/licenses/LICENSE-2.0.txt

Credits

pglookout was created by Hannu Valtonen & the Ohmu team for F-Secure and is now maintained by Aiven developers <[email protected]>.

Recent contributors are listed on the GitHub project page, https://github.com/aiven/pglookout/graphs/contributors

Trademarks

Postgres, PostgreSQL and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.

Debian, Fedora, Python, Telegraf are trademarks and property of their respective owners. All product and service names used in this website are for identification purposes only and do not imply endorsement.

Contact

Bug reports and patches are very welcome; please post them as GitHub issues and pull requests at https://github.com/aiven/pglookout. Any possible vulnerabilities or other serious issues should be reported directly to the maintainers <[email protected]>.
