• This repository has been archived on 08/Jan/2024
  • Stars
    star
    112
  • Rank 301,101 (Top 7 %)
  • Language
    Go
  • License
    MIT License
  • Created almost 6 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Postgres load testing tool

pgreplay-go CircleCI

See a discussion of building this tool at https://blog.lawrencejones.dev/building-a-postgresql-load-tester/

This tool is a different take on the existing pgreplay project. Where pgreplay will playback Postgres log files while respecting relative chronological order, pgreplay-go plays back statements at approximately the same rate they were originally sent to the database.

When benchmarking database performance, it is better for a replay tool to continue sending traffic than to halt execution until a strangling query has complete. If your new cluster isn't performing, users won't politely wait for your service to catch-up before issuing new commands.

Benchmark strategy

You have an existing cluster and want to trial new hardware/validate configuration changes/move infrastructure providers. Your production services depend on ensuring the new change is safe and doesn't degrade performance.

1. Configure production logging

First capture the logs from your running Postgres instance. You may need to add additional storage to your machine before starting- we often attach a new disk and change our log location to that disk for experiments.

You can turn on logging in a compatible format like so:

ALTER SYSTEM SET log_directory='/postgres-logs';
ALTER SYSTEM SET log_connections='on';
ALTER SYSTEM SET log_disconnections='on';
ALTER SYSTEM SET log_line_prefix='%m|%u|%d|%c|';
ALTER SYSTEM SET log_min_error_statement='log';
ALTER SYSTEM SET log_min_messages='error';
ALTER SYSTEM SET log_statement='all';
ALTER SYSTEM SET log_min_duration_statement=0; 
SELECT pg_reload_conf();

2. Take snapshot

Now we're emitting logs we need to snapshot the database so that we can later restore it to the same moment in time on our benchmark clusters. If you're running in a cloud provider with disk snapshot facilities then this is likely the easiest of options (remember to checkpoint first, to reduce recovery time) but you can also achieve this using pg_basebackup and point-in-time recovery configuration, or by taking a physical copy of a paused replica.

Whatever method used must produce an image that can be restored into new machines later.

3. Extract and process logs

Once you've captured logs for your desired benchmark window, you can optionally pre-process them to create a more realistic sample. Complex database interactions are likely to have transactions that may fail when we play them back out-of-order. Recalling that our primary goal is to create representative load on the database, not a totally faithful replay, we suggest applying the following filters:

$ cat postgresql.log \
| pv --progress --rate --size "$(du postgresql.log | cut -f1)" \
| grep -v "LOG:  statement: BEGIN" \
| grep -v "LOG:  statement: COMMIT" \
| grep -v "LOG:  statement: ROLLBACK TO SAVEPOINT" \
| grep -v "LOG:  statement: SAVEPOINT" \
| grep -v "LOG:  statement: RELEASE SAVEPOINT" \
| grep -v "LOG:  statement: SET LOCAL" \
| sed 's/pg_try_advisory_lock/bool/g' \
| sed 's/pg_advisory_unlock/pg_advisory_unlock_shared/g' \
> postgresql-filtered.log

By removing transactions we avoid skipping work if any of the transaction queries were to fail - the same goes for savepoints. We remove any SET LOCAL statements, as having removed transactions these configuration settings would be present for the duration of the connection. We then modify any advisory lock queries to be shared, preventing us from needlessly blocking while still requiring the database to perform similar levels of work as the exclusive locking.

These transformations mean our replayed queries won't exactly simulate what we saw in production, but that's why we'll compare the performance of these filtered logs against the two clusters rather than the original performance of the production cluster.

4. pgreplay-go against copy of production cluster

Now create a copy of the original production cluster using the snapshot from (2). The aim is to have a cluster that exactly replicates production, providing a reliable control for our experiment.

The goal of this run will be to output Postgres logs that can be parsed by pgBadger to provide an analysis of the benchmark run. See the pgBadger readme for details, or apply the following configuration for defaults that will work:

ALTER SYSTEM SET log_min_duration_statement = 0;
ALTER SYSTEM SET log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_temp_files = 0;
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
ALTER SYSTEM SET log_error_verbosity = default;
SELECT pg_reload_conf();

Run the benchmark using the pgreplay-go binary from the primary Postgres machine under test. pgreplay-go requires minimal system resource and is unlikely to affect the benchmark when running on the same machine, though you can run it from an alternative location if this is a concern. An example run would look like this:

$ pgreplay \
    --errlog-file /data/postgresql-filtered.log \
    --host 127.0.0.1 \
    --metrics-address 0.0.0.0 \
    --start <benchmark-start-time> \
    --finish <benchmark-finish-time>

If you run Prometheus then pgreplay-go exposes a metrics that can be used to report progress on the benchmark. See Observability for more details.

Once the benchmark is complete, store the logs somewhere for safekeeping. We usually upload the logs to Google cloud storage, though you should use whatever service you are most familiar with.

5. pgreplay-go against new cluster

After provisioning your new candidate cluster, with the hardware/software changes you wish to test, we repeat step (4) for our new cluster. You should use exactly the same pgreplay logs and run the benchmark for the same time-window.

As in step (4), upload your logs in preparation for the next step.

6. User pgBadger to compare performance

We use pgBadger to perform analysis of our performance during the benchmark, along with taking measurements from the node_exporter and postgres_exporter running during the experiment.

pgBadger reports can be used to calculate a query duration histogram for both clusters - these can indicate general speed-up/degradation. Digging into specific queries and the worst performers can provide more insight into which type of queries have degraded, and explaining those query plans on your clusters can help indicate what might have caused the change.

This is the least prescriptive part of our experiment, and answering whether the performance changes are acceptable - and what they may be - will depend on your knowledge of the applications using your database. We've found pgBadger to provide sufficient detail for us to be confident in answering this question, and hope you do too.

Observability

Running benchmarks can be a long process. pgreplay-go provides Prometheus metrics that can help determine how far through the benchmark has progressed, along with estimating how long remains.

Hooking these metrics into a Grafana dashboard can give the following output:

pgreplay-go Grafana dashboard

We'd suggest that you integrate these panels into your own dashboard; for example by showing them alongside key PostgreSQL metrics such as transaction commits, active backends and buffer cache hit ratio, as well as node-level metrics to show CPU and IO saturation.

A sample dashboard with the pgreplay-go-specific panels has been provided that may help you get started. Import it into your Grafana dashboard by downloading the dashboard JSON file.

Types of Log

Simple

This is a basic query that is executed directly against Postgres with no other steps.

2010-12-31 10:59:52.243 UTC|postgres|postgres|4d1db7a8.4227|LOG:  statement: set client_encoding to 'LATIN9'

Prepared statement

2010-12-31 10:59:57.870 UTC|postgres|postgres|4d1db7a8.4227|LOG:  execute einf"ug: INSERT INTO runtest (id, c, t, b) VALUES ($1, $2, $3, $4)
2010-12-31 10:59:57.870 UTC|postgres|postgres|4d1db7a8.4227|DETAIL:  parameters: $1 = '6', $2 = 'mit    Tabulator', $3 = '2050-03-31 22:00:00+00', $4 = NULL

More Repositories

1

statesman

A statesmanlike state machine library.
Ruby
1,700
star
2

angularjs-style-guide

AngularJS style guide used at GoCardless
1,446
star
3

business

Ruby business day calculations
Ruby
494
star
4

http-api-design

HTTP Design Guidelines
413
star
5

airflow-dbt

Apache Airflow integration for dbt
Python
367
star
6

es6-angularjs

JavaScript
180
star
7

coach

Alternative controllers with middleware
Ruby
164
star
8

logjam

a log shipping tool
Go
134
star
9

our-postgresql-setup

PostgreSQL clustering with corosync/pacemaker test environment
Shell
126
star
10

nandi

Fear free PostgreSQL migrations for Rails
Ruby
123
star
11

activerecord-safer_migrations

Safer ActiveRecord migrations for Postgres
Ruby
116
star
12

amqpc

AMQP CLI tool
Go
115
star
13

rspec-activejob

RSpec matchers for testing ActiveJob
Ruby
98
star
14

ibandit

Convert national banking details into IBANs, and vice-versa.
Ruby
95
star
15

gocardless-pro-php

GoCardless Pro PHP Client
PHP
89
star
16

gocardless-legacy-php

The PHP client library for the GoCardless Legacy API
PHP
66
star
17

stolon-pgbouncer

Add-on to stolon for providing zero-downtime failover and PgBouncer integration
Go
59
star
18

gocardless-legacy-ruby

The Ruby client library for the GoCardless API
Ruby
52
star
19

gc-http-factory

A factory for creating $http services in Angular.
JavaScript
44
star
20

draupnir

Anonymised database instances as-a-service
Go
41
star
21

bump

Automated dependency management for Ruby, Python and Javascript
Ruby
35
star
22

gocardless-pro-python

GoCardless Pro Python Client
Python
33
star
23

stubby

Your favourite pretender stubber
JavaScript
29
star
24

gocardless-pro-ruby

GoCardless Pro Ruby Client
Ruby
27
star
25

gocardless-dotnet

GoCardless .NET Client
C#
25
star
26

utopia-getting-started

Sharing a copy of our getting-started tutorial, as a demonstration of how our infrastructure works with utopia
24
star
27

resque-sentry

A Resque failure backend that sends errors to Sentry
Ruby
23
star
28

statesman-events

Event support for Statesman (UNMAINTAINED)
Ruby
23
star
29

prius

Environmentally-friendly application config
Ruby
22
star
30

systemjs-assetgraph

AssetGraph transform for optimizing SystemJS pages for production
JavaScript
22
star
31

gocardless-legacy-python

The Python client library for the GoCardless API
Python
22
star
32

theatre

GoCardless' collection of Kubernetes extensions
Go
20
star
33

gocardless-nodejs

GoCardless Node.js client
TypeScript
19
star
34

javascript-style-guide

The GoCardless JavaScript styleguide
18
star
35

pgsql-cluster-manager

Daemon and migration tool that manages Postgres cluster using etcd/corosync/pacemaker
Go
17
star
36

anony

A small library that defines how ActiveRecord models should be anonymised for deletion purposes.
Ruby
17
star
37

drydock

DryDock is a utility to clean up Docker images
Go
16
star
38

gocardless-pro-java

GoCardless Pro Java Client
Java
15
star
39

ng-gc-components

JavaScript
14
star
40

slo-builder

Templates for building SLOs with Prometheus rules and alerts
Go
14
star
41

legacy-api-docs

Docs for GoCardless legacy API
JavaScript
14
star
42

business-python

Python business day calculations
Python
13
star
43

gocardless-legacy-dotnet

The .NET client library for the GoCardless Legacy API
C#
12
star
44

companies-house-rest

Ruby wrapper for the Companies House REST API.
Ruby
12
star
45

gocardless-legacy-node

The Node.js client library for the GoCardless Legacy API
JavaScript
11
star
46

html-style-guide

How we write HTML at GoCardless
11
star
47

bump-core

The core logic powering Bump
Ruby
11
star
48

atum

Ruby HTTP client generator for APIs represented with JSON schema
Ruby
9
star
49

gocardless-legacy-java

The Java client library for the GoCardless Legacy API
Java
9
star
50

codeigniter-gocardless

The CodeIgniter spark for the GoCardless API
PHP
8
star
51

logsearch

Search Logstash / Elasticsearch logs from the command line
Go
7
star
52

sample-legacy-django-app

A sample Django app demonstrating the use of the GoCardless Legacy API and the Python client.
Python
7
star
53

companies-house-gateway-ruby

Ruby wrapper for the Companies House XML Gateway
Ruby
6
star
54

airflow-looker

A collection of Airflow extensions to provide integration with Looker
Python
6
star
55

uk_phone_numbers

A Ruby library for validating and formatting UK phone numbers.
Ruby
6
star
56

callcredit-ruby

Ruby wrapper for Callcredit's CallValidate API
Ruby
6
star
57

sample-legacy-rails-app

A sample Rails app demonstrating the use of GoCardless Legacy API and the Ruby client.
Ruby
6
star
58

react-dropin

React bindings for the GoCardless Dropin checkout flow
TypeScript
5
star
59

creditsafe-ruby

Ruby library for the Creditsafe SOAP API
Ruby
5
star
60

github-archive

Easy way to archive an entire organisation repos on S3
Go
5
star
61

gocardless-pro-go

Go
5
star
62

simple-swag

Dead simple swagger/openapi docs server
Go
3
star
63

bank-webfont

A webfont of prominent UK banks
CSS
3
star
64

cli-releases

Release repo for the gocardless cli
Dockerfile
3
star
65

gc_ruboconfig

GoCardless Engineering shared rubocop config
Ruby
3
star
66

gocardless-pro-ruby-example

Example of using the GoCardless Pro Ruby client library
HTML
3
star
67

gocardless-legacy-example-django

An example site using the GoCardless Legacy API
Python
3
star
68

belongs-to-one-of

Gem to support activemodel relations where one model can be a child of one of many models
Ruby
3
star
69

coach-demo

Ruby
3
star
70

gocardless-php

Placeholder explaining our PHP API libraries.
2
star
71

prometheus-client-ruby-data-stores-experiments

Ruby
2
star
72

que

A Ruby job queue that uses PostgreSQL's advisory locks for speed and reliability.
Ruby
2
star
73

slackify

Update your Slack status with what you're listening to in Spotify
Elixir
2
star
74

slackify-dot-rb

Take 2: this time without a language barrier
Ruby
1
star
75

gocardless-pro-java-maven-example

An example java app that handles webhooks
Java
1
star
76

open-charities

A Ruby library for querying the OpenCharities database
Ruby
1
star
77

bucket-store

Helper library to access cloud storage services
Ruby
1
star
78

publish-techdocs-action

Action to generate and publish TechDocs
Shell
1
star
79

passfort

Ruby client library for the PassFort API
Ruby
1
star
80

salesforce_wrapper

A wrapper around Restforce, catching exceptions and performing a configurable action with them (e.g. sending an email).
Ruby
1
star
81

gocardless-pro-java-example

Example of using the GoCardless Pro Java client library
Java
1
star
82

homebrew-taps

Ruby
1
star
83

gocardless-legacy-partner-example-ruby

An example GoCardless partner app, written in Sinatra
Ruby
1
star
84

gocardless-pro-php-demo

Pro client PHP demo
PHP
1
star
85

rspec-que

RSpec matchers for testing Que
Ruby
1
star