our-postgresql-setup
ARCHIVED: This configuration is no longer used by GoCardless. See the replacement at https://github.com/gocardless/stolon-pgbouncer
Overview
This repo is an extracted version of how we run PostgreSQL clusters at GoCardless.
It helps you quickly spin up a 3-node cluster of PostgreSQL, managed by Pacemaker, and proxied by PgBouncer.
It's intended as a playground for us, and a learning resource that we wanted to share with the community.
You can hear more about how the cluster works in our talk - Zero-downtime Postgres upgrades.
What's in the cluster?
When you start the cluster, you get 3 nodes, each running:
- PostgreSQL
- Pacemaker
- PgBouncer
All packages are from Ubuntu 14.04, except for PostgreSQL itself, which is at version 9.4.
The cluster is configured with a single primary, one synchronous replica, and one asynchronous replica.
Dependencies
- VirtualBox
- Vagrant
- A clone of this repo: git clone https://github.com/gocardless/our-postgresql-setup.git
- [Recommended] tmux
Getting started
With tmux (recommended)
./tmux-session.sh start
By hand
- In 3 separate windows, run one of these commands each:
vagrant up pg01 && vagrant ssh pg01
vagrant up pg02 && vagrant ssh pg02
vagrant up pg03 && vagrant ssh pg03
Viewing cluster status
You can run crm_mon -Afr on any node to see the current state of the cluster and all resources in it. Press ^c to quit.
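For scripted checks, the same status can be captured once instead of watched interactively. The sketch below is an illustration only, not part of this repo: it assumes crm_mon is on the PATH of the node where it runs, and the helper names crm_mon_command and cluster_status are hypothetical. It adds the -1 (one-shot) flag so crm_mon prints the status and exits.

```python
import subprocess


def crm_mon_command(one_shot=True):
    """Build the crm_mon invocation used in this README as an argument list."""
    # -A: show node attributes, -f: show fail counts, -r: show inactive resources
    cmd = ["crm_mon", "-A", "-f", "-r"]
    if one_shot:
        # -1: print the status once and exit instead of refreshing until ^c
        cmd.append("-1")
    return cmd


def cluster_status():
    """Run crm_mon once and return its text output (must run on a cluster node)."""
    return subprocess.check_output(crm_mon_command(), universal_newlines=True)
```

Calling cluster_status() on any node should return the same text you would see on screen, which you can then search for the resource you care about.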
Connecting to PostgreSQL
Once the cluster is up, you have two options:
- Connect directly to Postgres on the PostgresqlVIP at 172.28.33.10
- Connect via PgBouncer at 172.28.33.9
Note: The migrator.py script will only give you zero-downtime migrations if you connect via PgBouncer.
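From a client's point of view, the two options differ only in the address it connects to. The following is a minimal sketch, assuming the conventional ports (5432 for PostgreSQL, 6432 for PgBouncer) and a postgres user and database. None of these are stated in this README, so check the cluster's configuration before relying on them. The function name connection_uri is hypothetical.

```python
POSTGRESQL_VIP = "172.28.33.10"  # direct connection, from this README
PGBOUNCER_VIP = "172.28.33.9"    # connection via PgBouncer, from this README


def connection_uri(via_pgbouncer=True, user="postgres", dbname="postgres",
                   postgres_port=5432, pgbouncer_port=6432):
    """Return a libpq-style connection URI for one of the two VIPs.

    The ports, user and database name are assumptions, not values
    documented in this README.
    """
    if via_pgbouncer:
        host, port = PGBOUNCER_VIP, pgbouncer_port
    else:
        host, port = POSTGRESQL_VIP, postgres_port
    return "postgresql://{}@{}:{}/{}".format(user, host, port, dbname)
```

The resulting string can be passed to psql or to any libpq-based driver. Defaulting to the PgBouncer address reflects the note above: only connections made via PgBouncer benefit from zero-downtime migrations.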
Running a zero-downtime migration
- Ensure clients are connected to the PgBouncerVIP.
- Run /vagrant/migrator.py on the node that has the PgBouncerVIP (you can find out where the PgBouncerVIP is by viewing the cluster status).
- Follow the prompts.
  - It is safe to ignore the "Make sure you have the following command ready..." prompt. This is aimed at cases where you'd want to quickly re-enable traffic, and doesn't matter when running locally.
- Assuming everything went well, the primary will migrate to the synchronous replica, and the clients won't have received any connection resets.
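One way to check that clients saw no connection resets is to keep a client querying through the PgBouncerVIP while the migration runs and record any failures. The sketch below is a hypothetical probe, not part of this repo: run_query is any callable you supply that executes one query (for example SELECT 1) against PgBouncer and raises an exception on error.

```python
import time


def probe(run_query, attempts=100, delay=0.1, sleep=time.sleep):
    """Call run_query repeatedly and return a list of (attempt, error) failures.

    run_query is a caller-supplied callable (hypothetical here) that runs a
    single query and raises on any connection or query error.
    """
    failures = []
    for attempt in range(attempts):
        try:
            run_query()
        except Exception as exc:  # record every failure rather than stopping
            failures.append((attempt, repr(exc)))
        sleep(delay)
    return failures
```

Start the probe before running the migrator and let it run until the migration finishes. An empty result means the probing client did not observe any errors.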
I have a question!
We're happy to receive questions as issues on this repo, so don't be shy!
It's hard to know exactly what documentation/guidance is useful to people, so we'll use the questions we answer to improve this README and link out to more places you can read up on the technologies we're using.