About
Nancy helps to conduct automated database experiments.
The Nancy Command Line Interface is a unified way to manage automated database experiments either in clouds or on-premise.
What is a Database Experiment?
Database experiment is a set of actions performed to test
- (a) specified SQL queries ("workload")
- (b) on specified machine / OS / Postgres version ("environment")
- (c) against specified database ("object")
- (d) with an optional change – some DDL or config change ("target" or "delta").
Two main goals for any database experiment:
- (1) validation – check that the specified workload is valid,
- (2) benchmark – perform deep SQL query analysis.
Database experiments are needed when you:
- add or remove indexes;
- for a new DB schema change, want to validate it and estimate migration time;
- want to verify some query optimization ideas;
- tune database configuration parameters;
- do capacity planning and want to stress-test your DB in some environment;
- plan to upgrade your DBMS to a new major version;
- want to train ML model related to DB optimization.
Currently Supported Features
- Works anywhere where Docker can run (checked: Linux Ubuntu/Debian, macOS)
- Experiments are conducted in a Docker container with extended Postgres setup
- Supported Postgres versions: 12 (default), 11, 10, 9.6
- Postgres config specified via options, may be partial
- Supported locations for experimental runs:
- Any machine with Docker installed
- AWS EC2:
- Run on AWS EC2 Spot Instances (using Docker Machine)
- Allow to specify EC2 instance type
- Auto-detect and use current lowest EC2 Spot Instance prices
- Support i3 instances (with NVMe SSD drives)
- Support arbitrary-size EBS volumes
- Support local or remote (S3) files – config, dump, etc
- The object (database) can be specified in various ways:
- Plain text
- Synthetic database generated by pgbench
- Dump file (.sql, .gz, .bz2)
- What to test (a.k.a. "target" or "delta"):
- Test Postgres parameters change
- Test DDL change (specified as "do" and "undo" SQL to return state)
- Supported types of workload:
- For "real workload", allow replaying it with increased speed
- Allow to keep container alive for specified time after all steps are done
- Collected artifacts:
pg_stat_statements
snapshotpg_stat_database
, ...- Workload SQL logs
- Deep SQL query analysis report
Requirements
-
To use Nancy CLI you need Linux or MacOS with installed Docker.
-
To run on AWS EC2 instances, you also need:
- AWS CLI https://aws.amazon.com/en/cli/
- Docker Machine https://docs.docker.com/machine/
- jq https://stedolan.github.io/jq/
Installation
In the minimal configuration, only a few steps are needed:
NOTICE: The Additional notes section contains
instructions useful in case of docker-related errors during nancy run
calls.
Alternatively, see Docker's official post-installation instructions for Linux.
- Install Docker
Ubuntu/Debian:
sudo apt-get -y install docker
sudo systemctl enable docker
sudo systemctl start docker
RHEL7:
yum -y install docker
systemctl enable docker
systemctl start docker
MacOS (assuming that Homebrew is installed):
brew install docker
See also: https://docs.docker.com/docker-for-mac/install/
- Clone this repo and adjust
$PATH
:
git clone https://gitlab.com/postgres.ai/nancy.git
echo "export PATH=\$PATH:"$(pwd)"/nancy" >> ~/.bashrc
source ~/.bashrc
- Install jq
- Ubuntu/Debian:
sudo apt-get -y install jq
- CentOS/RHEL:
sudo yum install jq
- MacOS:
brew install jq
Additionally, to allow use of AWS EC2 instances:
-
Install AWS CLI https://docs.aws.amazon.com/cli/latest/userguide/installing.html
-
Install Docker Machine tools https://docs.docker.com/machine/install-machine/
Getting started
Start with these commands:
nancy help
nancy run help
"Hello World!"
Locally, on any Linux or macOS machine:
echo "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
| bzip2 > ./sample.dump.bz2
# "Clean run": w/o index
# (seqscan is expected, total time ~150ms, depending on resources)
nancy run \
--db-dump file://$(pwd)/sample.dump.bz2 \
--workload-custom-sql "select i from hello_world where i between 10 and 20;"
# Now check how a regular btree index affects performance
# (expected total time: ~0.05ms)
nancy run \
--db-dump file://$(pwd)/sample.dump.bz2 \
--workload-custom-sql "select i from hello_world where i between 10 and 20;" \
--delta-sql-do "create index i_hello_world_i on hello_world(i);" \
--delta-sql-undo "drop index i_hello_world_i;"
AWS EC2:
nancy run \
--run-on aws \
--aws-ec2-type "i3.large" \
--aws-keypair-name awskey \
--aws-ssh-key-path file://$(echo ~)/.ssh/awskey.pem \
--db-dump "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
--workload-custom-sql "select i from hello_world where i between 10 and 20;"
Additional notes
On Linux, if you experience issues with running (locally) nancy run
inside screen
or
tmux
, double-check that Docker is running and add your user to the docker
group, as described below. See also: https://docs.docker.com/install/linux/linux-postinstall/.
Ubuntu/Debian:
# Ubuntu/Debian
sudo usermod -aG docker ${USER}
newgrp docker
CentOS/RHEL:
sudo usermod -aG dockerroot ${USER}
newgrp dockerroot
On MacOS, it is recommended to specify --tmp-path
explicitly, similar to this:
mkdir ./tmp
nancy run ... --tmp-path "$(pwd)/tmp"