• This repository has been archived on 16/Aug/2021
  • Stars
    star
    104
  • Rank 330,604 (Top 7 %)
  • Language
    Shell
  • License
    BSD 3-Clause "New...
  • Created over 6 years ago
  • Updated over 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fully automated database experiments. THIS IS A MIRROR OF https://gitlab.com/postgres.ai/nancy

Postgres.AI   GitHub code size in bytes     CircleCI

⚠️ Nancy CLI project is on hold now. See details: https://gitlab.com/postgres-ai/nancy/-/issues/228

About

Nancy helps to conduct automated database experiments.

The Nancy Command Line Interface is a unified way to manage automated database experiments either in clouds or on-premise.

What is a Database Experiment?

Database experiment is a set of actions performed to test

  • (a) specified SQL queries ("workload")
  • (b) on specified machine / OS / Postgres version ("environment")
  • (c) against specified database ("object")
  • (d) with an optional change – some DDL or config change ("target" or "delta").

Two main goals for any database experiment:

  • (1) validation – check that the specified workload is valid,
  • (2) benchmark – perform deep SQL query analysis.

Database experiments are needed when you:

  • add or remove indexes;
  • for a new DB schema change, want to validate it and estimate migration time;
  • want to verify some query optimization ideas;
  • tune database configuration parameters;
  • do capacity planning and want to stress-test your DB in some environment;
  • plan to upgrade your DBMS to a new major version;
  • want to train ML model related to DB optimization.

Currently Supported Features

  • Works anywhere where Docker can run (checked: Linux Ubuntu/Debian, macOS)
  • Experiments are conducted in a Docker container with extended Postgres setup
  • Supported Postgres versions: 12 (default), 11, 10, 9.6
  • Postgres config specified via options, may be partial
  • Supported locations for experimental runs:
    • Any machine with Docker installed
    • AWS EC2:
      • Run on AWS EC2 Spot Instances (using Docker Machine)
      • Allow to specify EC2 instance type
      • Auto-detect and use current lowest EC2 Spot Instance prices
      • Support i3 instances (with NVMe SSD drives)
      • Support arbitrary-size EBS volumes
  • Support local or remote (S3) files – config, dump, etc
  • The object (database) can be specified in various ways:
    • Plain text
    • Synthetic database generated by pgbench
    • Dump file (.sql, .gz, .bz2)
  • What to test (a.k.a. "target" or "delta"):
    • Test Postgres parameters change
    • Test DDL change (specified as "do" and "undo" SQL to return state)
  • Supported types of workload:
    • Any custom SQL
    • Synthetic workload generated by pgbench
    • "Real workload" based on Postgres logs (using pgreplay)
  • For "real workload", allow replaying it with increased speed
  • Allow to keep container alive for specified time after all steps are done
  • Collected artifacts:
    • pg_stat_statements snapshot
    • pg_stat_database, ...
    • Workload SQL logs
    • Deep SQL query analysis report

Requirements

  1. To use Nancy CLI you need Linux or MacOS with installed Docker.

  2. To run on AWS EC2 instances, you also need:

Installation

In the minimal configuration, only a few steps are needed:

NOTICE: The Additional notes section contains instructions useful in case of docker-related errors during nancy run calls. Alternatively, see Docker's official post-installation instructions for Linux.

  1. Install Docker

Ubuntu/Debian:

sudo apt-get -y install docker
sudo systemctl enable docker
sudo systemctl start docker

RHEL7:

yum -y install docker
systemctl enable docker
systemctl start docker

MacOS (assuming that Homebrew is installed):

brew install docker

See also: https://docs.docker.com/docker-for-mac/install/

  1. Clone this repo and adjust $PATH:
git clone https://gitlab.com/postgres.ai/nancy.git
echo "export PATH=\$PATH:"$(pwd)"/nancy" >> ~/.bashrc
source ~/.bashrc
  1. Install jq
  • Ubuntu/Debian: sudo apt-get -y install jq
  • CentOS/RHEL: sudo yum install jq
  • MacOS: brew install jq

Additionally, to allow use of AWS EC2 instances:

  1. Install AWS CLI https://docs.aws.amazon.com/cli/latest/userguide/installing.html

  2. Install Docker Machine tools https://docs.docker.com/machine/install-machine/

Getting started

Start with these commands:

nancy help
nancy run help

"Hello World!"

Locally, on any Linux or macOS machine:

echo "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
  | bzip2 > ./sample.dump.bz2

# "Clean run": w/o index
# (seqscan is expected, total time ~150ms, depending on resources)
nancy run \
  --db-dump file://$(pwd)/sample.dump.bz2 \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;"

# Now check how a regular btree index affects performance
# (expected total time: ~0.05ms)
nancy run \
  --db-dump file://$(pwd)/sample.dump.bz2 \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;" \
  --delta-sql-do "create index i_hello_world_i on hello_world(i);" \
  --delta-sql-undo "drop index i_hello_world_i;"

AWS EC2:

nancy run \
  --run-on aws \
  --aws-ec2-type "i3.large" \
  --aws-keypair-name awskey \
  --aws-ssh-key-path file://$(echo ~)/.ssh/awskey.pem  \
  --db-dump "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;"

Additional notes

On Linux, if you experience issues with running (locally) nancy run inside screen or tmux, double-check that Docker is running and add your user to the docker group, as described below. See also: https://docs.docker.com/install/linux/linux-postinstall/.

Ubuntu/Debian:

# Ubuntu/Debian
sudo usermod -aG docker ${USER}
newgrp docker

CentOS/RHEL:

sudo usermod -aG dockerroot ${USER}
newgrp dockerroot

On MacOS, it is recommended to specify --tmp-path explicitly, similar to this:

mkdir ./tmp
nancy run ... --tmp-path "$(pwd)/tmp"