• Stars
    star
    2,787
  • Rank 16,349 (Top 0.4 %)
  • Language
    Ruby
  • License
    MIT License
  • Created almost 9 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Sync data from one Postgres database to another

pgsync

Sync data from one Postgres database to another (like pg_dump/pg_restore). Designed for:

  • speed - tables are transferred in parallel
  • security - built-in methods to prevent sensitive data from ever leaving the server
  • flexibility - gracefully handles schema differences, like missing columns and extra columns
  • convenience - sync partial tables, groups of tables, and related records

🍊 Battle-tested at Instacart

Build Status

Installation

pgsync is a command line tool. To install, run:

gem install pgsync

This will give you the pgsync command. If installation fails, you may need to install dependencies.

You can also install it with Homebrew:

brew install pgsync

Setup

In your project directory, run:

pgsync --init

This creates .pgsync.yml for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). pgsync commands can be run from this directory or any subdirectory.

How to Use

First, make sure your schema is set up in both databases. We recommend using a schema migration tool for this, but pgsync also provides a few convenience methods. Once that’s done, you’re ready to sync data.

Sync tables

pgsync

Sync specific tables

pgsync table1,table2

Works with wildcards as well

pgsync "table*"

Sync specific rows (existing rows are overwritten)

pgsync products "where store_id = 1"

You can also preserve existing rows

pgsync products "where store_id = 1" --preserve

Or truncate them

pgsync products "where store_id = 1" --truncate

Tables

Exclude specific tables

pgsync --exclude table1,table2

Add to .pgsync.yml to exclude by default

exclude:
  - table1
  - table2

Sync tables from all schemas or specific schemas (by default, only the search path is synced)

pgsync --all-schemas
# or
pgsync --schemas public,other
# or
pgsync public.table1,other.table2

Groups

Define groups in .pgsync.yml:

groups:
  group1:
    - table1
    - table2

And run:

pgsync group1

Variables

You can also use groups to sync a specific record and associated records in other tables.

To get product 123 with its reviews, last 10 coupons, and store, use:

groups:
  product:
    products: "where id = {1}"
    reviews: "where product_id = {1}"
    coupons: "where product_id = {1} order by created_at desc limit 10"
    stores: "where id in (select store_id from products where id = {1})"

And run:

pgsync product:123

Schema

Sync the schema before the data (this wipes out existing data)

pgsync --schema-first

Specify tables

pgsync table1,table2 --schema-first

Sync the schema without data (this wipes out existing data)

pgsync --schema-only

pgsync does not try to sync Postgres extensions.

Sensitive Data

Prevent sensitive data like email addresses from leaving the remote server.

Define rules in .pgsync.yml:

data_rules:
  email: unique_email
  last_name: random_letter
  birthday: random_date
  users.auth_token:
    value: secret
  visits_count:
    statement: "(RANDOM() * 10)::int"
  encrypted_*: null

last_name matches all columns named last_name and users.last_name matches only the users table. Wildcards are supported, and the first matching rule is applied.

Options for replacement are:

  • unique_email
  • unique_phone
  • unique_secret
  • random_letter
  • random_int
  • random_date
  • random_time
  • random_ip
  • value
  • statement
  • null
  • untouched

Rules starting with unique_ require the table to have a single column primary key. unique_phone requires a numeric primary key.

Foreign Keys

Foreign keys can make it difficult to sync data. Three options are:

  1. Defer constraints (recommended)
  2. Manually specify the order of tables
  3. Disable foreign key triggers, which can silently break referential integrity (not recommended)

To defer constraints, use:

pgsync --defer-constraints

To manually specify the order of tables, use --jobs 1 so tables are synced one-at-a-time.

pgsync table1,table2,table3 --jobs 1

To disable foreign key triggers and potentially break referential integrity, use:

pgsync --disable-integrity

This requires superuser privileges on the to database. If syncing to (not from) Amazon RDS, use the rds_superuser role. If syncing to (not from) Heroku, there doesn’t appear to be a way to disable integrity.

Triggers

Disable user triggers with:

pgsync --disable-user-triggers

Sequences

Skip syncing sequences with:

pgsync --no-sequences

Append-Only Tables

For extremely large, append-only tables, sync in batches.

pgsync large_table --in-batches

The script will resume where it left off when run again, making it great for backfills.

Connection Security

Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.

Safety

To keep you from accidentally overwriting production, the destination is limited to localhost or 127.0.0.1 by default.

To use another host, add to_safe: true to your .pgsync.yml.

Multiple Databases

To use with multiple databases, run:

pgsync --init db2

This creates .pgsync-db2.yml for you to edit. Specify a database in commands with:

pgsync --db db2

Integrations

Django

If you run pgsync --init in a Django project, migrations will be excluded in .pgsync.yml.

exclude:
  - django_migrations

Heroku

If you run pgsync --init in a Heroku project, the from database will be set in .pgsync.yml.

from: $(heroku config:get DATABASE_URL)?sslmode=require

Laravel

If you run pgsync --init in a Laravel project, migrations will be excluded in .pgsync.yml.

exclude:
  - migrations

Rails

If you run pgsync --init in a Rails project, Active Record metadata and schema migrations will be excluded in .pgsync.yml.

exclude:
  - ar_internal_metadata
  - schema_migrations

Debugging

To view the SQL that’s run, use:

pgsync --debug

Other Commands

Help

pgsync --help

Version

pgsync --version

List tables

pgsync --list

Scripts

Use groups when possible to take advantage of parallelism.

For Ruby scripts, you may need to do:

Bundler.with_unbundled_env do
  system "pgsync ..."
end

Docker

Get the Docker image with:

docker pull ankane/pgsync
alias pgsync="docker run -ti ankane/pgsync"

This will give you the pgsync command.

Dependencies

If installation fails, your system may be missing Ruby or libpq.

On Mac, run:

brew install libpq

On Ubuntu, run:

sudo apt-get install ruby-dev libpq-dev build-essential

Upgrading

Run:

gem install pgsync

To use master, run:

gem install specific_install
gem specific_install https://github.com/ankane/pgsync.git

With Homebrew, run:

brew upgrade pgsync

With Docker, run:

docker pull ankane/pgsync

Related Projects

Also check out:

  • Dexter - The automatic indexer for Postgres
  • PgHero - A performance dashboard for Postgres
  • pgslice - Postgres partitioning as easy as pie

Thanks

Inspired by heroku-pg-transfer.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/pgsync.git
cd pgsync
bundle install

createdb pgsync_test1
createdb pgsync_test2
createdb pgsync_test3

bundle exec rake test

More Repositories

1

pghero

A performance dashboard for Postgres
Ruby
7,123
star
2

searchkick

Intelligent search made easy
Ruby
6,257
star
3

chartkick

Create beautiful JavaScript charts with one line of Ruby
Ruby
6,157
star
4

blazer

Business intelligence made simple
Ruby
4,511
star
5

ahoy

Simple, powerful, first-party analytics for Rails
Ruby
3,872
star
6

strong_migrations

Catch unsafe migrations in development
Ruby
3,662
star
7

groupdate

The simplest way to group temporal data
Ruby
3,617
star
8

the-ultimate-guide-to-ruby-timeouts

Timeouts for popular Ruby gems
Ruby
2,212
star
9

production_rails

Best practices for running Rails in production
1,975
star
10

dexter

The automatic indexer for Postgres
Ruby
1,491
star
11

lockbox

Modern encryption for Ruby and Rails
Ruby
1,290
star
12

chartkick.js

Create beautiful charts with one line of JavaScript
JavaScript
1,211
star
13

react-chartkick

Create beautiful JavaScript charts with one line of React
JavaScript
1,183
star
14

pretender

Log in as another user in Rails
Ruby
1,124
star
15

ahoy_email

First-party email analytics for Rails
Ruby
1,051
star
16

secure_rails

Rails security best practices
954
star
17

pgslice

Postgres partitioning as easy as pie
Ruby
953
star
18

mailkick

Email subscriptions for Rails
Ruby
847
star
19

vue-chartkick

Create beautiful JavaScript charts with one line of Vue
JavaScript
747
star
20

eps

Machine learning for Ruby
Ruby
609
star
21

awesome-legal

Awesome free legal documents for companies
589
star
22

searchjoy

Search analytics made easy
Ruby
579
star
23

polars-ruby

Blazingly fast DataFrames for Ruby
Ruby
563
star
24

torch.rb

Deep learning for Ruby, powered by LibTorch
Ruby
552
star
25

blind_index

Securely search encrypted database fields
Ruby
470
star
26

safely

Rescue and report exceptions in non-critical code
Ruby
470
star
27

authtrail

Track Devise login activity
Ruby
466
star
28

multiverse

Multiple databases for Rails 🎉
Ruby
463
star
29

ahoy.js

Simple, powerful JavaScript analytics
JavaScript
463
star
30

hightop

A nice shortcut for group count queries
Ruby
462
star
31

field_test

A/B testing for Rails
Ruby
460
star
32

s3tk

A security toolkit for Amazon S3
Python
439
star
33

disco

Recommendations for Ruby and Rails using collaborative filtering
Ruby
431
star
34

active_median

Median and percentile for Active Record, Mongoid, arrays, and hashes
Ruby
427
star
35

informers

State-of-the-art natural language processing for Ruby
Ruby
417
star
36

notable

Track notable requests and background jobs
Ruby
402
star
37

shorts

Short, random tutorials and posts
379
star
38

tensorflow-ruby

Deep learning for Ruby
Ruby
350
star
39

distribute_reads

Scale database reads to replicas in Rails
Ruby
328
star
40

slowpoke

Rack::Timeout enhancements for Rails
Ruby
327
star
41

prophet-ruby

Time series forecasting for Ruby
Ruby
321
star
42

rover

Simple, powerful data frames for Ruby
Ruby
311
star
43

groupdate.sql

The simplest way to group temporal data
PLpgSQL
280
star
44

kms_encrypted

Simple, secure key management for Lockbox and attr_encrypted
Ruby
235
star
45

jetpack

A friendly package manager for R
R
234
star
46

neighbor

Nearest neighbor search for Rails and Postgres
Ruby
230
star
47

rollup

Rollup time-series data in Rails
Ruby
230
star
48

hypershield

Shield sensitive data in Postgres and MySQL
Ruby
227
star
49

logstop

Keep personal data out of your logs
Ruby
218
star
50

pdscan

Scan your data stores for unencrypted personal data (PII)
Go
213
star
51

delete_in_batches

Fast batch deletes for Active Record and Postgres
Ruby
202
star
52

vega-ruby

Interactive charts for Ruby, powered by Vega and Vega-Lite
Ruby
192
star
53

mapkick

Create beautiful JavaScript maps with one line of Ruby
Ruby
173
star
54

dbx

A fast, easy-to-use database library for R
R
171
star
55

fastText-ruby

Efficient text classification and representation learning for Ruby
Ruby
162
star
56

autosuggest

Autocomplete suggestions based on what your users search
Ruby
162
star
57

swipeout

Swipe-to-delete goodness for the mobile web
JavaScript
159
star
58

pghero.sql

Postgres insights made easy
PLpgSQL
154
star
59

mainstreet

Address verification for Ruby and Rails
Ruby
149
star
60

or-tools-ruby

Operations research tools for Ruby
Ruby
139
star
61

mapkick.js

Create beautiful, interactive maps with one line of JavaScript
JavaScript
138
star
62

trend-ruby

Anomaly detection and forecasting for Ruby
Ruby
128
star
63

mitie-ruby

Named-entity recognition for Ruby
Ruby
122
star
64

barkick

Barcodes made easy
Ruby
120
star
65

ownership

Code ownership for Rails
Ruby
111
star
66

anomaly

Easy-to-use anomaly detection for Ruby
Ruby
98
star
67

errbase

Common exception reporting for a variety of services
Ruby
87
star
68

tokenizers-ruby

Fast state-of-the-art tokenizers for Ruby
Rust
81
star
69

ip_anonymizer

IP address anonymizer for Ruby and Rails
Ruby
79
star
70

str_enum

String enums for Rails
Ruby
75
star
71

faiss-ruby

Efficient similarity search and clustering for Ruby
C++
73
star
72

trend-api

Anomaly detection and forecasting API
R
71
star
73

archer

Rails console history for Heroku, Docker, and more
Ruby
70
star
74

onnxruntime-ruby

Run ONNX models in Ruby
Ruby
70
star
75

xgboost-ruby

High performance gradient boosting for Ruby
Ruby
69
star
76

secure-spreadsheet

Encrypt and password protect sensitive CSV and XLSX files
JavaScript
66
star
77

active_hll

HyperLogLog for Rails and Postgres
Ruby
66
star
78

guess

Statistical gender detection for Ruby
Ruby
60
star
79

morph

An encrypted, in-memory, key-value store
C++
59
star
80

lightgbm-ruby

High performance gradient boosting for Ruby
Ruby
56
star
81

midas-ruby

Edge stream anomaly detection for Ruby
Ruby
54
star
82

moves

Ruby client for Moves
Ruby
54
star
83

blingfire-ruby

High speed text tokenization for Ruby
Ruby
54
star
84

vowpalwabbit-ruby

Fast online machine learning for Ruby
Ruby
52
star
85

xlearn-ruby

High performance factorization machines for Ruby
Ruby
51
star
86

tomoto-ruby

High performance topic modeling for Ruby
C++
51
star
87

trove

Deploy machine learning models in Ruby (and Rails)
Ruby
50
star
88

mapkick-static

Create beautiful static maps with one line of Ruby
Ruby
42
star
89

ahoy_events

Simple, powerful event tracking for Rails
Ruby
42
star
90

practical-search

Let’s make search a better experience for our users
40
star
91

breakout-ruby

Breakout detection for Ruby
Ruby
40
star
92

plu

Price look-up codes made easy
Ruby
40
star
93

ngt-ruby

High-speed approximate nearest neighbors for Ruby
Ruby
39
star
94

gindex

Concurrent index migrations for Rails
Ruby
39
star
95

clockwork_web

A web interface for Clockwork
Ruby
38
star
96

ahoy_guide

A foundation of knowledge and libraries for solid analytics
38
star
97

notable_web

A web interface for Notable
HTML
36
star
98

AnomalyDetection.rb

Time series anomaly detection for Ruby
Ruby
34
star
99

khiva-ruby

High-performance time series algorithms for Ruby
Ruby
34
star
100

immudb-ruby

Ruby client for immudb, the immutable database
Ruby
34
star