• Stars
    star
    161
  • Rank 233,470 (Top 5 %)
  • Language
    Python
  • Created almost 13 years ago
  • Updated almost 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

PG Extractor - Advanced PostgreSQL Dump Filter

PGExtractor is both a command line script and python class that can be used to provide more finely detailed filtering options for PostgreSQL's pg_dump program. Each database object is extracted into its own file organized into folders by type. This makes having the database schema as a reference easier and allows for better checking into version control. By default data is not dumped, but is easily done with a single option and can be done as plaintext or pg_dump's custom format. Object filtering can be done directly via command line options or fed in with external text files. Regex pattern matching is also possible.

See --help & --examples for a full list of available options and how to use the script.

The script only uses pg_dump/all to touch the database. pg_restore is only used for generating ddl and does not ever touch the database.

This script natively requires Python 3. The 3to2 script can be used to allow it work work with Python 2.7, but it will not always be guarenteed to work. https://pypi.python.org/pypi/3to2

$ 3to2 -w pg_extractor.py

Python 3 was chosen for its more consistent treatment of plaintext and binary file formats. Since this is a text processing script, that consistency makes development easier and more predictable. Also, Python 3 has been out since 2008 and all major OS distributions have packages available, so I'm doing my small part to help drive adoption to the new major version.

Several of the class methods are public and can be used to inspect a custom format binary dump file or apply some of the editing options.

Python 3.3.1 (default, Sep 25 2013, 19:29:01) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pg_extractor import PGExtractor
>>> p = PGExtractor()
>>> object_list = p.build_main_object_list("dumpfile.pgr")
>>> table_object_list = p.build_type_object_list(object_list, ['TABLE'])
>>> for t in table_object_list:
...     print(t)
... 
{'objname': 'job_detail_p0', 'objid': '238; 1259 596233', 'objowner': 'keith', 'objtype': 'TABLE', 'objschema': 'jobmon'}
{'objname': 'job_detail_p10', 'objid': '239; 1259 596244', 'objowner': 'keith', 'objtype': 'TABLE', 'objschema': 'jobmon'}
...

Remove the password hashes from an existing "pg_dumpall -r" roles file:

>>> p.remove_passwords("pg_dumpall_roles.sql")

New Version 2.x

Version 2.x is a complete rewrite of PG Extractor in python. Most of the configuration options are the same, but many have been changed for clarity, so please check the --help.

Non-compatibilities with 1.x to be aware of when dropping in 2.x to replace it

  • Requires Python 3
  • The "hostname" is no longer a default part of the directory structure created. If this is still desired, set the --hostnamedir option with whatever the existing directory is.
  • Built in version control options are gone. They were rather fragile options and could easily lead to a whole lot of things getting checked into version control that should not have been. I've found it's easier (and safer) to manage version control check-ins separately. If these are really wanted please create an Issue on github and I'll consider it if there's enough interest.
  • Removed --rolesdir option

New features:

  • Full Python 3 class object with public methods that may possibly be useful on existing dump files
  • --jobs option to allow parallel object extraction
  • --remove_passwords option can remove the password hashes from an extracted roles file
  • --getdefaultprivs extracts the default privileges set for any roles that used ALTER DEFAULT PRIVILEGES
  • --delete cleans up empty folders properly
  • --wait option to allow a pause in object extraction. Helps reduce load when data is included in extraction.
  • --temp option to allow setting custom temporary working space
  • Sequences files can now include the statement to set the current value if data is output
  • Better support for when objects have mixed case names or special characters. Special characters in an object name turn into ,hexcode, to allow a valid system filename.
  • Rules & Triggers on views are now always included in the view file itself properly.

The version 1.x series written in perl will no longer be developed. Only bug fixes to the existing code will be accepted.

More Repositories

1

jsend

JSend is a specification for a simple, no-frills, JSON based format for application-level communication.
1,468
star
2

pg_amqp

AMQP Support for Postgres
C
203
star
3

omnipitr

Advanced WAL File Management Tools for PostgreSQL
Perl
177
star
4

pg_jobmon

PostgreSQL extension which provides persistent logging within transactions and functions.
PLpgSQL
168
star
5

pgtreats

Tasty treats for PostgreSQL
Perl
111
star
6

ansible-dk

An omnibus-based toolkit for working on Ansible-based infrastructure code.
Ruby
85
star
7

mimeo

Extension for specialized, per-table replication between PostgreSQL instances
PLpgSQL
78
star
8

jlog

JLog - Journaled Log
C
53
star
9

zetaback

Zetaback is a thin-agent based ZFS backup tool. It is designed to help simplify the task of backing up thousands of filesystems on hundreds of machines across and organization. It simplifies the task of automatically picking up newly created ZFS filesystems for backup purposes and restoring any given backup (host/FS/timestamp) to a target ZFS capable host.
Perl
39
star
10

omnios-build

Build system for OmniOS - Note, this is a quasi-private archive for OmniTI, you probably want https://omniosce.org
Shell
39
star
11

curo

postgres command-line tools
Shell
38
star
12

portableumem

This is a port of the Solaris umem memory allocator to other popular operating systems, such as Linux, Windows and BSDish systems (including Darwin/OSX).
C
30
star
13

Net--RabbitMQ

Perl bindings to the librabbitmq-c AMQP library.
C
25
star
14

resmon

Resmon is a lightweight utility for local host monitoring that can be queried by tools such as nagios over http. One of the main design goals is portability: that resmon should require nothing more than a default install of Perl. Built with the philosophy that "we are smart because we are dumb," that is, local requirements should be minimal to ease deployment on multiple platforms.
Perl
25
star
15

pg_query_statsd

Send query statistics to statsd
C
19
star
16

chef-solo-helper

Helper scripts for running chef-solo with a git backed repository
Shell
16
star
17

system_monitoring

Script for system monitoring
Perl
13
star
18

circus2

Various helper tools making use of the circonus v2 api
Python
6
star
19

db-failover

Postgres Failover Script
Perl
6
star
20

kayak

Kayak (PXE-enabled network imaging of OmniOS) - Note, this is a quasi-private archive for OmniTI, you probably want https://omniosce.org
Shell
6
star
21

circus

Python API client app and library for Circonus v1 API
Python
5
star
22

pgbrew

Pg Brew
Shell
4
star
23

pgOtter

PostgreSQL query log analyzer
Perl
4
star
24

svr42pkgsrc

SVR4 to pkgsrc conversion tools.
Shell
4
star
25

omnifab

Library of useful helper functions for fabric
Python
3
star
26

template-maker

Script to help automate the creation of configuration management templates from raw configuration files
Python
2
star
27

spreadlogd

Spreadlogd is a simple tool to log messages received via the Spread Group Communication system. It is primary used in conjuction with mod_log_spread to journal a centralized log file for a cluster of Apache servers.
C
2
star
28

chef-sudo

Cookbook to manage sudoers file
Ruby
1
star
29

revealjs_omniti_template

Template for creating OmniTI reveal.js themed presentations
JavaScript
1
star
30

openssh-securid

OpenSSH with SecurID integration
1
star
31

omniti-ms

OmniTI OmniOS package repo build scripts
Shell
1
star
32

mungo-PSGI

Perl
1
star