• Stars
    star
    539
  • Rank 82,402 (Top 2 %)
  • Language
    C
  • License
    MIT License
  • Created about 8 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

ZSON is a PostgreSQL extension for transparent JSONB compression

ZSON

ZSON Logo

About

ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents (not only keys, but also values, array elements, etc).

In some cases ZSON can save half of your disk space and give you about 10% more TPS. Memory is saved as well. See docs/benchmark.md. Everything depends on your data and workload, though. Don't believe any benchmarks, re-check everything on your data, configuration, hardware, workload and PostgreSQL version.

ZSON was originally created in 2016 by Postgres Professional team: researched and coded by Aleksander Alekseev; ideas, code review, testing, etc by Alexander Korotkov and Teodor Sigaev.

See also discussions on pgsql-general@, Hacker News, Reddit and HabraHabr.

Install

Build and install ZSON:

cd /path/to/zson/source/code
make
sudo make install

Run tests:

make installcheck

Connect to PostgreSQL:

psql my_database

Enable extension:

create extension zson;

Uninstall

Disable extension:

drop extension zson;

Uninstall ZSON:

cd /path/to/zson/source/code
sudo make uninstall

Usage

First ZSON should be trained on common data using zson_learn procedure:

zson_learn(
    tables_and_columns text[][],
    max_examples int default 10000,
    min_length int default 2,
    max_length int default 128,
    min_count int default 2
)

Example:

select zson_learn('{{"table1", "col1"}, {"table2", "col2"}}');

You can create a temporary table and write some common JSONB documents into it manually or use the existing tables. The idea is to provide a subset of real data. Let's say some document type is twice as frequent as another document type. ZSON expects that there will be twice as many documents of the first type as those of the second one in a learning set.

Resulting dictionary could be examined using this query:

select * from zson_dict;

Now ZSON type could be used as a complete and transparent replacement of JSONB type:

zson_test=# create table zson_example(x zson);
CREATE TABLE

zson_test=# insert into zson_example values ('{"aaa": 123}');
INSERT 0 1

zson_test=# select x -> 'aaa' from zson_example;
-[ RECORD 1 ]-
?column? | 123

Migrating to a new dictionary

When a schema of JSONB documents evolves ZSON could be re-learned:

select zson_learn('{{"table1", "col1"}, {"table2", "col2"}}');

This time second dictionary will be created. Dictionaries are cached in memory so it will take about a minute before ZSON realizes that there is a new dictionary. After that old documents will be decompressed using the old dictionary and new documents will be compressed and decompressed using the new dictionary.

To find out which dictionary is used for a given ZSON document use zson_info procedure:

zson_test=# select zson_info(x) from test_compress where id = 1;
-[ RECORD 1 ]---------------------------------------------------
zson_info | zson version = 0, dict version = 1, ...

zson_test=# select zson_info(x) from test_compress where id = 2;
-[ RECORD 1 ]---------------------------------------------------
zson_info | zson version = 0, dict version = 0, ...

If all ZSON documents are migrated to the new dictionary the old one could be safely removed:

delete from zson_dict where dict_id = 0;

In general, it's safer to keep old dictionaries just in case. Gaining a few KB of disk space is not worth the risk of losing data.

When it's a time to re-learn?

Unfortunately, it's hard to recommend a general approach.

A good heuristic could be:

select pg_table_size('tt') / (select count(*) from tt)

... i.e. average document size. When it suddenly starts to grow it's time to re-learn.

However, developers usually know when they change a schema significantly. It's also easy to re-check whether the current schema differs a lot from the original one using zson_dict table.

Known limitations

Intalling ZSON in a schema other than public is not supported (i.e. CREATE EXTENSION zson WITH SCHEMA ...).

More Repositories

1

rum

RUM access method - inverted index with additional information in posting lists
C
725
star
2

pg_probackup

Backup and recovery manager for PostgreSQL
Python
711
star
3

jsquery

JsQuery – json query language with GIN indexing support
C
702
star
4

pg_pathman

Partitioning tool for PostgreSQL
C
583
star
5

aqo

Adaptive query optimization for PostgreSQL
C
428
star
6

imgsmlr

Similar images search for PostgreSQL
C
255
star
7

mamonsu

Python
186
star
8

vops

C
165
star
9

postgres_cluster

Various experiments with PostgreSQL clustering
C
151
star
10

pg_query_state

Tool for query progress monitoring in PostgreSQL
C
150
star
11

pg_wait_sampling

Sampling based statistics of wait events
C
144
star
12

testgres

Testing framework for PostgreSQL and its extensions
Python
141
star
13

hunspell_dicts

Hunspell dictionaries for PostgreSQL
TSQL
63
star
14

pg_credereum

Prototype of PostgreSQL extension bringing some properties of blockchain to the relational DBMS
C
62
star
15

sr_plan

Save and restore query plans in PostgreSQL
C
61
star
16

mmts

multimaster
C
57
star
17

raft

Raft protocol implementation in C
C
49
star
18

ptrack

Block-level incremental backup engine for PostgreSQL
C
45
star
19

pg_trgm_pro

C
44
star
20

sqljson

C
38
star
21

postgresql.pthreads

Port of postgresql for pthreads
C
31
star
22

postgresql.builtin_pool

Version of PostgreSQL with built-in connection pooling
C
29
star
23

pg_dtm

Distributed transaction manager
C
27
star
24

postgrespro

Postgres Professional fork of PostgreSQL
C
27
star
25

lsm3

LSM tree implementation based on standard B-Tree
C
26
star
26

lsm

RocksDB FDW for PostgreSQL
C
24
star
27

tsvector2

Extended tsvector type for PostgreSQL
C
20
star
28

pg_backtrace

Show backtrace for errors and signals
C
20
star
29

pgwininstall

PostgreSQL Windows installer
Roff
19
star
30

monq

MonQ - PostgreSQL extension for MongoDB-like queries to jsonb data
C
17
star
31

pg_tsparser

pg_tsparser - parser for text search
C
16
star
32

pgsphere

PgSphere provides spherical data types, functions, operators, and indexing for PostgreSQL.
C
16
star
33

hstore_ops

Better operator class for hstore: smaller index and faster @> queries.
C
16
star
34

undam

Undo storage implementation
C
15
star
35

pg_logging

PostgreSQL logging interface
C
15
star
36

pg_ycsb

YCSB-like benchmark for pgbench
PLpgSQL
15
star
37

tsexample

Example of custom postgresql full text search parser, dictionaries and configuration
C
14
star
38

libblobstamper

Framework for Structure Aware Fuzzing. Allows to build own stamps that would convert pulp-data that came from fuzzer to data with structure you need
C++
14
star
39

pg_oltp_bench

Extension and scripts to run analogue of sysbench OLTP test using pgbench
PLpgSQL
13
star
40

pg_grab_statement

PostgreSQL extension for recoding workload of specific database
C
12
star
41

tsexact

PostgreSQL fulltext search addon
C
11
star
42

jsonbd

JSONB compression method for PostgreSQL
C
10
star
43

rusmorph

Russian morphological dictionary (rusmorph) for Postgres based on libmorph library: https://github.com/big-keva/libmorph
C++
10
star
44

pg_parallizator

C
9
star
45

memstat

C
9
star
46

plantuner

C
8
star
47

pg_pageprep

PostgreSQL extension which helps to prepare heap pages for migration to 64bit XID page format (PostgresPro Enterprise)
C
8
star
48

wildspeed

C
7
star
49

pgbouncer

C
6
star
50

bztree

C++
6
star
51

pg_pathman_build

Prerequisites for pg_pathman building
Shell
5
star
52

snapfs

Fast recoverry and snapshoting
C
4
star
53

pq2jdbc

Java
4
star
54

jsonb_schema

Store jsonb schema separately from data
C
4
star
55

postgrespro-os-templates

Packer templates for building minimal baseboxes
Shell
3
star
56

pg_variables

Session wide variables for PostgreSQL
C
3
star
57

pg_hint_plan

C
2
star
58

pgpro_redefinition

PLpgSQL
2
star
59

snowball_ext

The Snowball dictionary template extension for PostgreSQL
C
2
star
60

jsonb_plpython

PLpgSQL
1
star
61

dict_regex

C
1
star
62

pg-mark

Postgres benchmarking framework
R
1
star
63

anyarray

contrib package for working with 1-D arrays
C
1
star
64

libpq_compression

C
1
star