• Stars
    star
    274
  • Rank 147,281 (Top 3 %)
  • Language
    C
  • License
    MIT License
  • Created over 5 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

generator of sequential UUIDs

Sequential UUID generators

make installcheck

This PostgreSQL extension implements two UUID generators with sequential patterns, which helps to reduce random I/O patterns associated with regular entirely-random UUID.

Regular random UUIDs are distributed uniformly over the whole range of possible values. This results in poor locality when inserting data into indexes - all index leaf pages are equally likely to be hit, forcing the whole index into memory. With small indexes that's fine, but once the index size exceeds shared buffers (or RAM), the cache hit ratio quickly deteriorates.

Compare this to sequences and timestamps, which have a more sequential pattern and the new data almost always end up in the right-most part of the index (new sequence value is larger than all preceding values, same for timestamp). This results in a nicer and cache-friendlier behavior, but the values are predictable and may easily collide cross machines.

The main goal of the two generators implemented by this extension, is generating UUIDS in a more sequential pattern, but without reducing the randomness too much (which could increase the probability of collision and predictability of the generated UUIDs). This idea is not new, and it's pretty much what the UUID wikipedia article [1] calls COMB (combined-time GUID) and is more more thoroughly explained in [2].

The benefits (performance, reduced amount of WAL, ...) are demonstrated in a blog post on 2ndQuadrant site [3].

Generators

The extension provides two functions generating sequential UUIDs using either a sequence or timestamp.

  • uuid_sequence_nextval(sequence regclass, block_size int default 65536, block_count int default 65536)

  • uuid_time_nextval(interval_length int default 60, interval_count int default 65536) RETURNS uuid

The default values for parameters are selected to work well for a range of workloads. See the next section explaining the design for additional information about the meaning of those parameters.

Design

The easiest way to make UUIDs more sequential is to use some sequential value as a prefix. For example, we might take a sequence or a timestamp and add random data until we have 16B in total. The resulting values would be almost perfectly sequential, but there are two issues with it:

  • reduction of randomness - E.g. with a sequence producing bigint values this would reduce the randomness from 16B to 8B. Timestamps do reduce the randomness in a similar way, depending on the accuracy. This increases both the collision probability and predictability (e.g. it allows determining which UUIDs were generated close to each other, and perhaps the exact timestamp).

  • bloat - If the values only grow, this may result in bloat in indexes after deleting historical data. This is a well-known issue e.g. with indexes on timestamps in log tables.

To address both of these issues, the implemented generators are designed to wrap-around regularly, either after generating a certain number of UUIDs or some amount of time. In both cases, the UUIDs are generates in blocks and have the form of

(block ID; random data)

The size of the block ID depends on the number of blocks and is fixed (depends on generator parameters). For example with the default 64k blocks we need 2 bytes to store it. The block ID increments regularly, and eventually wraps around.

For sequence-based generators the block size is determined by the number of UUIDs generated. For example we may use blocks of 256 values, in which case the two-byte block ID may be computed like this:

(nextval('s') / 256) % 65536

So the generator wraps-around every ~16M UUIDs (because 256 * 65536).

For timestamp-based generators, the block size is defined as interval length, with the default value 60 seconds. As the default number of blocks is 64k (same as for sequence-based generators), the block may be computed like this

(timestamp / 60) % 65536

Which means the generator wraps around every ~45 days.

Supported Releases

Currently, this extension works only on releases since PostgreSQL 10. It can be made working on older releases with some minor code tweaks if someone wants to spend a bit of time on that.

[1] https://en.wikipedia.org/wiki/Universally_unique_identifier

[2] http://www.informit.com/articles/article.aspx?p=25862

[3] https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/

More Repositories

1

pg_tpch

TPC-H like benchmark for PostgreSQL
PHP
95
star
2

tdigest

PostgreSQL extension for estimating percentiles using t-digest
C
85
star
3

count_distinct

An extension with alternative to COUNT(DISTINCT ...) aggregate in PostgreSQL.
C
76
star
4

gdbpg

GDB command(s) making debugging PostgreSQL somewhat easier
Python
55
star
5

pg_check

a tool to verify integrity of PostgreSQL data files
C
48
star
6

geoip

an extension for GeoIP geolocation
PLpgSQL
47
star
7

quantile

an aggregation function for PostgreSQL
C
41
star
8

ispell_czech

Czech ispell dictionary (UTF-8), suitable for PostgreSQL full-text
OpenEdge ABL
28
star
9

distinct_estimators

distinct counters and aggregate functions for distinct estimation in PostgreSQL
C
18
star
10

ccnumber

experimental PostgreSQL data type with encryption off-loaded to a trusted component
C
17
star
11

explain_training

Training explaining execution plans in PostgreSQL. Currently in Czech only.
PLpgSQL
12
star
12

connection_limits

a PostgreSQL extension that allows you to set quotas on connections (per user, database or IP)
C
12
star
13

shared_ispell

Shared ispell dictionary (stored in shared segment, used by multiple connections)
C
11
star
14

perf_tuning_training

Training explaining basics of PostgreSQL performance tuning. See branches for a version in English
10
star
15

trimmed_aggregates

provides trimmed aggregates for PostgreSQL (average, variance, stddev)
C
8
star
16

query_recorder

an extension for recording all executed queries (so that you may replay them later)
C
8
star
17

index_analyzer

Analyzes existing indexes in a PostgreSQL database and suggests which of them to drop.
4
star
18

pg_uuid

UUID
C
4
star
19

pg_twitter

PostgreSQL Twitter example
4
star
20

md5hash

A custom data type for storing MD5 hashes (instead of the native TEXT varlena type).
C
4
star
21

create-statistics-talk

slides for talk about CREATE STATISTICS
4
star
22

query_histogram

provides a simple query execution time histogram for PostgreSQL
C
4
star
23

pg_vault

experimental extension for key-management of encryption keys used in pgcrypto
C
4
star
24

pg-block-bench-pgbench

Gnuplot
3
star
25

block_benchmark

scripts used to benchmark PostgreSQL with various block sizes (db and fs)
3
star
26

hashset

hashset
C
2
star
27

fdw_figures

graphviz figures illustrating FDW API internals
Graphviz (DOT)
2
star
28

scrub

C
2
star
29

pgxn-tester-server

server part (API + UI) of the PGXN Tester
JavaScript
2
star
30

ddsketch

a PostgreSQL extension implementing a fast and fully-mergeable quantile sketch with relative-error guarantees
PLpgSQL
2
star
31

memory-accounting-benchmarks

Shell
1
star
32

fuzzy_logic

Extension with basic fuzzy logic operators (Gödel, Łukasiewicz and product)
1
star
33

pgcommitfest2

pgcommitfest2 fork
JavaScript
1
star
34

pg-fs-benchmark

Postgres benchmark on different filesystems
Shell
1
star
35

kernel-bench

results of pgbench on different kernel versions
Shell
1
star
36

user_info

Extension for listing various user-related info (granted roles, owned objects, ...)
1
star