JsQuery – json query language with GIN indexing support

Introduction

JsQuery is a language to query the jsonb data type, introduced in PostgreSQL release 9.4.

Its primary goal is to provide additional functionality for jsonb (currently missing in PostgreSQL), such as a simple and effective way to search in nested objects and arrays, and more comparison operators with index support. We hope that jsquery will eventually become a part of PostgreSQL.

JsQuery is implemented as the jsquery data type (similar to tsquery) and the @@ match operator for jsonb.

Authors

Availability

JsQuery is implemented as an extension and is not available in a default PostgreSQL installation. It is available from GitHub under the same license as PostgreSQL and supports PostgreSQL 9.4+.

Regards

Development was sponsored by Wargaming.net.

Installation

JsQuery is a PostgreSQL extension which requires PostgreSQL 9.4 or higher. Before building and installing, make sure of the following:

  • PostgreSQL version is 9.4 or higher.
  • You have the PostgreSQL development package installed, or you built PostgreSQL from source.
  • You have flex and bison installed on your system. JsQuery was tested on flex 2.5.37-2.5.39, bison 2.7.12.
  • Your PATH variable is configured so that the pg_config command is available, or the PG_CONFIG variable is set.

A typical installation procedure may look like this:

$ git clone https://github.com/postgrespro/jsquery.git
$ cd jsquery
$ make USE_PGXS=1
$ sudo make USE_PGXS=1 install
$ make USE_PGXS=1 installcheck
$ psql DB -c "CREATE EXTENSION jsquery;"

JSON query language

The JsQuery extension contains the jsquery datatype, which represents a whole JSON query as a single value (like tsquery does for full-text search). The query is an expression on JSON document values.

A simple expression is specified as path binary_operator value or path unary_operator. See the following examples.

  • x = "abc" – value of key "x" is equal to "abc";
  • $ @> [4, 5, "zzz"] – the JSON document is an array containing values 4, 5 and "zzz";
  • "abc xyz" >= 10 – value of key "abc xyz" is greater than or equal to 10;
  • volume IS NUMERIC – type of key "volume" is numeric;
  • $ = true – the whole JSON document is just true;
  • similar_ids.@# > 5 – similar_ids is an array or object of length greater than 5;
  • similar_product_ids.# = "0684824396" – array "similar_product_ids" contains string "0684824396";
  • *.color = "red" – there is an object somewhere whose key "color" has value "red";
  • foo = * – key "foo" exists in the object.

A path selects the set of JSON values to be checked using the given operators. In the simplest case, a path is just a key name. In general, a path is a sequence of key names and placeholders joined by dots. A path can use the following placeholders:

  • # – any index of array;
  • #N – N-th index of array;
  • % – any key of object;
  • * – any sequence of array indexes and object keys;
  • @# – length of array or object, can only be used as the last component of a path;
  • $ – the whole JSON document as a single value, can only be the whole path.

An expression is true when the operator is true for at least one value selected by the path.
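
To make the "at least one value" rule concrete, the following Python sketch models path selection over a parsed JSON document. It is a hypothetical model for illustration, not the extension's C implementation: the helper names descendants, select and matches are invented here, and * is assumed to also match the empty sequence of steps.

```python
from typing import Any, Callable, Iterator, List


def descendants(value: Any) -> Iterator[Any]:
    """Yield value itself and every value nested inside it."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from descendants(child)
    elif isinstance(value, list):
        for child in value:
            yield from descendants(child)


def select(value: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reached by following `path` (a list of steps) from `value`."""
    if not path:
        yield value
        return
    step, rest = path[0], path[1:]
    if step == "#":  # any index of an array
        if isinstance(value, list):
            for item in value:
                yield from select(item, rest)
    elif step.startswith("#") and step[1:].isdigit():  # #N: N-th index of an array
        n = int(step[1:])
        if isinstance(value, list) and n < len(value):
            yield from select(value[n], rest)
    elif step == "%":  # any key of an object
        if isinstance(value, dict):
            for item in value.values():
                yield from select(item, rest)
    elif step == "*":  # any sequence of indexes and keys (assumed to include the empty one)
        for item in descendants(value):
            yield from select(item, rest)
    elif isinstance(value, dict) and step in value:  # plain key name
        yield from select(value[step], rest)


def matches(doc: Any, path: List[str], predicate: Callable[[Any], bool]) -> bool:
    """A simple expression is true if the predicate holds for at least one selected value."""
    return any(predicate(v) for v in select(doc, path))


doc = {"a": [{"b": "xyz", "c": True}, 10], "d": {"e": [7, False]}}
print(list(select(doc, ["a", "#", "b"])))               # ['xyz']
print(matches(doc, ["*", "e", "#"], lambda v: v == 7))  # True
```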

Key names can be given either with or without double quotes. Key names without double quotes must not contain spaces, start with a digit, or coincide with a jsquery keyword.

The supported binary operators are:

  • Equality operator: =;
  • Numeric comparison operators: >, >=, <, <=;
  • Search in a list of scalar values using the IN operator;
  • Array comparison operators: && (overlap), @> (contains), <@ (contained in).
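
The binary operators can be modelled on a single selected value as in the following hypothetical Python sketch. The function names are invented, and two behaviours are assumptions rather than documented facts: numeric comparisons are false for non-numeric operands, and the array operators compare element values as sets (order and duplicates ignored). JSON booleans are kept distinct from numbers, which Python would otherwise treat as equal (True == 1).

```python
import operator
from typing import Any, List

# Numeric comparison operators mapped to Python's comparison functions.
NUMERIC_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def json_equal(a: Any, b: Any) -> bool:
    """JSON equality; at the top level, booleans never equal numbers."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def contains(arr: List[Any], items: List[Any]) -> bool:
    """True if every element of `items` occurs somewhere in `arr`."""
    return all(any(json_equal(a, x) for a in arr) for x in items)


def apply_binary(op: str, left: Any, right: Any) -> bool:
    """Evaluate `left op right`, where `left` is one value selected by the path."""
    if op == "=":
        return json_equal(left, right)
    if op in NUMERIC_OPS:
        return is_number(left) and is_number(right) and NUMERIC_OPS[op](left, right)
    if op == "IN":  # right is a list of scalar values
        return any(json_equal(left, x) for x in right)
    if not isinstance(left, list):  # array operators need an array on the left
        return False
    if op == "&&":  # overlap
        return any(any(json_equal(a, b) for b in right) for a in left)
    if op == "@>":  # contains
        return contains(left, right)
    if op == "<@":  # contained in
        return contains(right, left)
    raise ValueError(f"unsupported operator: {op}")


print(apply_binary("@>", [4, 5, "zzz", 6], [4, 5, "zzz"]))  # True
print(apply_binary("=", True, 1))                           # False
```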

The supported unary operators are:

  • Check for existence operator: = *;
  • Check for type operators: IS ARRAY, IS NUMERIC, IS OBJECT, IS STRING and IS BOOLEAN.
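
The unary operators reduce to simple predicates on a selected value. In this hypothetical Python sketch the names TYPE_CHECKS, is_type and key_exists are invented for illustration; booleans are deliberately excluded from NUMERIC because Python treats bool as a subclass of int.

```python
from typing import Any, Callable, Dict

# Type check operators mapped to predicates over parsed JSON values.
TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "ARRAY": lambda v: isinstance(v, list),
    "NUMERIC": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "OBJECT": lambda v: isinstance(v, dict),
    "STRING": lambda v: isinstance(v, str),
    "BOOLEAN": lambda v: isinstance(v, bool),
}


def is_type(value: Any, type_name: str) -> bool:
    """Evaluate `value IS <type_name>`."""
    return TYPE_CHECKS[type_name](value)


def key_exists(obj: Any, key: str) -> bool:
    """Evaluate `key = *` for a single-key path: the key exists in the object."""
    return isinstance(obj, dict) and key in obj


print(is_type(True, "NUMERIC"), is_type(True, "BOOLEAN"))  # False True
```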

Expressions can be complex. A complex expression is a set of expressions combined by logical operators (AND, OR, NOT) and grouped using parentheses.

Examples of complex expressions are given below.

  • a = 1 AND (b = 2 OR c = 3) AND NOT d = 1
  • x.% = true OR x.# = true

Prefix expressions are expressions given in the form path (subexpression). In this case, the path selects the JSON values to be checked using the given subexpression. Check results are aggregated in the same way as in simple expressions.

  • #(a = 1 AND b = 2) – there exists an array element whose key "a" is 1 and key "b" is 2;
  • %($ >= 10 AND $ <= 20) – there exists an object key whose value is between 10 and 20.

A path can also contain the following special placeholders with "every" semantics:

  • #: – every index of an array;
  • %: – every key of an object;
  • *: – every sequence of array indexes and object keys.
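
A hypothetical Python model of the "every" placeholders, next to the "any" counterpart for arrays. Treating #: as requiring an array follows the description of #: later in this section; that %: likewise requires an object and that *: includes the value itself are assumptions. Because all() is true for an empty collection, #: accepts [] in this model.

```python
from typing import Any, Callable, Iterator

Pred = Callable[[Any], bool]


def descendants(value: Any) -> Iterator[Any]:
    """Yield value itself and every value nested inside it."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from descendants(child)
    elif isinstance(value, list):
        for child in value:
            yield from descendants(child)


def any_index(v: Any, pred: Pred) -> bool:  # #(...)
    return isinstance(v, list) and any(pred(x) for x in v)


def every_index(v: Any, pred: Pred) -> bool:  # #:(...)
    return isinstance(v, list) and all(pred(x) for x in v)


def every_key(v: Any, pred: Pred) -> bool:  # %:(...)
    return isinstance(v, dict) and all(pred(x) for x in v.values())


def every_descendant(v: Any, pred: Pred) -> bool:  # *:(...)
    return all(pred(x) for x in descendants(v))


print(every_index([], lambda x: False))  # True
```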

Consider the following example.

%.#:($ >= 0 AND $ <= 1)

This example can be read as follows: there is at least one key whose value is an array of numerics between 0 and 1.

We can rewrite this example in the following form with extra parentheses.

%(#:($ >= 0 AND $ <= 1))

The first placeholder % checks that the expression in parentheses is true for at least one value in the object. The second placeholder #: checks that the value is an array and that all its elements satisfy the expression in parentheses.

We can rewrite this example without the #: placeholder as follows.

%(NOT #(NOT ($ >= 0 AND $ <= 1)) AND $ IS ARRAY)

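In this example, we transform the assertion that every element of the array satisfies some condition into the assertion that there is no element which doesn't satisfy the same condition. The $ IS ARRAY check is needed because NOT #(...) alone would also be true for a value that is not an array, since # selects nothing from it.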
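
The equivalence between the two forms can be checked on a few sample documents with the Python sketch below. It is a hypothetical model: all helper names are invented, and $ >= 0 AND $ <= 1 is assumed to be false for non-numeric values. The extra without_array_check function drops $ IS ARRAY to show why that check matters: a non-array value selects nothing for #, so the double negation alone would accept it.

```python
from typing import Any, Callable

Pred = Callable[[Any], bool]


def is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def in_unit_range(x: Any) -> bool:  # $ >= 0 AND $ <= 1 (false for non-numbers)
    return is_number(x) and 0 <= x <= 1


def any_key(v: Any, pred: Pred) -> bool:  # %(...)
    return isinstance(v, dict) and any(pred(x) for x in v.values())


def any_index(v: Any, pred: Pred) -> bool:  # #(...)
    return isinstance(v, list) and any(pred(x) for x in v)


def every_index(v: Any, pred: Pred) -> bool:  # #:(...)
    return isinstance(v, list) and all(pred(x) for x in v)


def with_every(doc: Any) -> bool:
    """%(#:($ >= 0 AND $ <= 1))"""
    return any_key(doc, lambda v: every_index(v, in_unit_range))


def without_every(doc: Any) -> bool:
    """%(NOT #(NOT ($ >= 0 AND $ <= 1)) AND $ IS ARRAY)"""
    return any_key(
        doc,
        lambda v: not any_index(v, lambda x: not in_unit_range(x)) and isinstance(v, list),
    )


def without_array_check(doc: Any) -> bool:
    """%(NOT #(NOT ($ >= 0 AND $ <= 1))): also accepts non-array values."""
    return any_key(doc, lambda v: not any_index(v, lambda x: not in_unit_range(x)))


samples = [{"a": [0.5, 1]}, {"a": [0.5, 2]}, {"a": "x", "b": [0]}, {"a": 5}, {"a": []}]
for s in samples:
    print(s, with_every(s), without_every(s))
```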

Some examples of using paths are given below.

  • numbers.#: IS NUMERIC – every element of the "numbers" array is numeric;
  • *:($ IS OBJECT OR $ IS BOOLEAN) – JSON is a structure of nested objects with booleans as leaf values;
  • #:.%:($ >= 0 AND $ <= 1) – each element of the array is an object containing only numeric values between 0 and 1;
  • documents.#:.% = * – "documents" is an array of objects, each containing at least one key;
  • %.#: ($ IS STRING) – the JSON object contains at least one array of strings;
  • #.% = true – at least one array element is an object which contains at least one true value.

The usage of path operators and parentheses needs some explanation. When the same path operator is used multiple times, each occurrence may refer to a different value; to refer to the same value multiple times, use parentheses and the $ operator. See the following examples.

  • # < 10 AND # > 20 – there exists an element less than 10 and there exists another element greater than 20;
  • #($ < 10 AND $ > 20) – there exists an element which is both less than 10 and greater than 20 (impossible);
  • #($ >= 10 AND $ <= 20) – there exists an element between 10 and 20;
  • # >= 10 AND # <= 20 – there exists an element greater than or equal to 10 and there exists an element (possibly a different one) less than or equal to 20. This query can be satisfied by an array with no elements between 10 and 20, for instance [0,30].
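
The difference between the last two queries can be reproduced with a short Python sketch. The helper names are hypothetical, and the arrays are assumed to contain only numbers so that the comparisons are well defined.

```python
from typing import Any, Callable, List


def any_elem(arr: Any, pred: Callable[[Any], bool]) -> bool:
    """#-style check: true if at least one array element satisfies pred."""
    return isinstance(arr, list) and any(pred(x) for x in arr)


def separate_conditions(arr: List[Any]) -> bool:
    # "# >= 10 AND # <= 20": each "#" may pick a different element.
    return any_elem(arr, lambda x: x >= 10) and any_elem(arr, lambda x: x <= 20)


def same_element(arr: List[Any]) -> bool:
    # "#($ >= 10 AND $ <= 20)": one element must satisfy both conditions.
    return any_elem(arr, lambda x: 10 <= x <= 20)


print(separate_conditions([0, 30]), same_element([0, 30]))  # True False
print(separate_conditions([15]), same_element([15]))        # True True
```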

The same rules apply when you search inside objects and branchy structures.

Type checking operators and "every" placeholders are useful for document schema validation. The JsQuery matching operator @@ is immutable and can be used in a CHECK constraint. See the following example.

CREATE TABLE js (
    id serial,
    data jsonb,
    CHECK (data @@ '
        name IS STRING AND
        similar_ids.#: IS NUMERIC AND
        points.#:(x IS NUMERIC AND y IS NUMERIC)'::jsquery));

In this example, the check constraint validates that in the "data" jsonb column the value of the "name" key is a string, the value of the "similar_ids" key is an array of numerics, and the value of the "points" key is an array of objects which contain numeric values in the "x" and "y" keys.
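
For readers who want to see which documents the constraint accepts, here is a Python predicate that mirrors the jsquery expression. It is a hypothetical illustration only (PostgreSQL evaluates the real constraint through the @@ operator), the function names are invented, and empty arrays are assumed to pass the #: checks.

```python
from typing import Any


def is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def valid_document(data: Any) -> bool:
    """Mirror of the CHECK expression, for illustration only."""
    if not isinstance(data, dict):
        return False  # "name IS STRING" selects nothing from a non-object
    # name IS STRING
    name_ok = isinstance(data.get("name"), str)
    # similar_ids.#: IS NUMERIC
    ids = data.get("similar_ids")
    ids_ok = isinstance(ids, list) and all(is_number(x) for x in ids)
    # points.#:(x IS NUMERIC AND y IS NUMERIC)
    points = data.get("points")
    points_ok = isinstance(points, list) and all(
        isinstance(p, dict) and is_number(p.get("x")) and is_number(p.get("y"))
        for p in points
    )
    return name_ok and ids_ok and points_ok


print(valid_document({"name": "p", "similar_ids": [1, 2], "points": [{"x": 0, "y": 1}]}))  # True
```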

See our pgconf.eu presentation for more examples.

GIN indexes

The JsQuery extension contains two operator classes (opclasses) for GIN, which provide different kinds of query optimization.

  • jsonb_path_value_ops
  • jsonb_value_path_ops

In each of the two GIN opclasses, jsonb documents are decomposed into entries. Each entry is associated with a particular value and its path. The opclasses differ in entry representation, entry comparison, and how entries are used for search optimization.

For example, the jsonb document {"a": [{"b": "xyz", "c": true}, 10], "d": {"e": [7, false]}} would be decomposed into the following entries:

  • "a".#."b"."xyz"
  • "a".#."c".true
  • "a".#.10
  • "d"."e".#.7
  • "d"."e".#.false

Since JsQuery doesn't support search by a particular array index, all array elements are considered equivalent. Thus, each array element is marked with the same # sign in the path.
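
The naive decomposition above can be reproduced with a short Python sketch that renders each entry as a string. This is for illustration only: the opclasses do not store entries in this form, and the function name entries is invented. Empty arrays and objects produce no entries in this sketch.

```python
import json
from typing import Any, Iterator, Tuple


def entries(value: Any, path: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one 'path.value' entry per scalar, using '#' for every array index."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from entries(child, path + (json.dumps(key),))
    elif isinstance(value, list):
        for child in value:
            yield from entries(child, path + ("#",))
    else:
        # json.dumps renders scalars in JSON form: "xyz", true, 10, ...
        yield ".".join(path + (json.dumps(value),))


doc = json.loads('{"a": [{"b": "xyz", "c": true}, 10], "d": {"e": [7, false]}}')
for entry in entries(doc):
    print(entry)
```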

The major problem with this entry representation is its size. In the given example, the key "a" appears three times. In large branchy documents with long keys, the size of the naive entry representation becomes unreasonable. Both opclasses address this issue, but in slightly different ways.

jsonb_path_value_ops

jsonb_path_value_ops represents an entry as a pair of path hash and value. The following pseudocode illustrates it.

(hash(path_item_1.path_item_2. ... .path_item_n); value)

When entries are compared, the path hash is the higher part of the entry and the value is its lower part. This determines the features of this opclass. Since the path is hashed and is the higher part of the entry, we need to know the full path to a value in order to use the entry for search. However, once the path is specified, both exact and range searches can be performed very efficiently.
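
A rough Python model of this layout: entries are (path hash, value) tuples kept in sorted order, so all values under one path are contiguous and a range search is two binary searches. CRC-32 via zlib.crc32 stands in for whatever hash the opclass actually uses, and all names are hypothetical. A path containing % or * has no single hash to look up, which illustrates why the full path is needed; different paths could also collide on one hash, so results from such a scheme would need rechecking.

```python
import bisect
import zlib
from typing import Iterable, List, Tuple

Path = Tuple[str, ...]
Entry = Tuple[int, float]  # (path hash, value): the hash is the higher part


def path_hash(path: Path) -> int:
    # Stand-in hash (CRC-32); not the opclass's real hash function.
    return zlib.crc32(".".join(path).encode("utf-8"))


def build_index(pairs: Iterable[Tuple[Path, float]]) -> List[Entry]:
    """Sort entries so that all values under one path hash are contiguous."""
    return sorted((path_hash(p), v) for p, v in pairs)


def range_search(index: List[Entry], path: Path, lo: float, hi: float) -> List[float]:
    """Return the values in [lo, hi] stored under exactly `path`."""
    h = path_hash(path)
    start = bisect.bisect_left(index, (h, lo))
    end = bisect.bisect_right(index, (h, hi))
    return [v for _, v in index[start:end]]


index = build_index([(("a", "#", "x"), 1), (("a", "#", "x"), 5), (("a", "#", "x"), 9), (("b",), 5)])
print(range_search(index, ("a", "#", "x"), 2, 9))  # [5, 9]
```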

jsonb_value_path_ops

jsonb_value_path_ops represents an entry as a pair of value and bloom filter of the path.

(value; bloom(path_item_1) | bloom(path_item_2) | ... | bloom(path_item_n))

When entries are compared, the value is the higher part of the entry and the bloom filter of the path is its lower part. This determines the features of this opclass. Since the value is the higher part of the entry, only exact value search can be performed efficiently. Range value search is possible as well, but we would have to filter all the different paths where matching values occur. The bloom filter over path items allows index usage for conditions containing % and * in their paths.
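
The following Python sketch models a (value, path bloom) entry and how a query with wildcards can still use it: only the path items the query knows contribute bits, and an entry may match if it contains all of those bits. The single bit per item, the 32-bit width and the CRC-32 hash are assumptions for illustration, not the opclass's actual parameters, and the function names are invented.

```python
import zlib
from typing import Any, Iterable, Tuple

BLOOM_BITS = 32  # hypothetical filter width

Entry = Tuple[Any, int]  # (value, path bloom): the value is the higher part


def bloom(item: str) -> int:
    """Set a single bit chosen by hashing one path item (a stand-in scheme)."""
    return 1 << (zlib.crc32(item.encode("utf-8")) % BLOOM_BITS)


def path_bloom(items: Iterable[str]) -> int:
    """OR together the bits of every path item: bloom(item_1) | ... | bloom(item_n)."""
    mask = 0
    for item in items:
        mask |= bloom(item)
    return mask


def make_entry(path: Iterable[str], value: Any) -> Entry:
    return (value, path_bloom(path))


def may_match(entry: Entry, value: Any, known_items: Iterable[str]) -> bool:
    """Exact value match, then require every bit of the query's known path items.

    Wildcards such as % and * contribute no items, so they never exclude an
    entry. Bloom filters can give false positives, never false negatives.
    """
    entry_value, entry_bloom = entry
    required = path_bloom(known_items)
    return entry_value == value and (entry_bloom & required) == required


entry = make_entry(["a", "#", "color"], "red")
print(may_match(entry, "red", ["color"]))   # True: *.color = "red" may match
print(may_match(entry, "blue", ["color"]))  # False: value differs
```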

Query optimization

JsQuery opclasses perform complex query optimization. Thus it is valuable for a developer or administrator to see the result of such optimization. Unfortunately, opclasses aren't allowed to add any custom output to EXPLAIN. That's why JsQuery provides the following functions, which show how a particular opclass optimizes a given query.

  • gin_debug_query_path_value(jsquery) – for jsonb_path_value_ops
  • gin_debug_query_value_path(jsquery) – for jsonb_value_path_ops

The result of these functions is a textual representation of the query tree whose leaves are GIN search entries. The following examples show different results of query optimization by different opclasses.

# SELECT gin_debug_query_path_value('x = 1 AND (*.y = 1 OR y = 2)');
 gin_debug_query_path_value
----------------------------
 x = 1 , entry 0           +

# SELECT gin_debug_query_value_path('x = 1 AND (*.y = 1 OR y = 2)');
 gin_debug_query_value_path
----------------------------
 AND                       +
   x = 1 , entry 0         +
   OR                      +
     *.y = 1 , entry 1     +
     y = 2 , entry 2       +

Unfortunately, jsonb has no statistics yet. That's why the JsQuery optimizer has to make imperative decisions when selecting which conditions to evaluate using the index. These decisions are based on the assumption that some condition types are less selective than others. The optimizer divides conditions into the following selectivity classes (listed in descending order of selectivity).

  1. Equality (x = c)
  2. Range (c1 < x < c2)
  3. Inequality (x > c)
  4. Is (x is type)
  5. Any (x = *)

The optimizer avoids index evaluation of less selective conditions when possible. For example, in the query x = 1 AND y > 0, x = 1 is assumed to be more selective than y > 0. That's why the index isn't used to evaluate y > 0.

# SELECT gin_debug_query_path_value('x = 1 AND y > 0');
 gin_debug_query_path_value
----------------------------
 x = 1 , entry 0           +

Without statistics, the decisions made by the optimizer can be inaccurate. That's why JsQuery supports hints. The comments /*-- index */ and /*-- noindex */ placed in conditions force the optimizer to use or not use the index, respectively.

# SELECT gin_debug_query_path_value('x = 1 AND y /*-- index */ > 0');
 gin_debug_query_path_value
----------------------------
 AND                       +
   x = 1 , entry 0         +
   y > 0 , entry 1         +

# SELECT gin_debug_query_path_value('x /*-- noindex */ = 1 AND y > 0');
 gin_debug_query_path_value
----------------------------
 y > 0 , entry 0           +
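
The behaviour in the examples above can be approximated by a small Python model of how conditions joined by AND might be chosen for index evaluation, including the two hints. This is a simplified sketch inferred from those examples, not the actual optimizer: the real code also handles OR, NOT, nested expressions and paths that an opclass cannot index (presumably why *.y = 1 prevents jsonb_path_value_ops from using the OR branch). Selectivity classes are supplied by hand here rather than parsed from a jsquery string, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

# Selectivity classes, most selective first (as listed above).
SELECTIVITY = ["equality", "range", "inequality", "is", "any"]

# (condition text, selectivity class, hint): hint is "index", "noindex" or None.
Condition = Tuple[str, str, Optional[str]]


def index_conditions(conds: List[Condition]) -> List[str]:
    """Pick which operands of an AND the index evaluates (simplified model)."""
    usable = [c for c in conds if c[2] != "noindex"]
    ranks = [SELECTIVITY.index(cls) for _, cls, _ in usable]
    best = min(ranks, default=None)
    return [
        text
        for (text, _, hint), rank in zip(usable, ranks)
        if hint == "index" or rank == best
    ]


print(index_conditions([("x = 1", "equality", None), ("y > 0", "inequality", None)]))
# ['x = 1']
```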

Contribution

Please note that JsQuery is still under development, and while it is stable and tested, it may contain some bugs. Don't hesitate to raise issues on GitHub with your bug reports.

If you're missing some functionality in JsQuery and feel able to implement it, you're welcome to make pull requests.
