• Stars
    star
    778
  • Rank 58,431 (Top 2 %)
  • Language
    C
  • License
    BSD 3-Clause "New...
  • Created over 10 years ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Ruby extension to parse, deparse and normalize SQL queries using the PostgreSQL query parser

pg_query

This Ruby extension uses the actual PostgreSQL server source to parse SQL queries and return the internal PostgreSQL parsetree.

In addition the extension allows you to normalize queries (replacing constant values with $n) and parse these normalized queries into a parsetree again.

When you build this extension, it builds parts of the PostgreSQL server source (see libpg_query), and then statically links it into this extension.

This may seem like a lot of complexity, but is the only reliable way of parsing all valid PostgreSQL queries.

You can find further examples and a longer rationale here: https://pganalyze.com/blog/parse-postgresql-queries-in-ruby.html

Installation

gem install pg_query

Due to compiling parts of PostgreSQL, installation might take a while on slower systems. Expect up to 5 minutes.

Usage

Parsing a query

PgQuery.parse("SELECT 1")

=> #<PgQuery::ParserResult:0x000000012000c438
  @query="SELECT 1",
  @tree=<PgQuery::ParseResult:
    version: 150001,
    stmts: [
      <PgQuery::RawStmt:
        stmt: <PgQuery::Node:
          select_stmt: <PgQuery::SelectStmt:
            distinct_clause: [],
            target_list: [
              <PgQuery::Node:
                res_target: <PgQuery::ResTarget:
                  name: "",
                  indirection: [],
                  val: <PgQuery::Node:
                    a_const: <PgQuery::A_Const:
                      isnull: false,
                      location: 7,
                      ival: <PgQuery::Integer:
                        ival: 1
                      >
                    >
                  >,
                  location: 7
                >
              >
            ],
            from_clause: [],
            group_clause: [],
            group_distinct: false,
            window_clause: [],
            values_lists: [],
            sort_clause: [],
            limit_option: :LIMIT_OPTION_DEFAULT,
            locking_clause: [],
            op: :SETOP_NONE,
            all: false
          >
        >,
        stmt_location: 0,
        stmt_len: 0
      >
    ]
  >,
  @warnings=[],
  @tables=nil,
  @aliases=nil,
  @cte_names=nil,
  @functions=nil
>

Modifying a parsed query and turning it into SQL again

This is a simple example for deparse, for more complex modification, use walk!.

parsed_query = PgQuery.parse("SELECT * FROM users")

# Modify the parse tree in some way
parsed_query.tree.stmts[0].stmt.select_stmt.from_clause[0].range_var.relname = 'other_users'

# Turn it into SQL again
parsed_query.deparse
=> "SELECT * FROM other_users"

Parsing a normalized query

# Normalizing a query (like pg_stat_statements in Postgres 10+)
PgQuery.normalize("SELECT 1 FROM x WHERE y = 'foo'")

=> "SELECT $1 FROM x WHERE y = $2"

Extracting tables from a query

PgQuery.parse("SELECT $1 FROM x JOIN y USING (id) WHERE z = $2").tables

=> ["x", "y"]

Extracting columns from a query

PgQuery.parse("SELECT $1 FROM x WHERE x.y = $2 AND z = $3").filter_columns

=> [["x", "y"], [nil, "z"]]

Fingerprinting a query

PgQuery.parse("SELECT 1").fingerprint

=> "50fde20626009aba"

PgQuery.parse("SELECT 2; --- comment").fingerprint

=> "50fde20626009aba"

# Faster fingerprint method that is implemented inside the native C library
PgQuery.fingerprint("SELECT $1")

=> "50fde20626009aba"

Scanning a query into tokens

PgQuery.scan('SELECT 1 --comment')

=> [<PgQuery::ScanResult: version: 150001, tokens: [
<PgQuery::ScanToken: start: 0, end: 6, token: :SELECT, keyword_kind: :RESERVED_KEYWORD>,
<PgQuery::ScanToken: start: 7, end: 8, token: :ICONST, keyword_kind: :NO_KEYWORD>,
<PgQuery::ScanToken: start: 9, end: 18, token: :SQL_COMMENT, keyword_kind: :NO_KEYWORD>]>,
 []]

Walking the parse tree

For generalized use, PgQuery provides walk! as a means to recursively work with the parsed query.

This can be used to create a bespoke pretty printer:

parsed_query = PgQuery.parse "SELECT * FROM tbl"
parsed_query.walk! { |node, k, v, location| puts k }

More usefully, this can be used to rewrite a query. For example:

parsed_query.walk! do |node, k, v, location| puts k
  next unless k.eql?(:range_var) || k.eql?(:relation)
  next if v.relname.nil?
  v.relname = "X_" + v.relname
end

parsed_query.deparse

There are some caveats, and limitations, in this example.

First, some of the tree nodes are frozen. You can replace them, but you cannot modify in place.

Second, table rewriting is a bit more nuanced than this example. While this will rewrite the table names, it will not correctly handle all CTEs, or rewrite columns with explicit table names.

Supported Ruby Versions

Currently tested and officially supported Ruby versions:

  • CRuby 2.6
  • CRuby 2.7
  • CRuby 3.0
  • CRuby 3.1
  • CRuby 3.2

Not supported:

  • JRuby: pg_query relies on a C extension, which is discouraged / not properly supported for JRuby
  • TruffleRuby: GraalVM does not support sigjmp, which is used by the Postgres error handling code (pg_query uses a copy of the Postgres parser & error handling code)

Developer tasks

Update libpg_query source

In order to update to a newer Postgres parser, first update libpg_query to the new Postgres version and tag a release.

Once that is done, follow the following steps:

  1. Update LIB_PG_QUERY_TAG and LIB_PG_QUERY_SHA256SUM in Rakefile

  2. Run rake update_source to update the source code

  3. Commit the Rakefile and the modified files in ext/pg_query to this source tree and make a PR

Resources

See libpg_query for pg_query in other languages, as well as products/tools built on pg_query.

Original Author

Special Thanks to

License

PostgreSQL server source code, used under the PostgreSQL license.
Portions Copyright (c) 1996-2023, The PostgreSQL Global Development Group
Portions Copyright (c) 1994, The Regents of the University of California

All other parts are licensed under the 3-clause BSD license, see LICENSE file for details.
Copyright (c) 2015, Lukas Fittl [email protected]
Copyright (c) 2016-2023, Duboce Labs, Inc. (pganalyze) [email protected]

More Repositories

1

libpg_query

C library for accessing the PostgreSQL parser outside of the server environment
C
1,173
star
2

pg_query_go

Go library to parse and normalize SQL queries using the PostgreSQL query parser
C
642
star
3

collector

pganalyze statistics collector for gathering PostgreSQL metrics and log data
Go
333
star
4

pg_query.rs

Rust library to parse, deparse and normalize SQL queries using the PostgreSQL query parser
Rust
125
star
5

pg-query-emscripten

Emscripten Port of pg_query to easily play with it in the browser
C++
76
star
6

activerecord-copy

Supports binary COPY into PostgreSQL with activerecord
Ruby
41
star
7

queryparser

DEPRECATED - use libpg_query instead! https://github.com/lfittl/libpg_query
C
38
star
8

lint

Check for missing indexes during development using constraint programming
Rust
35
star
9

materialized-views-demo

Example for using Materialized Views in Rails
Ruby
18
star
10

sidekiq_server.rs

A Sidekiq server implemented in Rust
Rust
16
star
11

pg_simulator

PostgreSQL Schema Simulator - Load schema and statistics information into a database and run EXPLAIN on queries
Ruby
15
star
12

react-svg-example

Example of how to build UI components with React and SVG
TypeScript
14
star
13

subqueries-rails-example

Ruby
11
star
14

pgcon2023

Simple index selection model using constraint programming presented at PGCon 2023
Python
10
star
15

pganalyze-docs

Documentation for pganalyze - PostgreSQL Performance Monitoring
MDX
10
star
16

graphql-batch-example

Code example for "Efficient GraphQL queries in Ruby on Rails & Postgres" blog post
Ruby
9
star
17

cp-sat-python-example

Python
9
star
18

postgis_mock

Simplified version of the PostGIS extension functions, types and operators, to be used for parsing/planning queries that depend on PostGIS APIs, but without calling actual PostGIS code.
PLpgSQL
3
star
19

pgday-chicago-2024

Python
3
star
20

full-text-search-rails

Example for article "Full Text Search in Milliseconds with Rails and PostgreSQL"
Ruby
2
star
21

collector-snapshot

Protocol Buffers definition for the pganalyze collector
1
star
22

heroku-plugin-psql

Like heroku pg:psql but for any Postgres database
TypeScript
1
star
23

postgres-local-sync

Heroku helper app to synchronize your staging database to local development, via S3
Ruby
1
star
24

index-advisor-feedback

1
star