
sql-metadata


Uses tokenized query returned by python-sqlparse and generates query metadata.

Extracts column names and tables used by the query. Automatically resolves column aliases, sub-query aliases, and table aliases.

Also provides a helper for normalizing SQL queries.

Supported queries syntax:

(note that the listed backends can differ quite substantially, but they should work with regard to the query types supported by sql-metadata)

You can test the capabilities of sql-metadata with an interactive demo: https://sql-app.infocruncher.com/

Usage

pip install sql-metadata

Extracting raw sql-metadata tokens

from sql_metadata import Parser

# extract raw sql-metadata tokens
Parser("SELECT * FROM foo").tokens
# ['SELECT', '*', 'FROM', 'foo']

Extracting columns from query

from sql_metadata import Parser

# get columns from query - for more examples see `tests/test_getting_columns.py`
Parser("SELECT test, id FROM foo, bar").columns
# ['test', 'id']

Parser("INSERT /* VoteHelper::addVote xxx */  INTO `page_vote` (article_id,user_id,`time`) VALUES ('442001','27574631','20180228130846')").columns
# ['article_id', 'user_id', 'time']

parser = Parser("SELECT a.* FROM product_a.users AS a JOIN product_b.users AS b ON a.ip_address = b.ip_address")

# note that aliases are auto-resolved
parser.columns
# ['product_a.users.*', 'product_a.users.ip_address', 'product_b.users.ip_address']

# you can also extract columns along with their place in the query,
# which returns a dict with lists divided into select, where, order_by, group_by, join, insert and update
parser.columns_dict
# {'select': ['product_a.users.*'], 'join': ['product_a.users.ip_address', 'product_b.users.ip_address']}

Extracting columns aliases from query

from sql_metadata import Parser
parser = Parser("SELECT a, (b + c - u) as alias1, custome_func(d) alias2 from aa, bb order by alias1")

# note that the columns list does not contain the column aliases
parser.columns
# ["a", "b", "c", "u", "d"]

# but you can still extract the alias names
parser.columns_aliases_names
# ["alias1", "alias2"]

# aliases are resolved to the columns which they refer to
parser.columns_aliases
# {"alias1": ["b", "c", "u"], "alias2": "d"}

# you can also extract aliases grouped by the section of the query in which they are used
parser.columns_aliases_dict
# {"order_by": ["alias1"], "select": ["alias1", "alias2"]}

# the same resolution applies to the query sections when you extract columns_dict;
# here the alias is only used in order by, but it's resolved to the actual columns
assert parser.columns_dict == {'order_by': ['b', 'c', 'u'],
                               'select': ['a', 'b', 'c', 'u', 'd']}

Extracting tables from query

from sql_metadata import Parser

# get tables from query - for more examples see `tests/test_getting_tables.py`
Parser("SELECT a.* FROM product_a.users AS a JOIN product_b.users AS b ON a.ip_address = b.ip_address").tables
# ['product_a.users', 'product_b.users']

Parser("SELECT test, id FROM foo, bar").tables
# ['foo', 'bar']

# you can also extract aliases of the tables as a dictionary
parser = Parser("SELECT f.test FROM foo AS f")

# get table aliases
parser.tables_aliases
# {'f': 'foo'}

# note that aliases are auto-resolved for columns
parser.columns
# ["foo.test"]

Extracting values from insert query

from sql_metadata import Parser

parser = Parser(
    "INSERT /* VoteHelper::addVote xxx */  INTO `page_vote` (article_id,user_id,`time`) " 
    "VALUES ('442001','27574631','20180228130846')"
)
# extract values from query
parser.values
# ["442001", "27574631", "20180228130846"]

# extract a dictionary with column-value pairs
parser.values_dict
# {"article_id": "442001", "user_id": "27574631", "time": "20180228130846"}

# if column names are not set, placeholder names are auto-generated
parser = Parser(
    "INSERT IGNORE INTO `table` VALUES (9, 2.15, '123', '2017-01-01');"
)
parser.values
# [9, 2.15, "123", "2017-01-01"]

parser.values_dict
# {"column_1": 9, "column_2": 2.15, "column_3": "123", "column_4": "2017-01-01"}

Extracting limit and offset

from sql_metadata import Parser

Parser('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000').limit_and_offset
# (50, 1000)

Parser('SELECT foo_limit FROM bar_offset limit 2000,50').limit_and_offset
# (50, 2000)

Extracting with names

from sql_metadata import Parser

parser = Parser(
    """
WITH
    database1.tableFromWith AS (SELECT aa.* FROM table3 as aa 
                                left join table4 on aa.col1=table4.col2),
    test as (SELECT * from table3)
SELECT
  "xxxxx"
FROM
  database1.tableFromWith alias
LEFT JOIN database2.table2 ON ("tt"."ttt"."fff" = "xx"."xxx")
"""
)

# get names / aliases of with statements
parser.with_names
# ["database1.tableFromWith", "test"]

# get definitions of the with queries
parser.with_queries
# {"database1.tableFromWith": "SELECT aa.* FROM table3 as aa left join table4 on aa.col1=table4.col2",
#  "test": "SELECT * from table3"}

# note that names of with statements do not appear in tables
parser.tables
# ["table3", "table4", "database2.table2"]

Extracting sub-queries

from sql_metadata import Parser

parser = Parser(
"""
SELECT COUNT(1) FROM
(SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1) a
JOIN (SELECT st.task_id FROM some_task st WHERE task_type_id = 80) b
ON a.task_id = b.task_id;
"""
)

# get sub-queries dictionary
parser.subqueries
# {"a": "SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1",
#  "b": "SELECT st.task_id FROM some_task st WHERE task_type_id = 80"}


# get names/ aliases of sub-queries / derived tables
parser.subqueries_names
# ["a", "b"]

# note that columns coming from sub-queries are resolved to real columns
parser.columns
# ["some_task_detail.task_id", "some_task_detail.STATUS", "some_task.task_id",
#  "task_type_id"]

# same applies for columns_dict, note the join columns are resolved
parser.columns_dict
# {'join': ['some_task_detail.task_id', 'some_task.task_id'],
#  'select': ['some_task_detail.task_id', 'some_task.task_id'],
#  'where': ['some_task_detail.STATUS', 'task_type_id']}

See the test files for more examples covering more complex queries.

Queries normalization and comments extraction

from sql_metadata import Parser
parser = Parser('SELECT /* Test */ foo FROM bar WHERE id in (1, 2, 56)')

# generalize query
parser.generalize
# 'SELECT foo FROM bar WHERE id in (XYZ)'

# remove comments
parser.without_comments
# 'SELECT foo FROM bar WHERE id in (1, 2, 56)'

# extract comments
parser.comments
# ['/* Test */']

See the test/test_normalization.py file for more examples covering more complex queries.

Migrating from sql_metadata 1.x

The sql_metadata.compat module has been implemented to make the transition to sql-metadata v2.0 smoother.

You can use it by simply changing the imports in your code from:

from sql_metadata import get_query_columns, get_query_tables

into:

from sql_metadata.compat import get_query_columns, get_query_tables

The following functions from the old API are available in the sql_metadata.compat module:

  • generalize_sql
  • get_query_columns (since #131, column aliases are NOT returned by this function)
  • get_query_limit_and_offset
  • get_query_tables
  • get_query_tokens
  • preprocess_query

Authors and contributors

Created and maintained by @macbre with great contributions from @collerek and others.

Stargazers over time
