vertica-python
vertica-python is a native Python client for the Vertica (http://www.vertica.com) database. vertica-python is the replacement of the deprecated Python client vertica_db_client, which was removed since Vertica server version 9.3.
Please check out release notes to learn about the latest improvements.
vertica-python has been tested with Vertica 12.0.4 and Python 3.7/3.8/3.9/3.10/3.11. Feel free to submit issues and/or pull requests (Read up on our contributing guidelines).
Installation
To install vertica-python with pip:
# Latest release version
pip install vertica-python
# Latest commit on master branch
pip install git+https://github.com/vertica/vertica-python.git@master
To install vertica-python from source, run the following command from the root directory:
python setup.py install
Source code for vertica-python can be found at:
https://github.com/vertica/vertica-python
Using Kerberos authentication
vertica-python has optional Kerberos authentication support for Unix-like systems, which requires you to install the kerberos package:
pip install kerberos
Note that kerberos
is a python extension module, which means you need to install python-dev
. The command depends on the package manager and will look like
sudo [yum|apt-get|etc] install python-dev
Then see this section for how to config Kerberos for a connection.
Usage
Create a connection
The example below shows how to create a Connection
object:
import vertica_python
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
# autogenerated session label by default,
'session_label': 'some_label',
# default throw error on invalid UTF-8 results
'unicode_error': 'strict',
# SSL is disabled by default
'ssl': False,
# autocommit is off by default
'autocommit': True,
# using server-side prepared statements is disabled by default
'use_prepared_statements': False,
# connection timeout is not enabled by default
# 5 seconds timeout for a socket operation (Establishing a TCP connection or read/write operation)
'connection_timeout': 5}
# simple connection, with manual close
try:
connection = vertica_python.connect(**conn_info)
# do things
finally:
connection.close()
# using `with` for auto connection closing after usage
with vertica_python.connect(**conn_info) as connection:
# do things
Connection Option | Description |
---|---|
host | The server host of the connection. This can be a host name or an IP address. Default: "localhost" |
port | The port of the connection. Default: 5433 |
user | The database user name to use to connect to the database. Default: OS login user name |
password | The password to use to log into the database. Default: "" |
database | The database name. Default: "" |
autocommit | See Autocommit. Default: False |
backup_server_node | See Connection Failover. Default: [] |
binary_transfer | See Data Transfer Format. Default: False (use text format transfer) |
connection_load_balance | See Connection Load Balancing. Default: False (disabled) |
connection_timeout | The number of seconds (can be a nonnegative floating point number) the client waits for a socket operation (Establishing a TCP connection or read/write operation). Default: None (no timeout) |
disable_copy_local | See COPY FROM LOCAL. Default: False |
kerberos_host_name | See Kerberos Authentication. Default: the value of connection option host |
kerberos_service_name | See Kerberos Authentication. Default: "vertica" |
log_level | See Logging. |
log_path | See Logging. |
request_complex_types | See SQL Data conversion to Python objects. Default: True |
session_label | Sets a label for the connection on the server. This value appears in the client_label column of the v_monitor.sessions system table. Default: an auto-generated label with format of vertica-python-{version}-{random_uuid} |
ssl | See TLS/SSL. Default: False (disabled) |
unicode_error | See UTF-8 encoding issues. Default: 'strict' (throw error on invalid UTF-8 results) |
use_prepared_statements | See Passing parameters to SQL queries. Default: False |
workload | Sets the workload name associated with this session. Valid values are workload names that already exist in a workload routing rule on the server. If a workload name that doesn't exist is entered, the server will reject it and it will be set to the default. Default: "" |
dsn | See Set Properties with Connection String. |
Below are a few important connection topics you may deal with, or you can skip and jump to the next section: Send Queries and Retrieve Results
Set Properties with Connection String
Another way to set connection properties is passing a connection string to the keyword parameter dsn
of vertica_python.connect(dsn='...', **kwargs)
. The connection string is of the form:
vertica://(user):(password)@(host):(port)/(database)?(arg1=val1&arg2=val2&...)
The connection string would be parsed by vertica_python.parse_dsn(connection_str)
, and the parsing result (a dictionary of keywords and values) would be merged with kwargs. If the same keyword is specified in both the sources, the kwargs value overrides the parsed dsn value. The (arg1=val1&arg2=val2&...)
section can handle string/numeric/boolean values, blank and invalid value would be ignored.
import vertica_python
connection_str = ('vertica://admin@localhost:5433/db1?connection_load_balance=True&connection_timeout=1.5&'
'session_label=vpclient+123%7E456')
print(vertica_python.parse_dsn(connection_str))
# {'user': 'admin', 'host': 'localhost', 'port': 5433, 'database': 'db1',
# 'connection_load_balance': True, 'connection_timeout': 1.5, 'session_label': 'vpclient 123~456'}
additional_info = {
'password': 'some_password',
'backup_server_node': ['10.6.7.123', ('10.20.82.77', 6000)] # invalid value to be set in a connection string
}
with vertica_python.connect(dsn=connection_str, **additional_info) as conn:
# do things
TLS/SSL
You can pass True
to ssl
to enable TLS/SSL connection (Internally ssl.wrap_socket(sock) is called).
import vertica_python
# [TLSMode: require]
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'ssl': True}
connection = vertica_python.connect(**conn_info)
You can pass an ssl.SSLContext
to ssl
to customize the SSL connection options. Server mode TLS examples:
import vertica_python
import ssl
# [TLSMode: require]
# Ensure connection is encrypted.
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'ssl': ssl_context}
connection = vertica_python.connect(**conn_info)
# [TLSMode: verify-ca]
# Ensure connection is encrypted, and client trusts server certificate.
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ssl_context.verify_mode = ssl.CERT_REQUIRED
ssl_context.check_hostname = False
ssl_context.load_verify_locations(cafile='/path/to/ca_file.pem') # CA certificate used to verify server certificate
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'ssl': ssl_context}
connection = vertica_python.connect(**conn_info)
# [TLSMode: verify-full]
# Ensure connection is encrypted, client trusts server certificate,
# and server hostname matches the one listed in the server certificate.
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ssl_context.verify_mode = ssl.CERT_REQUIRED
ssl_context.check_hostname = True
ssl_context.load_verify_locations(cafile='/path/to/ca_file.pem') # CA certificate used to verify server certificate
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'ssl': ssl_context}
connection = vertica_python.connect(**conn_info)
Mutual mode TLS example:
import vertica_python
import ssl
# [TLSMode: verify-full]
# Ensure connection is encrypted, client trusts server certificate,
# and server hostname matches the one listed in the server certificate.
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ssl_context.verify_mode = ssl.CERT_REQUIRED
ssl_context.check_hostname = True
ssl_context.load_verify_locations(cafile='/path/to/ca_file.pem') # CA certificate used to verify server certificate
# For Mutual mode, provide client certificate and client private key to ssl_context.
# CA certificate used to verify client certificate should be set at the server side.
ssl_context.load_cert_chain(certfile='/path/to/client.pem', keyfile='/path/to/client.key')
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'ssl': ssl_context}
connection = vertica_python.connect(**conn_info)
See more on SSL options here.
Kerberos Authentication
In order to use Kerberos authentication, install dependencies first, and it is the user's responsibility to ensure that an Ticket-Granting Ticket (TGT) is available and valid. Whether a TGT is available can be easily determined by running the klist
command. If no TGT is available, then it first must be obtained by running the kinit
command or by logging in. You can pass in optional arguments to customize the authentication. The arguments are kerberos_service_name
, which defaults to "vertica", and kerberos_host_name
, which defaults to the value of argument host
. For example,
import vertica_python
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
# The service name portion of the Vertica Kerberos principal
'kerberos_service_name': 'vertica_krb',
# The instance or host name portion of the Vertica Kerberos principal
'kerberos_host_name': 'vcluster.example.com'}
with vertica_python.connect(**conn_info) as conn:
# do things
Logging
Logging is disabled by default if neither log_level
or log_path
are set. Passing value to at least one of those options to enable logging.
When logging is enabled, the default value of log_level
is logging.WARNING. You can find all levels here. And the default value of log_path
is 'vertica_python.log', the log file will be in the current execution directory. If log_path
is set to ''
(empty string) or None
, no file handler is set, logs will be processed by root handlers. For example,
import vertica_python
import logging
## Example 1: write DEBUG level logs to './vertica_python.log'
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'log_level': logging.DEBUG}
with vertica_python.connect(**conn_info) as connection:
# do things
## Example 2: write WARNING level logs to './path/to/logs/client.log'
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'log_path': 'path/to/logs/client.log'}
with vertica_python.connect(**conn_info) as connection:
# do things
## Example 3: write INFO level logs to '/home/admin/logs/vClient.log'
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'log_level': logging.INFO,
'log_path': '/home/admin/logs/vClient.log'}
with vertica_python.connect(**conn_info) as connection:
# do things
## Example 4: use root handlers to process logs by setting 'log_path' to '' (empty string)
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'log_level': logging.DEBUG,
'log_path': ''}
with vertica_python.connect(**conn_info) as connection:
# do things
Connection Failover
Supply a list of backup hosts to backup_server_node
for the client to try if the primary host you specify in the connection parameters (host
, port
) is unreachable. Each item in the list should be either a host string (using default port 5433) or a (host, port) tuple. A host can be a host name or an IP address.
import vertica_python
conn_info = {'host': 'unreachable.server.com',
'port': 888,
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'backup_server_node': ['123.456.789.123', 'invalid.com', ('10.20.82.77', 6000)]}
connection = vertica_python.connect(**conn_info)
Connection Load Balancing
Connection Load Balancing helps automatically spread the overhead caused by client connections across the cluster by having hosts redirect client connections to other hosts. Both the server and the client need to enable load balancing for it to function. If the server disables connection load balancing, the load balancing request from client will be ignored.
import vertica_python
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'vdb',
'connection_load_balance': True}
# Server enables load balancing
with vertica_python.connect(**conn_info) as conn:
cur = conn.cursor()
cur.execute("SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION")
print("Client connects to primary node:", cur.fetchone()[0])
cur.execute("SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN')")
with vertica_python.connect(**conn_info) as conn:
cur = conn.cursor()
cur.execute("SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION")
print("Client redirects to node:", cur.fetchone()[0])
## Output
# Client connects to primary node: v_vdb_node0003
# Client redirects to node: v_vdb_node0005
Data Transfer Format
There are two formats for transferring data from a server to a vertica-python client: text and binary. For example, a FLOAT type data is represented as a 8-byte IEEE-754 floating point number (fixed-width) in binary format, and a human-readable string (variable-width) in text format. The text format of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type.
Depending on the data type, binary transfer is generally more efficient and requires less bandwidth than text transfer. However, when transferring a large number of small values, binary transfer may use more bandwidth.
A connection is set to use text format by default. Set binary_transfer
to True to use binary format.
import vertica_python
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'vdb',
'binary_transfer': True # False by default
}
# Server enables binary transfer
with vertica_python.connect(**conn_info) as conn:
cur = conn.cursor()
...
Ideally, the output data should be the same for these two formats. However, there are edge cases:
- FLOAT data: binary format might offer slightly greater precision than text format. E.g.
select ATAN(12.345)
returns 1.48996835348642 (text) or 1.489968353486419 (binary) - TIMESTAMPTZ data: text format always use the session timezone, but binary format might fail to get session timezone and use local timezone.
- NUMERIC data: In old server versions, the precision and scale is incorrect when querying a NUMERIC column that is not from a specific table with prepared statement in binary format. E.g.
select ?::NUMERIC
orselect node_id, ?/50 from nodes
. In newer server versions, binary transfer is forcibly disabled for NUMERIC data by the server, regardless of client-side values ofbinary_transfer
anduse_prepared_statements
.
Send Queries and Retrieve Results
The Connection
class encapsulates a database session. It allows to:
- create new
Cursor
instances using thecursor()
method to execute database commands and queries. - terminate transactions using the methods
commit()
orrollback()
.
The class Cursor
allows interaction with the database:
- send commands to the database using methods such as
execute()
,executemany()
and copy. - retrieve data from the database, iterating on the cursor or using methods such as
fetchone()
,fetchmany()
,fetchall()
,nextset()
.
import vertica_python
conn_info = {'host': '127.0.0.1',
'port': 5433,
'user': 'some_user',
'password': 'some_password',
'database': 'vdb'}
# Connect to a vertica database
with vertica_python.connect(**conn_info) as conn:
# Open a cursor to perform database operations
# vertica-python only support one cursor per connection
cur = conn.cursor()
# Execute a command: create a table
cur.execute("CREATE TABLE tbl (a INT, b VARCHAR)")
# Insert a row
cur.execute("INSERT INTO tbl VALUES (1, 'aa')")
inserted = cur.fetchall() # [[1]]
# Bulk Insert with executemany()
# Pass data to fill a query placeholders and let vertica-python perform the correct conversion
cur.executemany("INSERT INTO tbl(a, b) VALUES (?, ?)", [(2, 'bb'), (3, 'foo'), (4, 'xx'), (5, 'bar')], use_prepared_statements=True)
# OR
# cur.executemany("INSERT INTO tbl(a, b) VALUES (%s, %s)", [(6, 'bb'), (7, 'foo'), (8, 'xx'), (9, 'bar')], use_prepared_statements=False)
# Query the database and obtain data as Python objects.
cur.execute("SELECT * FROM tbl")
datarow = cur.fetchone() # [1, 'aa']
remaining_rows = cur.fetchall() # [[2, 'bb'], [3, 'foo'], [4, 'xx'], [5, 'bar']]
# Make the changes to the database persistent
conn.commit()
# Execute a query with MULTIPLE statements
cur.execute("SELECT 1; SELECT 2; ...; SELECT N")
while True: # Fetch the result set for each statement
rows = cur.fetchall()
print(rows)
if not cur.nextset():
break
# Output:
# [[1]]
# [[2]]
# ...
# [[N]]
💬
Frequently Asked Questions Why does my query return empty results?
If you thinkCursor.fetch*()
should return something, check whether your query contains multiple statements. It is very likely that you miss to call Cursor.nextset()
.
Why does my query not throw an error?
vertica-python tries to throw exceptions in theCursor.execute()
method, but depending on your query, there are some exceptions that can only be raised when you call fetchone()
fetchmany()
or fetchall()
. In addition, if your query has multiple statements, errors that is not in the first statement cannot be thrown by execute()
. It is recommended to always call fetchall()
after execute()
in order to capture any error. And for a query with multiple statements, call fetchall()
and nextset()
as the above example code shows.
Why is this client N times slower than another vertica client?
You may find vertica-python performs much slower executing same query on same machine than another python client (e.g. pyodbc) or client in other programming language. This is because vertica-python is a pure Python program and CPython (the official implementation of Python, which is an interpreted, dynamic language) computation is often many times slower than compiled languages like C and Go, or JIT (Just-in-Time) compiled languages like Java and JavaScript. Therefore, if you want to get better performance, instead of using the official CPython interpreter, try other performance-oriented interpreters such as PyPy.Stream query results
Streaming is recommended if you want to further process each row, save the results in a non-list/dict format (e.g. Pandas DataFrame), or save the results in a file.
cur = connection.cursor()
cur.execute("SELECT * FROM a_table LIMIT 2")
for row in cur.iterate():
print(row)
# [ 1, 'some text', datetime.datetime(2014, 5, 18, 6, 47, 1, 928014) ]
# [ 2, 'something else', None ]
In-memory results as list
cur = connection.cursor()
cur.execute("SELECT * FROM a_table LIMIT 2")
cur.fetchall()
# [ [1, 'something'], [2, 'something_else'] ]
In-memory results as dictionary
cur = connection.cursor('dict')
cur.execute("SELECT * FROM a_table LIMIT 2")
cur.fetchall()
# [ {'id': 1, 'value': 'something'}, {'id': 2, 'value': 'something_else'} ]
connection.close()
Nextset
If you execute multiple statements in a single call to execute(), you can use Cursor.nextset()
to retrieve all of the data.
cur.execute('SELECT 1; SELECT 2;')
cur.fetchone()
# [1]
cur.fetchone()
# None
cur.nextset()
# True
cur.fetchone()
# [2]
cur.fetchone()
# None
cur.nextset()
# False
Passing parameters to SQL queries
vertica-python provides two methods for passing parameters to a SQL query:
Server-side binding: Query using prepared statements
Vertica server-side prepared statements let you define a statement once and then run it many times with different parameters. Internally, vertica-python sends the query and the parameters to the server separately. Placeholders in the statement are represented by question marks (?). Server-side prepared statements are useful for preventing SQL injection attacks.
import vertica_python
# Enable using server-side prepared statements at connection level
conn_info = {'host': '127.0.0.1',
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'use_prepared_statements': True,
}
with vertica_python.connect(**conn_info) as connection:
cur = connection.cursor()
cur.execute("CREATE TABLE tbl (a INT, b VARCHAR)")
cur.execute("INSERT INTO tbl VALUES (?, ?)", [1, 'aa'])
cur.execute("INSERT INTO tbl VALUES (?, ?)", [2, 'bb'])
cur.executemany("INSERT INTO tbl VALUES (?, ?)", [(3, 'foo'), (4, 'xx'), (5, 'bar')])
cur.execute("COMMIT")
cur.execute("SELECT * FROM tbl WHERE a>=? AND a<=? ORDER BY a", (2,4))
cur.fetchall()
# [[2, 'bb'], [3, 'foo'], [4, 'xx']]
You can set use_prepared_statements
option in cursor.execute*()
functions to override the connection level setting.
import vertica_python
# Enable using server-side prepared statements at connection level
conn_info = {'host': '127.0.0.1',
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
'use_prepared_statements': True,
}
with vertica_python.connect(**conn_info) as connection:
cur = connection.cursor()
cur.execute("CREATE TABLE tbl (a INT, b VARCHAR)")
# Executing compound statements
cur.execute("INSERT INTO tbl VALUES (?, ?); COMMIT", [1, 'aa'])
# Error message: Cannot insert multiple commands into a prepared statement
# Disable prepared statements but forget to change placeholders (?)
cur.execute("INSERT INTO tbl VALUES (?, ?); COMMIT;", [1, 'aa'], use_prepared_statements=False)
# TypeError: not all arguments converted during string formatting
cur.execute("INSERT INTO tbl VALUES (%s, %s); COMMIT;", [1, 'aa'], use_prepared_statements=False)
cur.execute("INSERT INTO tbl VALUES (:a, :b); COMMIT;", {'a': 2, 'b': 'bb'}, use_prepared_statements=False)
# Disable using server-side prepared statements at connection level
conn_info['use_prepared_statements'] = False
with vertica_python.connect(**conn_info) as connection:
cur = connection.cursor()
# Try using prepared statements
cur.execute("INSERT INTO tbl VALUES (?, ?)", [3, 'foo'])
# TypeError: not all arguments converted during string formatting
cur.execute("INSERT INTO tbl VALUES (?, ?)", [3, 'foo'], use_prepared_statements=True)
# Query using named parameters
cur.execute("SELECT * FROM tbl WHERE a>=:n1 AND a<=:n2 ORDER BY a", {'n1': 2, 'n2': 4})
cur.fetchall()
# [[2, 'bb'], [3, 'foo']]
Note: In other drivers, the batch insert is converted into a COPY statement by using prepared statements. vertica-python currently does not support that. More details
Client-side binding: Query using named parameters or format parameters
vertica-python can automatically convert Python objects to SQL literals, merge the query and the parameters on the client side, and then send the query to the server: using this feature your code will be more robust and reliable to prevent SQL injection attacks. You need to set use_prepared_statements
option to False (at connection level or in cursor.execute*()) to use client-side binding.
Variables can be specified with named (:name) placeholders.
cur = connection.cursor()
data = {'propA': 1, 'propB': 'stringValue'}
cur.execute("SELECT * FROM a_table WHERE a = :propA AND b = :propB", data, use_prepared_statements=False)
# converted into a SQL command similar to: SELECT * FROM a_table WHERE a = 1 AND b = 'stringValue'
cur.fetchall()
# [ [1, 'stringValue'] ]
Variables can also be specified with positional format (%s) placeholders. The placeholder must always be a %s, even if a different placeholder (such as a %d
for integers or %f
for floats) may look more appropriate. Never use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string.
cur = connection.cursor()
data = (1, "O'Reilly")
cur.execute("SELECT * FROM a_table WHERE a = %s AND b = %s" % data) # WRONG: % operator
cur.execute("SELECT * FROM a_table WHERE a = %d AND b = %s", data) # WRONG: %d placeholder
cur.execute("SELECT * FROM a_table WHERE a = %s AND b = %s", data) # correct
# converted into a SQL command: SELECT * FROM a_table WHERE a = 1 AND b = 'O''Reilly'
cur.fetchall()
# [ [1, "O'Reilly"] ]
The placeholder must not be quoted. vertica-python will add quotes where needed.
>>> cur.execute("INSERT INTO table VALUES (':propA')", {'propA': "someString"}) # WRONG
>>> cur.execute("INSERT INTO table VALUES (:propA)", {'propA': "someString"}) # correct
>>> cur.execute("INSERT INTO table VALUES ('%s')", ("someString",)) # WRONG
>>> cur.execute("INSERT INTO table VALUES (%s)", ("someString",)) # correct
In order to merge the query (with placeholders) and the parameters on the client side, parameter values (python object) are converted to SQL literals (str). vertica-python supports default mapping to SQL literals for many standard Python types (str, bytes, bool, int, float, decimal.Decimal, tuple, list, set, dict, datetime.datetime, datetime.date, datetime.time, uuid.UUID). For complex types, in some cases, you may need explicit typecasting for the placeholder (e.g. %s::ARRAY[ARRAY[INT]]
, %s::ROW(varchar,int,date)
):
from datetime import date
cur.execute("CREATE TABLE table (a INT, b ARRAY[DATE])")
value = [date(2021, 6, 10), date(2021, 6, 12), date(2021, 6, 30)]
cur.execute("INSERT INTO table VALUES (%s, %s)", [100, value], use_prepared_statements=False) # WRONG
# Error Message: Column "b" is of type array[date] but expression is of type array[varchar], Sqlstate: 42804,
# Hint: You will need to rewrite or cast the expression
cur.execute("INSERT INTO table VALUES (%s, %s::ARRAY[DATE])", [100, value], use_prepared_statements=False) # correct
# converted into a SQL command: INSERT INTO vptest VALUES (100, ARRAY['2021-06-10','2021-06-12','2021-06-30']::ARRAY[DATE])
Register new SQL literal adapters
It is possible to adapt new Python types to SQL literals via Cursor.register_sql_literal_adapter(py_class_or_type, adapter_function)
function. Example:
class Point(object):
def __init__(self, x, y):
self.x = x
self.y = y
# Adapter should return a string value
def adapt_point(point):
return "STV_GeometryPoint({},{})".format(point.x, point.y)
cur = conn.cursor()
cur.register_sql_literal_adapter(Point, adapt_point)
cur.execute("INSERT INTO geom_data (geom) VALUES (%s)", [Point(1.23, 4.56)])
cur.execute("select ST_asText(geom) from geom_data")
cur.fetchall()
# [['POINT (1.23 4.56)']]
To help you debug the binding process during Cursor.execute*(), Cursor.object_to_sql_literal(py_object)
function can be used to inspect the SQL literal string converted from a Python object.
cur = conn.cursor
cur.object_to_sql_literal("O'Reilly") # "'O''Reilly'"
cur.object_to_sql_literal(None) # "NULL"
cur.object_to_sql_literal(True) # "True"
cur.object_to_sql_literal(Decimal("10.00000")) # "10.00000"
cur.object_to_sql_literal(datetime.date(2018, 9, 7)) # "'2018-09-07'"
cur.object_to_sql_literal(Point(-71.13, 42.36)) # "STV_GeometryPoint(-71.13,42.36)" if you registered in previous step
Cursor.executemany(): Server-side binding vs Client-side binding
Cursor.executemany(query, seq_of_parameters, use_prepared_statements=None)
Execute the same query or command with a sequence of input data.
PARAMETERS
- query (str or bytes) – The query to execute.
- seq_of_parameters (a list/tuple of Sequences or Mappings) – The parameters to pass to the query.
- use_prepared_statements (bool) – Use connection level setting by default. If set, execute the query using server-side prepared statements or not.
When use_prepared_statements=True
(Server-side binding), the query should contain only a single statement. Internally, vertica-python sends the query and each set of parameters to the server separately.
When use_prepared_statements=False
(Client-side binding), the query is limited to simple INSERT statements only. The batch insert is converted into a COPY FROM STDIN statement by the client. This is more efficient than performing separate queries (may even faster than Server-side binding), but in case of other statements you may consider using copy.
# Note the query parameter placeholders difference!
cur.executemany("INSERT INTO tbl(a, b) VALUES (?, ?)", [(2, 'bb'), (3, 'foo'), (4, 'xx'), (5, 'bar')], use_prepared_statements=True)
cur.executemany("INSERT INTO tbl(a, b) VALUES (%s, %s)", [(6, 'bb'), (7, 'foo'), (8, 'xx'), (9, 'bar')], use_prepared_statements=False)
cur.executemany("INSERT INTO tbl(a, b) VALUES (:a, :b)", [{'a': 2, 'b': 'bb'}, {'a': 3, 'b': 'foo'}], use_prepared_statements=False)
Insert and commit/rollback
cur = connection.cursor()
# inline commit (when 'use_prepared_statements' is False)
cur.execute("INSERT INTO a_table (a, b) VALUES (1, 'aa'); commit;")
# commit in execution
cur.execute("INSERT INTO a_table (a, b) VALUES (1, 'aa')")
cur.execute("INSERT INTO a_table (a, b) VALUES (2, 'bb')")
cur.execute("commit;")
# connection.commit()
cur.execute("INSERT INTO a_table (a, b) VALUES (1, 'aa')")
connection.commit()
# connection.rollback()
cur.execute("INSERT INTO a_table (a, b) VALUES (0, 'bad')")
connection.rollback()
Autocommit
Session parameter AUTOCOMMIT can be configured by the connection option and the Connection.autocommit
read/write attribute:
import vertica_python
# Enable autocommit at startup
conn_info = {'host': '127.0.0.1',
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
# autocommit is off by default
'autocommit': True,
}
with vertica_python.connect(**conn_info) as connection:
# Check current session autocommit setting
print(connection.autocommit) # should be True
# If autocommit is True, statements automatically commit their transactions when they complete.
# Set autocommit setting with attribute
connection.autocommit = False
print(connection.autocommit) # should be False
# If autocommit is False, the methods commit() or rollback() must be manually invoked to terminate the transaction.
To set AUTOCOMMIT to a new value, vertica-python uses Cursor.execute()
to execute a command internally, and that would clear your previous query results, so be sure to call Cursor.fetch*()
to save your results before you set autocommit.
Using COPY FROM
Cursor.execute()
.
There are 2 methods to do copy:
Method 1: "COPY FROM STDIN" sql with Cursor.copy()
cur = connection.cursor()
cur.copy("COPY test_copy (id, name) FROM stdin DELIMITER ',' ", csv)
Where csv
is either a string or a file-like object (specifically, any object with a read()
method). If using a file, the data is streamed (in chunks of buffer_size
bytes, which defaults to 128 * 2 ** 10).
with open("/tmp/binary_file.csv", "rb") as fs:
cursor.copy("COPY table(field1, field2) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'",
fs, buffer_size=65536)
Method 2: "COPY FROM LOCAL" sql with Cursor.execute()
import sys
import vertica_python
conn_info = {'host': '127.0.0.1',
'user': 'some_user',
'password': 'some_password',
'database': 'a_database',
# False by default
#'disable_copy_local': True,
# Don't support executing COPY LOCAL operations with prepared statements
'use_prepared_statements': False
}
with vertica_python.connect(**conn_info) as connection:
cur = connection.cursor()
# Copy from local file
cur.execute("COPY table(field1, field2) FROM LOCAL"
" 'data_Jan_*.csv','data_Feb_01.csv' DELIMITER ','"
" REJECTED DATA 'path/to/write/rejects.txt'"
" EXCEPTIONS 'path/to/write/exceptions.txt'",
buffer_size=65536
)
print("Rows loaded:", cur.fetchall())
# Copy from local stdin
cur.execute("COPY table(field1, field2) FROM LOCAL STDIN DELIMITER ','", copy_stdin=sys.stdin)
print("Rows loaded:", cur.fetchall())
# Copy from local stdin (compound statements)
with open('f1.csv', 'r') as fs1, open('f2.csv', 'r') as fs2:
cur.execute("COPY tlb1(field1, field2) FROM LOCAL STDIN DELIMITER ',';"
"COPY tlb2(field1, field2) FROM LOCAL STDIN DELIMITER ',';",
copy_stdin=[fs1, fs2], buffer_size=65536)
print("Rows loaded 1:", cur.fetchall())
cur.nextset()
print("Rows loaded 2:", cur.fetchall())
# Copy from local stdin (StringIO)
from io import StringIO
data = "Anna|123-456-789\nBrown|555-444-3333\nCindy|555-867-53093453453\nDodd|123-456-789\nEd|123-456-789"
cur.execute("COPY customers (firstNames, phoneNumbers) FROM LOCAL STDIN ENFORCELENGTH RETURNREJECTED AUTO",
copy_stdin=StringIO(data))
When connection option disable_copy_local
set to True, disables COPY LOCAL operations, including copying data from local files/stdin and using local files to store data and exceptions. You can use this property to prevent users from writing to and copying from files on a Vertica host, including an MC host. Note that this property doesn't apply to Cursor.copy()
.
The data for copying from/writing to local files is streamed in chunks of buffer_size
bytes, which defaults to 128 * 2 ** 10.
When executing "COPY FROM LOCAL STDIN", copy_stdin
should be a file-like object or a list of file-like objects (specifically, any object with a read()
method).
Cancel the current database operation
Connection.cancel()
interrupts the processing of the current operation. Interrupting query execution will cause the cancelled method to raise a vertica_python.errors.QueryCanceled
. If no query is being executed, it does nothing. You can call this function from a different thread/process than the one currently executing a database operation.
from multiprocessing import Process
import time
import vertica_python
def cancel_query(connection, timeout=5):
time.sleep(timeout)
connection.cancel()
# Example 1: Cancel the query before Cursor.execute() return.
# The query stops executing in a shorter time after the cancel message is sent.
with vertica_python.connect(**conn_info) as conn:
cur = conn.cursor()
# Call cancel() from a different process
p1 = Process(target=cancel_query, args=(conn,))
p1.start()
try:
cur.execute("<Long running query>")
except vertica_python.errors.QueryCanceled as e:
pass
p1.join()
# Example 2: Cancel the query after Cursor.execute() return.
# Less number of rows read after the cancel message is sent.
with vertica_python.connect(**conn_info) as conn:
cur = conn.cursor()
cur.execute("SELECT id, time FROM large_table")
nCount = 0
try:
while cur.fetchone():
nCount += 1
if nCount == 100:
conn.cancel()
except vertica_python.errors.QueryCanceled as e:
pass
# nCount is less than the number of rows in large_table
SQL Data conversion to Python objects
When a query is executed and Cursor.fetch*()
is called, SQL data (bytes) are deserialized as Python objects. The following table shows the default mapping from SQL data types to Python objects:
SQL data type | Python object type |
---|---|
NULL | None |
BOOLEAN | bool |
INTEGER | int |
FLOAT | float |
NUMERIC | decimal.Decimal |
CHAR | str |
VARCHAR | str |
LONG VARCHAR | str |
BINARY | bytes |
VARBINARY | bytes |
LONG VARBINARY | bytes |
UUID | uuid.UUID |
DATE | datetime.date[1] |
TIME | datetime.time[2] |
TIMETZ | datetime.time[2] |
TIMESTAMP | datetime.datetime[1] |
TIMESTAMPTZ | datetime.datetime[1] |
INTERVAL | dateutil.relativedelta.relativedelta |
ARRAY | list[3] |
SET | set[3] |
ROW | dict[3] |
MAP | dict[3] |
[1]Python’s datetime.date and datetime.datetime only supports date ranges 0001-01-01 to 9999-12-31. Retrieving a value of BC date or future date (year>9999) results in an error.
[2]Python’s datetime.time only supports times until 23:59:59. Retrieving a value of 24:00:00 results in an error.
[3]If connection option 'request_complex_types' set to False, the server returns all complex types as VARCHAR/LONG VARCHAR Json strings, so the client will convert data to str instead. Server before v12.0.2 cannot provide enough metadata for complex types, the behavior is equal to request_complex_types=False.
Bypass data conversion to Python objects
The Cursor.disable_sqldata_converter
attribute can bypass the result data conversion to Python objects.
with vertica_python.connect(**conn_info) as conn:
cur = conn.cursor()
sql = "select 'foo'::VARCHAR, 100::INT, '2001-12-01 02:50:00'::TIMESTAMP"
#### Convert SQL types to Python objects ####
print(cur.disable_sqldata_converter) # Default is False
# False
cur.execute(sql)
print(cur.fetchall())
# [['foo', 100, datetime.datetime(2001, 12, 1, 2, 50)]]
#### No Conversion: return raw bytes data ####
cur.disable_sqldata_converter = True # Set attribute to True
cur.execute(sql)
print(cur.fetchall())
# [[b'foo', b'100', b'2001-12-01 02:50:00']]
As a result, this can improve query performance when you call fetchall()
but ignore/skip result data. This can also be used when defining customized data converters.
Shortcuts
The Cursor.execute()
method returns self
. This means that you can chain a fetch operation, such as fetchone()
, to the execute()
call:
row = cursor.execute(...).fetchone()
for row in cur.execute(...).fetchall():
...
Rowcount oddities
vertica_python behaves a bit differently than dbapi when returning rowcounts.
After a select execution, the rowcount will be -1, indicating that the row count is unknown. The rowcount value will be updated as data is streamed.
cur.execute('SELECT 10 things')
cur.rowcount == -1 # indicates unknown rowcount
cur.fetchone()
cur.rowcount == 1
cur.fetchone()
cur.rowcount == 2
cur.fetchall()
cur.rowcount == 10
After an insert/update/delete, the rowcount will be returned as a single element row:
cur.execute("DELETE 3 things")
cur.rowcount == -1 # indicates unknown rowcount
cur.fetchone()[0] == 3
UTF-8 encoding issues
While Vertica expects varchars stored to be UTF-8 encoded, sometimes invalid strings get into the database. You can specify how to handle reading these characters using the unicode_error connection option. This uses the same values as the unicode type (https://docs.python.org/3/library/codecs.html#error-handlers)
cur = vertica_python.Connection({..., 'unicode_error': 'strict'}).cursor()
cur.execute(r"SELECT E'\xC2'")
cur.fetchone()
# caught 'utf8' codec can't decode byte 0xc2 in position 0: unexpected end of data
cur = vertica_python.Connection({..., 'unicode_error': 'replace'}).cursor()
cur.execute(r"SELECT E'\xC2'")
cur.fetchone()
# �
cur = vertica_python.Connection({..., 'unicode_error': 'ignore'}).cursor()
cur.execute(r"SELECT E'\xC2'")
cur.fetchone()
#
License
Apache 2.0 License, please see LICENSE
for details.
Contributing guidelines
Have a bug or an idea? Please see CONTRIBUTING.md
for details.
Acknowledgements
We would like to thank the contributors to the Ruby Vertica gem (https://github.com/sprsquish/vertica), as this project gave us inspiration and help in understanding Vertica's wire protocol. These contributors are: