• Stars
    233
  • Rank 166,702 (Top 4 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 7 years ago
  • Updated 5 months ago

Repository Details

Python DB-API client for Presto

Introduction

This package provides a client interface to query Presto, a distributed SQL engine. It supports Python 2.7, 3.5, 3.6, 3.7, and PyPy.

Installation

$ pip install presto-python-client

Quick Start

Use the DBAPI interface to query Presto:

import prestodb
conn = prestodb.dbapi.connect(
    host='localhost',
    port=8080,
    user='the-user',
    catalog='the-catalog',
    schema='the-schema',
)
cur = conn.cursor()
cur.execute('SELECT * FROM system.runtime.nodes')
rows = cur.fetchall()

This queries the system.runtime.nodes system table, which lists the nodes in the Presto cluster.
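Rows come back as plain sequences, so it can help to pair them with column names. The sketch below is not part of the package: rows_as_dicts is a hypothetical helper that relies only on the DB-API 2.0 cursor.description attribute, and it uses the standard-library sqlite3 module as a stand-in so it runs without a Presto cluster. It assumes a prestodb cursor exposes description in the same way.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return the remaining rows of an executed DB-API cursor as dicts keyed by column name."""
    # The first item of each description entry is the column name (DB-API 2.0).
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# Stand-in connection: sqlite3 instead of prestodb, purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("SELECT 'node-1' AS node_id, 'active' AS state")
records = rows_as_dicts(cur)
print(records)
```

With a Presto connection, the same helper would be called right after cur.execute(...) in place of cur.fetchall().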

The DBAPI implementation in prestodb.dbapi also provides methods that retrieve fewer rows, for example Cursor.fetchone() or Cursor.fetchmany(). By default, Cursor.fetchmany() fetches one row; set prestodb.dbapi.Cursor.arraysize to change that.
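As a minimal sketch of batch fetching, the hypothetical helper iter_batches below sets arraysize and then calls fetchmany() until no rows are left. It is written against the generic DB-API 2.0 cursor interface and demonstrated with sqlite3 so that it runs without a Presto server. It assumes that fetchmany() on a prestodb cursor returns an empty list once the result set is exhausted.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows from an executed DB-API cursor."""
    # fetchmany() with no argument uses cursor.arraysize as the batch size.
    cursor.arraysize = batch_size
    while True:
        batch = cursor.fetchmany()
        if not batch:
            break
        yield batch


# Stand-in data source: sqlite3 instead of prestodb, purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE nodes (node_id TEXT)")
cur.executemany("INSERT INTO nodes VALUES (?)", [("n%d" % i,) for i in range(5)])
cur.execute("SELECT node_id FROM nodes ORDER BY node_id")

batch_sizes = [len(batch) for batch in iter_batches(cur, 2)]
print(batch_sizes)  # [2, 2, 1]
```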

Basic Authentication

The BasicAuthentication class can be used to connect to an LDAP-configured Presto cluster:

import prestodb
conn = prestodb.dbapi.connect(
    host='coordinator url',
    port=8443,
    user='the-user',
    catalog='the-catalog',
    schema='the-schema',
    http_scheme='https',
    auth=prestodb.auth.BasicAuthentication("principal id", "password"),
)
cur = conn.cursor()
cur.execute('SELECT * FROM system.runtime.nodes')
rows = cur.fetchall()
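To keep the password out of source code, the connection settings can be read from the environment. The following is only a sketch: the PRESTO_* variable names and both helper functions are hypothetical, and the principal is assumed to default to the user name. prestodb is imported lazily so the settings helper can be tried without the package installed.

```python
import os


def basic_auth_settings(environ=os.environ):
    """Collect connection settings from hypothetical PRESTO_* environment variables."""
    user = environ["PRESTO_USER"]
    return {
        "host": environ["PRESTO_HOST"],
        "port": int(environ.get("PRESTO_PORT", "8443")),
        "user": user,
        # Assumption: the principal is the user name unless set explicitly.
        "principal": environ.get("PRESTO_PRINCIPAL", user),
        "password": environ["PRESTO_PASSWORD"],
    }


def connect_with_basic_auth(settings):
    """Open a connection using the same arguments as the example above."""
    import prestodb  # lazy import: only needed when actually connecting

    return prestodb.dbapi.connect(
        host=settings["host"],
        port=settings["port"],
        user=settings["user"],
        http_scheme="https",
        auth=prestodb.auth.BasicAuthentication(
            settings["principal"], settings["password"]
        ),
    )


# Demonstrate the settings helper with an explicit mapping instead of os.environ.
example = basic_auth_settings({
    "PRESTO_HOST": "coordinator.example.com",
    "PRESTO_USER": "the-user",
    "PRESTO_PASSWORD": "secret",
})
print(example["port"])  # 8443
```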

OAuth Authentication

To enable GCS access, OAuth authentication is supported by passing in the shadow.json file of a service account. The following example shows a use case where both Kerberos and OAuth authentication are enabled.

import getpass
import prestodb
from prestodb.client import PrestoRequest, PrestoQuery
from requests_kerberos import DISABLED

kerberos_auth = prestodb.auth.KerberosAuthentication(
    mutual_authentication=DISABLED,
    service_name='kerberos service name',
    force_preemptive=True,
    hostname_override='example.com',
)

req = PrestoRequest(
    host='GCP coordinator url',
    port=443,
    user=getpass.getuser(),
    service_account_file='Service account json file path',
    http_scheme='https',
    auth=kerberos_auth
)

query = PrestoQuery(req, "SELECT * FROM system.runtime.nodes")
rows = list(query.execute())

Transactions

By default, the client runs in autocommit mode. To enable transactions, set isolation_level to a value other than IsolationLevel.AUTOCOMMIT:

import prestodb
from prestodb import transaction
with prestodb.dbapi.connect(
    host='localhost',
    port=8080,
    user='the-user',
    catalog='the-catalog',
    schema='the-schema',
    isolation_level=transaction.IsolationLevel.REPEATABLE_READ,
) as conn:
    cur = conn.cursor()
    cur.execute('INSERT INTO sometable VALUES (1, 2, 3)')
    cur.execute('INSERT INTO sometable VALUES (4, 5, 6)')

The transaction is created when the first SQL statement is executed. prestodb.dbapi.Connection.commit() is called automatically when the code exits the with context and the queries succeed; otherwise, prestodb.dbapi.Connection.rollback() is called.
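The commit-or-rollback behavior described above can be illustrated with a small context manager. This is a sketch of the pattern, not the package's implementation: commit_or_rollback is a hypothetical helper, and sqlite3 stands in for prestodb so the example runs without a Presto server.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def commit_or_rollback(conn):
    """Commit if the block succeeds, roll back and re-raise if it fails."""
    try:
        yield conn.cursor()
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()


# Stand-in connection: sqlite3 instead of prestodb, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (a INTEGER, b INTEGER, c INTEGER)")
conn.commit()

# This block succeeds, so the insert is committed.
with commit_or_rollback(conn) as cur:
    cur.execute("INSERT INTO sometable VALUES (1, 2, 3)")

# This block fails, so the insert is rolled back.
try:
    with commit_or_rollback(conn) as cur:
        cur.execute("INSERT INTO sometable VALUES (4, 5, 6)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM sometable").fetchone()[0]
print(row_count)  # 1
```

Only the first insert survives, which matches the behavior described for the with block on a prestodb connection.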

Running Tests

There is a helper script, run, that provides commands to run tests. Type ./run tests to run both unit and integration tests.

presto-python-client uses pytest for its tests. To run only unit tests, type:

$ pytest tests

Then you can pass options like --pdb or anything supported by pytest --help.

To run the tests with different versions of Python in managed virtualenvs, use tox (see the configuration in tox.ini):

$ tox

To run integration tests:

$ pytest integration_tests

They build a Docker image and then run a container with a Presto server:

  • the image is named presto-server:${PRESTO_VERSION}
  • the container is named presto-python-client-tests-{uuid4()[:7]}

The container is expected to be removed after the tests are finished.
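The container naming scheme above can be sketched as follows. The function name is hypothetical and the actual test code may build the name differently; the only thing taken from the description is the prefix followed by the first seven characters of a UUID4.

```python
import uuid

PREFIX = "presto-python-client-tests-"


def make_container_name():
    """Build a container name from the prefix and a short random suffix."""
    # str(uuid4()) starts with eight hex digits, so the first seven
    # characters never include a hyphen.
    return PREFIX + str(uuid.uuid4())[:7]


name = make_container_name()
print(name)
```

The random suffix lets several test runs coexist on the same Docker host without name clashes.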

Please refer to the Dockerfile for details. You will find the configuration in etc/.

You can use ./run to manipulate the containers:

  • ./run presto_server: build and run Presto in a container
  • ./run presto_cli CONTAINER_ID: connect the Java Presto CLI to a container
  • ./run list: list the running containers
  • ./run clean: kill the containers

Development

Start by forking the repository and then modify the code in your fork. Please refer to CONTRIBUTING.md before submitting your contributions.

Clone the repository and go inside the code directory. Then you can get the version with python setup.py --version.

We recommend that you use virtualenv to develop on presto-python-client:

$ virtualenv /path/to/env
$ source /path/to/env/bin/activate
$ pip install -r requirements.txt

For development purposes, pip can reference the code you are modifying in a virtualenv:

$ pip install -e .[tests]

That way, you do not need to run pip install again to apply your changes to the virtualenv.

When the code is ready, submit a Pull Request.

Need Help?

Feel free to create an issue, as it makes your request visible to other users and contributors.

If an interactive discussion would be better, or if you just want to hang out and chat about the Presto Python client, you can join us on the #presto-python-client channel on Slack.
