• Stars
    star
    341
  • Rank 123,998 (Top 3 %)
  • Language
    Python
  • License
    MIT License
  • Created about 4 years ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

SQLAlchemy driver for DuckDB

duckdb_engine

Supported Python Versions PyPI version PyPI Downloads codecov

Basic SQLAlchemy driver for DuckDB

Installation

$ pip install duckdb-engine

DuckDB Engine also has a conda feedstock available, the instructions for the use of which are available in it's repository.

Usage

Once you've installed this package, you should be able to just use it, as SQLAlchemy does a python path search

from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session

Base = declarative_base()


class FakeModel(Base):  # type: ignore
    __tablename__ = "fake"

    id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
    name = Column(String)


eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)

session.add(FakeModel(name="Frank"))
session.commit()

frank = session.query(FakeModel).one()

assert frank.name == "Frank"

Usage in IPython/Jupyter

With IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! Check out DuckDB's documentation or Alex Monahan's great demo of this on his blog.

Configuration

You can configure DuckDB by passing connect_args to the create_engine function

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'read_only': True,
        'config': {
            'memory_limit': '500mb'
        }
    }
)

The supported configuration parameters are listed in the DuckDB docs

How to register a pandas DataFrame

eng = create_engine("duckdb:///:memory:")
eng.execute("register", ("dataframe_name", pd.DataFrame(...)))

eng.execute("select * from dataframe_name")

Things to keep in mind

Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the duckdb_engine dialect is derived from the postgresql dialect, SQLAlchemy may try to use PostgreSQL-only features. Below are some caveats to look out for.

Auto-incrementing ID columns

When defining an Integer column as a primary key, SQLAlchemy uses the SERIAL datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the SQLAlchemy.Sequence() object to auto-increment the key. For more information on sequences, you can find the SQLAlchemy Sequence documentation here.

The following example demonstrates how to create an auto-incrementing ID column for a simple table:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
...     'users',
...     metadata,
...     sqlalchemy.Column(
...         'id',
...         sqlalchemy.Integer,
...         user_id_seq,
...         server_default=user_id_seq.next_value(),
...         primary_key=True,
...     ),
... )
>>> metadata.create_all(bind=engine)

Pandas read_sql() chunksize

NOTE: this is no longer an issue in versions >=0.5.0 of duckdb

The pandas.read_sql() method can read tables from duckdb_engine into DataFrames, but the sqlalchemy.engine.result.ResultProxy trips up when fetchmany() is called. Therefore, for now chunksize=None (default) is necessary when reading duckdb tables into DataFrames. For example:

>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine)                ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25)  ### Throws an exception

Unsigned integer support

Unsigned integers are supported by DuckDB, and are available in duckdb_engine.datatypes.

Preloading extensions (experimental)

Until the DuckDB python client allows you to natively preload extensions, I've added experimental support via a connect_args parameter

from sqlalchemy import create_engine

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'preload_extensions': ['https'],
        'config': {
            's3_region': 'ap-southeast-1'
        }
    }
)

The name

Yes, I'm aware this package should be named duckdb-driver or something, I wasn't thinking when I named it and it's too hard to change the name now

More Repositories

1

duckdb-deltatable-extension

A purely experimental DuckDB Deltalake extension
Rust
93
star
2

duckdb-extension-framework

Rust
53
star
3

rust_wasm_duckdb

A test to integrate the wasm version of duckdb with rust in the browser
Rust
8
star
4

websocket-notifications

Moderately simple websocket and python based webkit notifications pusher "system"
JavaScript
6
star
5

rpc

Very simple python rabbitmq/pika rpc library
Python
5
star
6

vercel-openapi

TypeScript
3
star
7

media

A merging of various media source APIs, Plex, and Transmission
HTML
3
star
8

transit_dashboard

Dart
3
star
9

houndipy

An attempt at a python api client for houndify (https://www.houndify.com/)
Python
3
star
10

elliana.gay

HTML
2
star
11

duckdb-ast

Python
2
star
12

tyrian

Tyrian is a implementation of lisp in python bytecode... hopefully
Python
2
star
13

duckdb_website_search

TypeScript
2
star
14

garbage-art-generator

Simple python-based art generator
Python
2
star
15

Delser

A product code generator ported from Delphi
C++
2
star
16

material_you_colours

Kotlin
1
star
17

Mausembler

Quick and dirty DCPU16 assembler; probably never going to be complete
Python
1
star
18

more

Haxe 3 compatible fork of https://gitorious.org/more/more
Haxe
1
star
19

duckdb_codespace_env

Dockerfile
1
star
20

DCPU16_Test_Environment

Quick test environment, built using an extracted official emulator
1
star
21

remote_rec

Python
1
star
22

dcputoolchain-module-site

A GitHub blob abstraction service, to serve lua based hardware, preprocessor, and debugger plugins for the DCPUToolchain from the DCPUModules repository
Python
1
star
23

financial-dash

https://financial-dash.vc.mause.me/openapi.yaml
TypeScript
1
star
24

simvm

Silly little vm & assembler in C
C
1
star
25

vercel-jwt-auth

TypeScript
1
star
26

pytransperth

A silly attempt at creating an api for transperth's JourneyPlanner
Python
1
star
27

duckdb_ppa

Python
1
star
28

ChooseFood

Dart
1
star
29

pull_requests

Python
1
star
30

mause.github.com

My own site
CSS
1
star
31

sea_lang

Silly toy language in C
C
1
star
32

DoSH

Dom's own SHell
C
1
star