• Stars
    star
    171
  • Rank 222,266 (Top 5 %)
  • Language
    R
  • License
    Other
  • Created over 6 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A fast, easy-to-use database library for R

dbx

๐Ÿ”ฅ A fast, easy-to-use database library for R

  • Intuitive functions
  • High performance batch operations
  • Safe inserts, updates, and deletes without writing SQL
  • Upserts!!
  • Great date and time support
  • Works well with auto-incrementing primary keys
  • Built on top of DBI

Designed for both research and production environments

Supports Postgres, MySQL, MariaDB, SQLite, and more

Build Status CRAN status

Installation

Install dbx

install.packages("dbx")

And follow the instructions for your database

To install with Jetpack, use:

jetpack::add("dbx")

Postgres

Install the R package

install.packages("RPostgres")

And use:

library(dbx)

db <- dbxConnect(adapter="postgres", dbname="mydb")

You can also pass user, password, host, port, and url.

Works with RPostgreSQL as well

MySQL & MariaDB

Install the R package

install.packages("RMySQL")

And use:

library(dbx)

db <- dbxConnect(adapter="mysql", dbname="mydb")

You can also pass user, password, host, port, and url.

Works with RMariaDB as well

SQLite

Install the R package

install.packages("RSQLite")

And use:

library(dbx)

db <- dbxConnect(adapter="sqlite", dbname=":memory:")

SQL Server

Install the R package

install.packages("odbc")

And use:

library(dbx)

db <- dbxConnect(adapter=odbc::odbc(), database="mydb")

You can also pass uid, pwd, server, and port.

Redshift

For Redshift, follow the Postgres instructions.

Others

Install the appropriate R package and use:

db <- dbxConnect(adapter=odbc::odbc(), database="mydb")

Operations

Select

Create a data frame of records from a SQL query

records <- dbxSelect(db, "SELECT * FROM forecasts")

Pass parameters

dbxSelect(db, "SELECT * FROM forecasts WHERE period = ? AND temperature > ?", params=list("hour", 27))

Parameters can also be vectors

dbxSelect(db, "SELECT * FROM forecasts WHERE id IN (?)", params=list(1:3))

Insert

Insert records

table <- "forecasts"
records <- data.frame(temperature=c(32, 25))
dbxInsert(db, table, records)

If you use auto-incrementing ids in Postgres, you can get the ids of newly inserted rows by passing the column name:

dbxInsert(db, table, records, returning=c("id"))

Update

Update records

records <- data.frame(id=c(1, 2), temperature=c(16, 13))
dbxUpdate(db, table, records, where_cols=c("id"))

Use where_cols to specify the columns used for lookup. Other columns are written to the table.

Updates are batched when possible, but often need to be run as multiple queries. We recommend upsert when possible for better performance, as it can always be run as a single query. Turn on logging to see the difference.

Upsert

Only available for PostgreSQL 9.5+, MySQL 5.5+, and SQLite 3.24+

Atomically insert if they donโ€™t exist, otherwise update them

records <- data.frame(id=c(2, 3), temperature=c(20, 25))
dbxUpsert(db, table, records, where_cols=c("id"))

Use where_cols to specify the columns used for lookup. There must be a unique index on them, or an error will be thrown.

To skip existing rows instead of updating them, use:

dbxUpsert(db, table, records, where_cols=c("id"), skip_existing=TRUE)

If you use auto-incrementing ids in Postgres, you can get the ids of newly upserted rows by passing the column name:

dbxUpsert(db, table, records, where_cols=c("id"), returning=c("id"))

Delete

Delete specific records

bad_records <- data.frame(id=c(1, 2))
dbxDelete(db, table, where=bad_records)

Delete all records (uses TRUNCATE when possible for performance)

dbxDelete(db, table)

Execute

Execute a statement

dbxExecute(db, "UPDATE forecasts SET temperature = temperature + 1")

Pass parameters

dbxExecute(db, "UPDATE forecasts SET temperature = ? WHERE id IN (?)", params=list(27, 1:3))

Logging

Log all SQL queries with:

options(dbx_logging=TRUE)

Customize logging by passing a function

logQuery <- function(sql) {
  # your logging code
}

options(dbx_logging=logQuery)

Database Credentials

Environment variables are a convenient way to store database credentials. This keeps them outside your source control. Itโ€™s also how platforms like Heroku store them.

Create an .Renviron file in your home directory with:

DATABASE_URL=postgres://user:pass@host/dbname

Install urltools:

install.packages("urltools")

And use:

db <- dbxConnect()

If you have multiple databases, use a different variable name, and:

db <- dbxConnect(url=Sys.getenv("OTHER_DATABASE_URL"))

You can also use a package like keyring.

Batching

By default, operations are performed in a single statement or transaction. This is better for performance and prevents partial writes on failures. However, when working with large data frames on production systems, it can be better to break writes into batches. Use the batch_size option to do this.

dbxInsert(db, table, records, batch_size=1000)
dbxUpdate(db, table, records, where_cols, batch_size=1000)
dbxUpsert(db, table, records, where_cols, batch_size=1000)
dbxDelete(db, table, records, where, batch_size=1000)

Query Comments

Add comments to queries to make it easier to see where time-consuming queries are coming from.

options(dbx_comment=TRUE)

The comment will be appended to queries, like:

SELECT * FROM users /*script:forecast.R*/

Set a custom comment with:

options(dbx_comment="hi")

Transactions

To perform multiple operations in a single transaction, use:

DBI::dbWithTransaction(db, {
  dbxInsert(db, ...)
  dbxDelete(db, ...)
})

For updates inside a transaction, use:

dbxUpdate(db, transaction=FALSE)

Schemas

To specify a schema, use:

table <- DBI::Id(schema="schema", table="table")

Data Type Notes

Dates & Times

Dates are returned as Date objects and times as POSIXct objects. Times are stored in the database in UTC and converted to your local time zone when retrieved.

Times without dates are returned as character vectors since R has no built-in support for this type. If you use hms, you can convert columns with:

records$column <- hms::as_hms(records$column)

SQLite does not have support for TIME columns, so we recommend storing as VARCHAR.

JSON

JSON and JSONB columns are returned as character vectors. You can use jsonlite to parse them with:

records$column <- lapply(records$column, jsonlite::fromJSON)

SQLite does not have support for JSON columns, so we recommend storing as TEXT.

Binary Data

BLOB and BYTEA columns are returned as raw vectors.

Data Type Limitations

Dates & Times

RSQLite does not currently provide enough info to automatically typecast dates and times. You can manually typecast date columns with:

records$column <- as.Date(records$column)

And time columns with:

records$column <- as.POSIXct(records$column, tz="Etc/UTC")
attr(records$column, "tzone") <- Sys.timezone()

Booleans

RMariaDB and RSQLite do not currently provide enough info to automatically typecast booleans. You can manually typecast with:

records$column <- records$column != 0

JSON

RMariaDB does not currently support JSON.

Binary Data

RMySQL can write BLOB columns, but canโ€™t retrieve them directly. To workaround this, use:

records <- dbxSelect(db, "SELECT HEX(column) AS column FROM table")

hexToRaw <- function(x) {
  y <- strsplit(x, "")[[1]]
  z <- paste0(y[c(TRUE, FALSE)], y[c(FALSE, TRUE)])
  as.raw(as.hexmode(z))
}

records$column <- lapply(records$column, hexToRaw)

Bigint

BIGINT columns are returned as numeric vectors. The numeric type in R loses precision above 253. Some libraries (RPostgres, RMariaDB, RSQLite, ODBC) support returning bit64::integer64 vectors instead.

dbxConnect(bigint="integer64")

Connection Pooling

Install the pool package

install.packages("pool")

Create a pool

library(pool)

factory <- function() {
  dbxConnect(adapter="postgres", ...)
}

pool <- poolCreate(factory, maxSize=5)

Run queries

conn <- poolCheckout(pool)

tryCatch({
  dbxSelect(conn, "SELECT * FROM forecasts")
}, finally={
  poolReturn(conn)
})

In the future, dbx commands may work directly with pools.

Security

When connecting to a database over a network you donโ€™t fully trust, make sure your connection is secure.

With Postgres, use:

db <- dbxConnect(adapter="postgres", sslmode="verify-full", sslrootcert="ca.pem")

With RMariaDB, use:

db <- dbxConnect(adapter="mysql", ssl.ca="ca.pem")

Please let us know if you have a way that works with RMySQL.

Variables

Set session variables with:

db <- dbxConnect(variables=list(search_path="archive"))

Timeouts

Set a statement timeout with:

# Postgres
db <- dbxConnect(variables=list(statement_timeout=1000)) # ms

# MySQL 5.7.8+
db <- dbxConnect(variables=list(max_execution_time=1000)) # ms

# MariaDB 10.1.1+
db <- dbxConnect(variables=list(max_statement_time=1)) # sec

With Postgres, set a connect timeout with:

db <- dbxConnect(connect_timeout=3) # sec

Compatibility

All connections are simply DBI connections, so you can use them anywhere you use DBI.

dbCreateTable(db, ...)

Install dbplyr to use data with dplyr.

forecasts <- tbl(db, "forecasts")

Reference

To close a connection, use:

dbxDisconnect(db)

Upgrading

0.2.0

Version 0.2.0 brings a number of fixes and improvements to data types.

However, there a few breaking changes to be aware of:

  • The dbxInsert and dbxUpsert functions no longer return a data frame by default. For MySQL and SQLite, the data frame was just the records argument. For Postgres, if you use auto-incrementing primary keys, the data frame contained ids of the newly inserted/upserted records. To get the ids, pass name of the column as the returning argument:

    dbxInsert(db, table, records, returning=c("id"))
  • timestamp without time zone columns in Postgres are now stored in UTC instead of local time by default. This does not affect timestamp with time zone columns. To keep the previous behavior, use:

    dbxConnect(adapter="postgres", storage_tz=Sys.timezone(), ...)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/dbx.git
cd dbx

# create Postgres database
createdb dbx_test

# create MySQL database
mysqladmin create dbx_test

In R, do:

install.packages("devtools")
devtools::install_deps(dependencies=TRUE)
devtools::test()

More Repositories

1

pghero

A performance dashboard for Postgres
Ruby
7,123
star
2

searchkick

Intelligent search made easy
Ruby
6,257
star
3

chartkick

Create beautiful JavaScript charts with one line of Ruby
Ruby
6,157
star
4

blazer

Business intelligence made simple
Ruby
4,511
star
5

ahoy

Simple, powerful, first-party analytics for Rails
Ruby
3,872
star
6

strong_migrations

Catch unsafe migrations in development
Ruby
3,662
star
7

groupdate

The simplest way to group temporal data
Ruby
3,617
star
8

pgsync

Sync data from one Postgres database to another
Ruby
2,787
star
9

the-ultimate-guide-to-ruby-timeouts

Timeouts for popular Ruby gems
Ruby
2,212
star
10

production_rails

Best practices for running Rails in production
1,975
star
11

dexter

The automatic indexer for Postgres
Ruby
1,491
star
12

lockbox

Modern encryption for Ruby and Rails
Ruby
1,290
star
13

chartkick.js

Create beautiful charts with one line of JavaScript
JavaScript
1,211
star
14

react-chartkick

Create beautiful JavaScript charts with one line of React
JavaScript
1,183
star
15

pretender

Log in as another user in Rails
Ruby
1,124
star
16

ahoy_email

First-party email analytics for Rails
Ruby
1,051
star
17

secure_rails

Rails security best practices
954
star
18

pgslice

Postgres partitioning as easy as pie
Ruby
953
star
19

mailkick

Email subscriptions for Rails
Ruby
847
star
20

vue-chartkick

Create beautiful JavaScript charts with one line of Vue
JavaScript
747
star
21

eps

Machine learning for Ruby
Ruby
609
star
22

awesome-legal

Awesome free legal documents for companies
589
star
23

searchjoy

Search analytics made easy
Ruby
579
star
24

polars-ruby

Blazingly fast DataFrames for Ruby
Ruby
563
star
25

torch.rb

Deep learning for Ruby, powered by LibTorch
Ruby
552
star
26

blind_index

Securely search encrypted database fields
Ruby
470
star
27

safely

Rescue and report exceptions in non-critical code
Ruby
470
star
28

authtrail

Track Devise login activity
Ruby
466
star
29

multiverse

Multiple databases for Rails ๐ŸŽ‰
Ruby
463
star
30

ahoy.js

Simple, powerful JavaScript analytics
JavaScript
463
star
31

hightop

A nice shortcut for group count queries
Ruby
462
star
32

field_test

A/B testing for Rails
Ruby
460
star
33

s3tk

A security toolkit for Amazon S3
Python
439
star
34

disco

Recommendations for Ruby and Rails using collaborative filtering
Ruby
431
star
35

active_median

Median and percentile for Active Record, Mongoid, arrays, and hashes
Ruby
427
star
36

informers

State-of-the-art natural language processing for Ruby
Ruby
417
star
37

notable

Track notable requests and background jobs
Ruby
402
star
38

shorts

Short, random tutorials and posts
379
star
39

tensorflow-ruby

Deep learning for Ruby
Ruby
350
star
40

distribute_reads

Scale database reads to replicas in Rails
Ruby
328
star
41

slowpoke

Rack::Timeout enhancements for Rails
Ruby
327
star
42

prophet-ruby

Time series forecasting for Ruby
Ruby
321
star
43

rover

Simple, powerful data frames for Ruby
Ruby
311
star
44

groupdate.sql

The simplest way to group temporal data
PLpgSQL
280
star
45

kms_encrypted

Simple, secure key management for Lockbox and attr_encrypted
Ruby
235
star
46

jetpack

A friendly package manager for R
R
234
star
47

neighbor

Nearest neighbor search for Rails and Postgres
Ruby
230
star
48

rollup

Rollup time-series data in Rails
Ruby
230
star
49

hypershield

Shield sensitive data in Postgres and MySQL
Ruby
227
star
50

logstop

Keep personal data out of your logs
Ruby
218
star
51

pdscan

Scan your data stores for unencrypted personal data (PII)
Go
213
star
52

delete_in_batches

Fast batch deletes for Active Record and Postgres
Ruby
202
star
53

vega-ruby

Interactive charts for Ruby, powered by Vega and Vega-Lite
Ruby
192
star
54

mapkick

Create beautiful JavaScript maps with one line of Ruby
Ruby
173
star
55

fastText-ruby

Efficient text classification and representation learning for Ruby
Ruby
162
star
56

autosuggest

Autocomplete suggestions based on what your users search
Ruby
162
star
57

swipeout

Swipe-to-delete goodness for the mobile web
JavaScript
159
star
58

pghero.sql

Postgres insights made easy
PLpgSQL
154
star
59

mainstreet

Address verification for Ruby and Rails
Ruby
149
star
60

or-tools-ruby

Operations research tools for Ruby
Ruby
139
star
61

mapkick.js

Create beautiful, interactive maps with one line of JavaScript
JavaScript
138
star
62

trend-ruby

Anomaly detection and forecasting for Ruby
Ruby
128
star
63

mitie-ruby

Named-entity recognition for Ruby
Ruby
122
star
64

barkick

Barcodes made easy
Ruby
120
star
65

ownership

Code ownership for Rails
Ruby
111
star
66

anomaly

Easy-to-use anomaly detection for Ruby
Ruby
98
star
67

errbase

Common exception reporting for a variety of services
Ruby
87
star
68

tokenizers-ruby

Fast state-of-the-art tokenizers for Ruby
Rust
81
star
69

ip_anonymizer

IP address anonymizer for Ruby and Rails
Ruby
79
star
70

str_enum

String enums for Rails
Ruby
75
star
71

faiss-ruby

Efficient similarity search and clustering for Ruby
C++
73
star
72

trend-api

Anomaly detection and forecasting API
R
71
star
73

archer

Rails console history for Heroku, Docker, and more
Ruby
70
star
74

onnxruntime-ruby

Run ONNX models in Ruby
Ruby
70
star
75

xgboost-ruby

High performance gradient boosting for Ruby
Ruby
69
star
76

secure-spreadsheet

Encrypt and password protect sensitive CSV and XLSX files
JavaScript
66
star
77

active_hll

HyperLogLog for Rails and Postgres
Ruby
66
star
78

guess

Statistical gender detection for Ruby
Ruby
60
star
79

morph

An encrypted, in-memory, key-value store
C++
59
star
80

lightgbm-ruby

High performance gradient boosting for Ruby
Ruby
56
star
81

midas-ruby

Edge stream anomaly detection for Ruby
Ruby
54
star
82

moves

Ruby client for Moves
Ruby
54
star
83

blingfire-ruby

High speed text tokenization for Ruby
Ruby
54
star
84

vowpalwabbit-ruby

Fast online machine learning for Ruby
Ruby
52
star
85

xlearn-ruby

High performance factorization machines for Ruby
Ruby
51
star
86

tomoto-ruby

High performance topic modeling for Ruby
C++
51
star
87

trove

Deploy machine learning models in Ruby (and Rails)
Ruby
50
star
88

mapkick-static

Create beautiful static maps with one line of Ruby
Ruby
42
star
89

ahoy_events

Simple, powerful event tracking for Rails
Ruby
42
star
90

practical-search

Letโ€™s make search a better experience for our users
40
star
91

breakout-ruby

Breakout detection for Ruby
Ruby
40
star
92

plu

Price look-up codes made easy
Ruby
40
star
93

ngt-ruby

High-speed approximate nearest neighbors for Ruby
Ruby
39
star
94

gindex

Concurrent index migrations for Rails
Ruby
39
star
95

clockwork_web

A web interface for Clockwork
Ruby
38
star
96

ahoy_guide

A foundation of knowledge and libraries for solid analytics
38
star
97

notable_web

A web interface for Notable
HTML
36
star
98

AnomalyDetection.rb

Time series anomaly detection for Ruby
Ruby
34
star
99

khiva-ruby

High-performance time series algorithms for Ruby
Ruby
34
star
100

immudb-ruby

Ruby client for immudb, the immutable database
Ruby
34
star