• Stars
    star
    818
  • Rank 55,717 (Top 2 %)
  • Language
    Ruby
  • License
    MIT License
  • Created about 9 years ago
  • Updated almost 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Efficient bulk inserts with ActiveRecord

BulkInsert

A little ActiveRecord extension for helping to insert lots of rows in a single insert statement.

Installation

Add it to your Gemfile:

gem 'bulk_insert'

Usage

BulkInsert adds a new class method to your ActiveRecord models:

class Book < ActiveRecord::Base
end

book_attrs = ... # some array of hashes, for instance
Book.bulk_insert do |worker|
  book_attrs.each do |attrs|
    worker.add(attrs)
  end
end

All of those #add calls will be accumulated into a single SQL insert statement, vastly improving the performance of multiple sequential inserts (think data imports and the like).

If you don't like using a block API, you can also simply pass an array of rows to be inserted:

book_attrs = ... # some array of hashes, for instance
Book.bulk_insert values: book_attrs

By default, the columns to be inserted will be all columns in the table, minus the id column, but if you want, you can explicitly enumerate the columns:

Book.bulk_insert(:title, :author) do |worker|
  # specify a row as an array of values...
  worker.add ["Eye of the World", "Robert Jordan"]

  # or as a hash
  worker.add title: "Lord of Light", author: "Roger Zelazny"
end

It will automatically set created_at/updated_at columns to the current date, as well.

Book.bulk_insert(:title, :author, :created_at, :updated_at) do |worker|
  # specify created_at/updated_at explicitly...
  worker.add ["The Chosen", "Chaim Potok", Time.now, Time.now]

  # or let BulkInsert set them by default...
  worker.add ["Hello Ruby", "Linda Liukas"]
end

Similarly, if a value is omitted, BulkInsert will use whatever default value is defined for that column in the database:

# create_table :books do |t|
#   ...
#   t.string "medium", default: "paper"
#   ...
# end

Book.bulk_insert(:title, :author, :medium) do |worker|
  worker.add title: "Ender's Game", author: "Orson Scott Card"
end

Book.first.medium #-> "paper"

By default, the batch is always saved when the block finishes, but you can explicitly save inside the block whenever you want, by calling #save! on the worker:

Book.bulk_insert do |worker|
  worker.add(...)
  worker.add(...)

  worker.save!

  worker.add(...)
  #...
end

That will save the batch as it has been defined to that point, and then empty the batch so that you can add more rows to it if you want. Note that all records saved together will have the same created_at/updated_at timestamp (unless one was explicitly set).

Batch Set Size

By default, the size of the insert is limited to 500 rows at a time. This is called the set size. If you add another row that causes the set to exceed the set size, the insert statement is automatically built and executed, and the batch is reset.

If you want a larger (or smaller) set size, you can specify it in two ways:

# specify set_size when initializing the bulk insert...
Book.bulk_insert(set_size: 100) do |worker|
  # ...
end

# specify it on the worker directly...
Book.bulk_insert do |worker|
  worker.set_size = 100
  # ...
end

Insert Ignore

By default, when an insert fails the whole batch of inserts fail. The ignore option ignores the inserts that would have failed (because of duplicate keys or a null in column with a not null constraint) and inserts the rest of the batch.

This is not the default because no errors are raised for the bad inserts in the batch.

destination_columns = [:title, :author]

# Ignore bad inserts in the batch
Book.bulk_insert(*destination_columns, ignore: true) do |worker|
  worker.add(...)
  worker.add(...)
  # ...
end

Update Duplicates (MySQL, PostgreSQL)

If you don't want to ignore duplicate rows but instead want to update them then you can use the update_duplicates option. Set this option to true (MySQL) or list unique column names (PostgreSQL) and when a duplicate row is found the row will be updated with your new values. Default value for this option is false.

destination_columns = [:title, :author]

# Update duplicate rows (MySQL)
Book.bulk_insert(*destination_columns, update_duplicates: true) do |worker|
  worker.add(...)
  worker.add(...)
  # ...
end

# Update duplicate rows (PostgreSQL)
Book.bulk_insert(*destination_columns, update_duplicates: %w[title]) do |worker|
  worker.add(...)
  # ...
end

Return Primary Keys (PostgreSQL, PostGIS)

If you want the worker to store primary keys of inserted records, then you can use the return_primary_keys option. The worker will store a result_sets array of ActiveRecord::Result objects. Each ActiveRecord::Result object will contain the primary keys of a batch of inserted records.

worker = Book.bulk_insert(*destination_columns, return_primary_keys: true) do
|worker|
  worker.add(...)
  worker.add(...)
  # ...
end

worker.result_sets

Ruby and Rails Versions Supported

⚠️ The scope of this gem may be somehow covered natively by the .insert_all API introduced by Rails 6. This gem represents the state of art for rails version < 6 and it is still open to further developments for more recent versions.

The current CI prevents regressions on the following versions:

ruby / rails ~>3 ~>4 ~>5 ~>6
2.2 yes yes no no
2.3 yes yes yes no
2.4 no yes yes no
2.5 no no yes yes
2.6 no no yes yes
2.7 no no yes yes

The adapters covered in the CI are:

  • sqlite
  • mysql
  • postgresql

License

BulkInsert is released under the MIT license (see MIT-LICENSE) by Jamis Buck ([email protected]).

More Repositories

1

bucketwise

ATTENTION: This project is no longer being updated. If you're still interested, feel free to read on... "A web-based personal finance manager with a focus on non-OCD budgeting and avoiding credit card debt"
Ruby
457
star
2

csmazes

Maze algorithms implemented in CoffeeScript, with an eye toward demonstrating how the algorithms work by animating them.
CoffeeScript
391
star
3

castaway

System for building screencasts and video presentations
Ruby
306
star
4

fuzzyfinder_textmate

A vim script that extends the fuzzyfinder plugin to support TextMate style file searches (e.g. cmd-T) (Unmaintained now, see http://weblog.jamisbuck.org/2009/1/28/the-future-of-fuzzyfinder-textmate)
Vim Script
216
star
5

theseus

A very flexible random maze generator, solver, and renderer for Ruby
Ruby
178
star
6

query-composer

A library for composing complex SQL queries by defining their subcomponents and the dependencies between them.
Ruby
166
star
7

fuzzy_file_finder

A (slightly enhanced) implementation of TextMate's cmd-T lookup functionality, in Ruby, for embedding in other projects
Ruby
143
star
8

wordsearch

A word-search puzzle generator
Ruby
76
star
9

safe_mass_assignment

ActiveRecord plugin for allowing (careful) mass assignment of protected attributes, separate from values provided via users of your application.
Ruby
55
star
10

net-ssh-multi

SSH connection multiplexing: execute commands simultaneously on multiple hosts via SSH
Ruby
44
star
11

sqlpp

A simplistic SQL parser and pretty-printer
Ruby
42
star
12

impose

A utility and library for imposition -- arranging pages on a sheet of paper for optimal printing
Ruby
37
star
13

dnd-dungeon

A random maze generator in C, with a CGI front-end for generating random dungeons for D&D, 3rd edition
35
star
14

kaleidoscope

Generate uniform tilings (tesselations) of a plane using Wythoff constructions. Not as hard (or scary) as it sounds!
Ruby
32
star
15

net-ssh-shell

NOTE: this repository is no longer actively maintained. Please go to the actively maintained repository, here: https://github.com/mitchellh/net-ssh-shell. Net::SSH::Shell is a net-ssh extension library that provides an API for programmatically interacting with a login shell
Ruby
26
star
16

ifrb

Interactive Fiction for Interactive Ruby
Ruby
23
star
17

mod_reproxy

A module for Apache 2 that implements support for the X-Reproxy-Url response header
C
23
star
18

net-ssh-gateway

THIS REPOSITORY IS NO LONGER MAINTAINED. Please see https://github.com/net-ssh/net-ssh-gateway for the currently maintained version. Thanks! -- A gateway class for tunneling connections via SSH over a forwarded port.
Ruby
23
star
19

amazing-desktops

A simple utility for generating random abstract images (using mazes) for use as desktop wallpaper.
C
21
star
20

MazeMaker

An implementation of grid layouts and maze algorithms, in Swift
Swift
20
star
21

logic-engine

Prolog-inspired logic engine in Ruby, with backtracking
16
star
22

rtc-ocaml

"The Ray Tracer Challenge" (http://www.raytracerchallenge.com) implemented in OCaml
OCaml
15
star
23

code_slide

Generate PDF/PNG slides from source code
Ruby
12
star
24

curves

A library for interpolating various curves (bezier, cubic hermite, etc.)
Ruby
12
star
25

mazoo

An HTML5 game to test your maze navigation skills!
JavaScript
11
star
26

zing

Framework for playful maze generation (from "Twisty Little Passages" presentation at MWRC 2015)
Ruby
11
star
27

process_roulette

A silly little game that could mess up your machine pretty badly (please use a VM!)
Ruby
9
star
28

chaussettes

A thin wrapper around the sox audio manipulation utility
Ruby
9
star
29

dnd-npc

A random NPC generator for D&D 3rd edition, written in C. Includes CGI and console interfaces.
C
8
star
30

truth

A utility for displaying a truth table for an expression
Ruby
8
star
31

celtic_knot

A library for generating Celtic Knotwork designs from graphs
Ruby
8
star
32

lindy

An L-system parser and interpreter
Ruby
7
star
33

dnd-util

An encapsulation (in C) of the core logic and data of D&D, 3rd edition.
6
star
34

KSP-RealSolarSystem-Bundler

Program for bundling all needed dependencies for the "Real Solar System" mod, for Kerbal Space Program.
Ruby
6
star
35

ekawada-web

A rails application for recording, comparing, and researching string figures
Ruby
6
star
36

weekly-challenges

My submissions for the weekly programming challenges (https://medium.com/@jamis/weekly-programming-challenge-1-55b63b9d2a1)
Ruby
6
star
37

scruffy-labrador

A flexible JavaScript implementation of a grid/graph, and some maze generation algorithms
JavaScript
6
star
38

artifex

A D&D4e NPC generator
CoffeeScript
5
star
39

jamis.github.io

Basil & Fabian - A Wizard & His Man
CoffeeScript
5
star
40

hercule

A logic puzzle for PalmOS (historical interest only, mostly, unless you have a really old device)
C
5
star
41

dnd-templates

A templating system written in C. (Deprecated, obsolete, etc!)
4
star
42

sqlite-ruby

bindings for the SQLite 2.x embedded database
Ruby
4
star
43

korean-proverbs

Translations of Korean proverbs
4
star
44

runeo

An ActiveRecord-inspired wrapper for the Neo4j REST API
Ruby
4
star
45

dnd-writetem

A templating system written in C, with a stream wrapper system. (Deprecated, obsolete, etc!)
3
star
46

hangul-tools

Romanize Korean text
Ruby
3
star
47

piece-by-piece

An evidence-oriented genealogical research database, inspired by the GENTECH data model
CoffeeScript
3
star
48

tinker

CYOA-style game system inspired by Sarah Allen's "pie" project.
Ruby
3
star
49

strings2go

A string figure collection for the iPhone
Ruby
2
star
50

kincaid

A DSL for creating dungeon maps for tabletop RPG's
Ruby
2
star
51

test_session_manager

Allow tests for Rails applications to inject session data (including flash) into test requests
Ruby
1
star
52

buckblog

The Buckblog -- assorted ramblings by Jamis Buck -- http://weblog.jamisbuck.org
HTML
1
star
53

derring-do

The over-eager progress monitor for Ruby.
Ruby
1
star
54

taleswapper

A home for creating and sharing stories, with an emphasis on role-playing games
Ruby
1
star