• Stars
    star
    1,808
  • Rank 25,684 (Top 0.6 %)
  • Language
    Ruby
  • License
    BSD 3-Clause "New...
  • Created over 13 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Online MySQL schema migrations

Large Hadron Migrator Build Status

Rails style database migrations are a useful way to evolve your data schema in an agile manner. Most Rails projects start like this, and at first, making changes is fast and easy.

That is until your tables grow to millions of records. At this point, the locking nature of ALTER TABLE may take your site down for an hour or more while critical tables are migrated. In order to avoid this, developers begin to design around the problem by introducing join tables or moving the data into another layer. Development gets less and less agile as tables grow and grow. To make the problem worse, adding or changing indices to optimize data access becomes just as difficult.

Side effects may include black holes and universe implosion.

There are few things that can be done at the server or engine level. It is possible to change default values in an ALTER TABLE without locking the table. The InnoDB Plugin provides facilities for online index creation, which is great if you are using this engine, but only solves half the problem.

At SoundCloud we started having migration pains quite a while ago, and after looking around for third party solutions, we decided to create our own. We called it Large Hadron Migrator, and it is a gem for online ActiveRecord migrations.

LHC

The Large Hadron collider at CERN

The idea

The basic idea is to perform the migration online while the system is live, without locking the table. In contrast to OAK and the facebook tool, we only use a copy table and triggers.

The Large Hadron is a test driven Ruby solution which can easily be dropped into an ActiveRecord migration. It presumes a single auto incremented numerical primary key called id as per the Rails convention. Unlike the twitter solution, it does not require the presence of an indexed updated_at column.

Requirements

Lhm currently only works with MySQL databases and requires an established ActiveRecord connection.

It is compatible and continuously tested with MRI 2.0.x, 2.1.x, ActiveRecord 3.2.x and 4.x (mysql and mysql2 adapters).

Limitations

Due to the Chunker implementation, Lhm requires that the table to migrate has a single integer numeric key column called id.

Another note about the Chunker, it performs static sized row copies against the id column. Therefore sparse assignment of id can cause performance problems for the backfills. Typically LHM assumes that id is an auto_increment style column.

Installation

Install it via gem install lhm or add gem "lhm" to your Gemfile.

Usage

You can invoke Lhm directly from a plain ruby file after connecting ActiveRecord to your mysql instance:

require 'lhm'

ActiveRecord::Base.establish_connection(
  :adapter => 'mysql',
  :host => '127.0.0.1',
  :database => 'lhm'
)

# and migrate
Lhm.change_table :users do |m|
  m.add_column :arbitrary, "INT(12)"
  m.add_column :locale, "VARCHAR(2) NOT NULL DEFAULT 'en'"
  m.add_index  [:arbitrary_id, :created_at]
  m.ddl("alter table %s add column flag tinyint(1)" % m.name)
end

To use Lhm from an ActiveRecord::Migration in a Rails project, add it to your Gemfile, then invoke as follows:

require 'lhm'

class MigrateUsers < ActiveRecord::Migration
  def self.up
    Lhm.change_table :users do |m|
      m.add_column :arbitrary, "INT(12)"
      m.add_index  [:arbitrary_id, :created_at]
      m.ddl("alter table %s add column flag tinyint(1)" % m.name)
    end
  end

  def self.down
    Lhm.change_table :users do |m|
      m.remove_index  [:arbitrary_id, :created_at]
      m.remove_column :arbitrary
    end
  end
end

Note: Lhm won't delete the old, leftover table. This is on purpose, in order to prevent accidental data loss.

Throttler

Lhm is using a throttle mechanism to read data in your original table.

By default, 40000 rows are read each 0.1 second.

If you want to change that behaviour, you can pass an instance of a throttler with the throttler option.

In this example, 1000 rows will be read with a 10 seconds delay between each processing:

my_throttler = Lhm::Throttler::Time.new(stride: 1000, delay: 10)

Lhm.change_table :users, throttler: my_throttler  do |m|
  #
end

SlaveLag Throttler

Lhm uses by default the time throttler, however a better solution is to throttle the copy of the data depending on the time that the slaves are behind. To use the SlaveLag throttler:

Lhm.change_table :users, throttler: :slave_lag_throttler  do |m|
  #
end

Or to set that as default throttler, use the following (for instance in a Rails initializer):

Lhm.setup_throttler(:slave_lag_throttler)

Table rename strategies

There are two different table rename strategies available: LockedSwitcher and AtomicSwitcher.

The LockedSwitcher strategy locks the table being migrated and issues two ALTER TABLE statements. The AtomicSwitcher uses a single atomic RENAME TABLE query and is the favored solution.

Lhm chooses AtomicSwitcher if no strategy is specified, unless your version of MySQL is affected by binlog bug #39675. If your version is affected, Lhm will raise an error if you don't specify a strategy. You're recommended to use the LockedSwitcher in these cases to avoid replication issues.

To specify the strategy in your migration:

Lhm.change_table :users, :atomic_switch => true do |m|
  # ...
end

Limiting the data that is migrated

For instances where you want to limit the data that is migrated to the new table by some conditions, you may tell the migration to filter by a set of conditions:

Lhm.change_table(:sounds) do |m|
  m.filter("inner join users on users.`id` = sounds.`user_id` and sounds.`public` = 1")
end

Note that this SQL will be inserted into the copy directly after the "from" statement - so be sure to use inner/outer join syntax and not cross joins. These conditions will not affect the triggers, so any modifications to the table during the run will happen on the new table as well.

Cleaning up after an interrupted Lhm run

If an Lhm migration is interrupted, it may leave behind the temporary tables and/or triggers used in the migration. If the migration is re-started, the unexpected presence of these tables will cause an error.

In this case, Lhm.cleanup can be used to drop any orphaned Lhm temporary tables or triggers.

To see what Lhm tables/triggers are found:

Lhm.cleanup

To remove any Lhm tables/triggers found:

Lhm.cleanup(:run)

Optionally only remove tables up to a specific Time, if you want to retain previous migrations.

Rails:

Lhm.cleanup(:run, until: 1.day.ago)

Ruby:

Lhm.cleanup(:run, until: Time.now - 86400)

Contributing

First, get set up for local development:

git clone git://github.com/soundcloud/lhm.git
cd lhm

To run the tests, follow the instructions on spec/README.

We'll check out your contribution if you:

  • Provide a comprehensive suite of tests for your fork.
  • Have a clear and documented rationale for your changes.
  • Package these up in a pull request.

We'll do our best to help you out with any contribution issues you may have.

License

The license is included as LICENSE in this directory.

Similar solutions

More Repositories

1

roshi

Roshi is a large-scale CRDT set implementation for timestamped events.
Go
3,107
star
2

lightcycle

LightCycle lets self-contained classes respond to Androidโ€™s lifecycle events
Java
706
star
3

soundcloud-custom-player

The SoundCloud custom javascript based player
JavaScript
699
star
4

chunk-manifest-webpack-plugin

Allows exporting a manifest that maps entry chunk names to their output files, instead of keeping the mapping inside the webpack bootstrap.
JavaScript
393
star
5

soundcloud-javascript

Official SoundCloud Javascript SDK
JavaScript
382
star
6

areweplayingyet

html5 audio benchmarks
JavaScript
312
star
7

cosine-lsh-join-spark

Approximate Nearest Neighbors in Spark
Scala
175
star
8

Axt

SwiftUI view testing library
Swift
163
star
9

Widget-JS-API

This is the official SoundCloud Widget Javascript API
JavaScript
149
star
10

delect

The Gradle Plugin for Dagger Reflect.
Kotlin
137
star
11

api

A public repo for our Developer Community to engage about bugs and feature requests on our Public API
136
star
12

periskop

Exception Monitoring Service
Go
123
star
13

project-dev-kpis

Key Performance Indicators of product development teams.
Python
119
star
14

soundcloud-python

A Python wrapper around the Soundcloud API
Python
95
star
15

split-by-name-webpack-plugin

Split a Webpack entry bundle into any number of arbitrarily defined smaller bundles
JavaScript
80
star
16

spark-pagerank

PageRank in Spark
Scala
74
star
17

intervene

A machine-in-the-middle proxy for development, enabling mocking and/or modification of API endpoints
JavaScript
71
star
18

normailize

Normalize emails like [email protected] into [email protected]
Ruby
67
star
19

SoundCloud-API-jQuery-plugin

SoundCloud API jQuery plugin
JavaScript
52
star
20

spdt

Streaming Parallel Decision Tree
Scala
51
star
21

twinagle

Twinagle = Twirp + Finagle
Scala
50
star
22

prometheus-clj

Clojure wrappers for the Prometheus java client
Clojure
49
star
23

simple_circuit_breaker

Simple Ruby implementation of the Circuit Breaker design pattern
Ruby
28
star
24

git-sha-webpack-plugin

Tag your webpack bundles with a Git SHA linked to the latest commit on that bundle
JavaScript
27
star
25

remixin

Mixin library for Javascript
JavaScript
24
star
26

cando

A simple access rights gem with users, roles and capabilities
Ruby
22
star
27

move-to-parent-merging-webpack-plugin

JavaScript
19
star
28

MinimalPerfectHashes.jl

An implementation of minimal perfect hash function generation as described in Czech et. al. 1992.
Julia
16
star
29

ogg

Mirror of http://svn.xiph.org/trunk/ogg/
C
11
star
30

sc-gaws

Glue code to wrap around AWS and do useful things in Go
Go
9
star
31

vorbis

Mirror of http://svn.xiph.org/trunk/vorbis/
C
8
star
32

collins_exporter

Simple Collins exporter for Prometheus
Go
8
star
33

dns-endpoint-pool

Manage and load-balance a pool of service endpoints retrieved from a DNS lookup for a service discovery name.
JavaScript
7
star
34

tremor

Mirror of http://svn.xiph.org/trunk/Tremor
C
5
star
35

soundcloud-ruby

Official SoundCloud API Wrapper for Ruby.
Ruby
5
star
36

periskop-scala

Scala low level client for Periskop
Scala
3
star
37

knife-scrub

Knife plugin to scrub normal attributes
Ruby
1
star
38

go-runit

go library wrapping runit service status
Go
1
star