• Stars
    star
    907
  • Rank 50,358 (Top 1.0 %)
  • Language
    Ruby
  • License
    MIT License
  • Created over 6 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Makes your background jobs interruptible and resumable by design.

Job Iteration API

CI

Meet Iteration, an extension for ActiveJob that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).

Background

Imagine the following job:

class SimpleJob < ApplicationJob
  def perform
    User.find_each do |user|
      user.notify_about_something
    end
  end
end

The job would run fairly quickly when you only have a hundred User records. But as the number of records grows, it will take longer for a job to iterate over all Users. Eventually, there will be millions of records to iterate and the job will end up taking hours or even days.

With frequent deploys and worker restarts, it would mean that a job will be either lost or restarted from the beginning. Some records (especially those in the beginning of the relation) will be processed more than once.

Cloud environments are also unpredictable, and there's no way to guarantee that a single job will have reserved hardware to run for hours and days. What if AWS diagnosed the instance as unhealthy and will restart it in 5 minutes? What if a Kubernetes pod is getting evicted? Again, all job progress will be lost. At Shopify, we also use it to interrupt workloads safely when moving tenants between shards and move shards between regions.

Software that is designed for high availability must be resilient to interruptions that come from the infrastructure. That's exactly what Iteration brings to ActiveJob. It's been developed at Shopify to safely process long-running jobs, in Cloud, and has been working in production since May 2017.

We recommend that you watch one of our conference talks about the ideas and history behind Iteration API.

Getting started

Add this line to your application's Gemfile:

gem 'job-iteration'

And then execute:

$ bundle

In the job, include JobIteration::Iteration module and start describing the job with two methods (build_enumerator and each_iteration) instead of perform:

class NotifyUsersJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    enumerator_builder.active_record_on_records(
      User.all,
      cursor: cursor,
    )
  end

  def each_iteration(user)
    user.notify_about_something
  end
end

each_iteration will be called for each User model in User.all relation. The relation will be ordered by primary key, exactly like find_each does.

Check out more examples of Iterations:

class BatchesJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(product_id, cursor:)
    enumerator_builder.active_record_on_batches(
      Comment.where(product_id: product_id).select(:id),
      cursor: cursor,
      batch_size: 100,
    )
  end

  def each_iteration(batch_of_comments, product_id)
    comment_ids = batch_of_comments.map(&:id)
    CommentService.call(comment_ids: comment_ids)
  end
end
class BatchesAsRelationJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(product_id, cursor:)
    enumerator_builder.active_record_on_batch_relations(
      Product.find(product_id).comments,
      cursor: cursor,
      batch_size: 100,
    )
  end

  def each_iteration(batch_of_comments, product_id)
    # batch_of_comments will be a Comment::ActiveRecord_Relation
    batch_of_comments.update_all(deleted: true)
  end
end
class ArrayJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    enumerator_builder.array(['build', 'enumerator', 'from', 'any', 'array'], cursor: cursor)
  end

  def each_iteration(array_element)
    # use array_element
  end
end
class CsvJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(import_id, cursor:)
    import = Import.find(import_id)
    enumerator_builder.csv(import.csv, cursor: cursor)
  end

  def each_iteration(csv_row, import_id)
    # insert csv_row to database
  end
end
class NestedIterationJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    enumerator_builder.nested(
      [
        ->(cursor) { enumerator_builder.active_record_on_records(Shop.all, cursor: cursor) },
        ->(shop, cursor) { enumerator_builder.active_record_on_records(shop.products, cursor: cursor) },
        ->(_shop, product, cursor) { enumerator_builder.active_record_on_batch_relations(product.product_variants, cursor: cursor) }
      ],
      cursor: cursor
    )
  end

  def each_iteration(product_variants_relation)
    # do something
  end
end

Iteration hooks into Sidekiq and Resque out of the box to support graceful interruption. No extra configuration is required.

Guides

For more detailed documentation, see rubydoc.

Requirements

ActiveJob is the primary requirement for Iteration. While there's nothing that prevents it, Iteration is not yet compatible with vanilla Sidekiq API.

API

Iteration job must respond to build_enumerator and each_iteration methods. build_enumerator must return Enumerator object that respects the cursor value.

Sidekiq adapter

Unless you are running on Heroku, we recommend you to tune Sidekiq's timeout option from the default 8 seconds to 25-30 seconds, to allow the last each_iteration to complete and gracefully shutdown.

Resque adapter

There a few configuration assumptions that are required for Iteration to work with Resque. GRACEFUL_TERM must be enabled (giving the job ability to gracefully interrupt), and FORK_PER_JOB is recommended to be disabled (set to false).

FAQ

Why can't I just iterate in #perform method and do whatever I want? You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers--without exposing the underlying infrastructure.

What happens when my job is interrupted? A checkpoint will be persisted to Redis after the current each_iteration, and the job will be re-enqueued. Once it's popped off the queue, the worker will work off from the next iteration.

What happens with retries? An interruption of a job does not count as a retry. If an exception occurs, the job will retry or be discarded as normal using Active Job configuration for the job. If the job retries, it processes the iteration that originally failed and progress will continue from there on if succesful.

What happens if my iteration takes a long time? We recommend that a single each_iteration should take no longer than 30 seconds. In the future, this may raise an exception.

Why is it important that each_iteration takes less than 30 seconds? When the job worker is scheduled for restart or shutdown, it gets a notice to finish remaining unit of work. To guarantee that no progress is lost we need to make sure that each_iteration completes within a reasonable amount of time.

Why do I use have to use this ugly helper in build_enumerator? Why can't you automatically infer it? This is how the first version of the API worked. We checked the type of object returned by build_enumerable, and whether it was ActiveRecord Relation or an Array, we used the matching adapter. This caused opaque type branching in Iteration internals and it didn’t allow developers to craft their own Enumerators and control the cursor value. We made a decision to always return Enumerator instance from build_enumerator. Now we provide explicit helpers to convert ActiveRecord Relation or an Array to Enumerator, and for more complex iteration flows developers can build their own Enumerator objects.

What is the difference between Enumerable and Enumerator? We recomend this post to learn more about Enumerators in Ruby.

My job has a complex flow. How do I write my own Enumerator? Iteration API takes care of persisting the cursor (that you may use to calculate an offset) and controlling the job state. The power of Enumerator object is that you can use the cursor in any way you want. One example is a cursorless job that pops records from a datastore until the job is interrupted:

class MyJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    Enumerator.new do
      Redis.lpop("mylist") # or: Kafka.poll(timeout: 10.seconds)
    end
  end

  def each_iteration(element_from_redis)
    # ...
  end
end

Credits

This project would not be possible without these individuals (in alphabetical order):

  • Daniella Niyonkuru
  • Emil Stolarsky
  • Florian Weingarten
  • Guillaume Malette
  • Hormoz Kheradmand
  • Mohamed-Adam Chaieb
  • Simon Eskildsen

Development

After checking out the repo, run bin/setup to install dependencies and create mysql database. Then, run bundle exec rake test to run the tests.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/Shopify/job-iteration. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Job::Iteration project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

More Repositories

1

draggable

The JavaScript Drag & Drop library your grandparents warned you about.
JavaScript
17,927
star
2

dashing

The exceptionally handsome dashboard framework in Ruby and Coffeescript.
JavaScript
11,025
star
3

liquid

Liquid markup language. Safe, customer facing template language for flexible web apps.
Ruby
10,419
star
4

toxiproxy

⏰ 🔥 A TCP proxy to simulate network and system conditions for chaos and resiliency testing
Go
9,412
star
5

react-native-skia

High-performance React Native Graphics using Skia
TypeScript
6,746
star
6

flash-list

A better list for React Native
TypeScript
5,489
star
7

polaris

Shopify’s design system to help us work together to build a great experience for all of our merchants.
TypeScript
5,352
star
8

hydrogen-v1

React-based framework for building dynamic, Shopify-powered custom storefronts.
TypeScript
3,747
star
9

go-lua

A Lua VM in Go
Go
2,773
star
10

bootsnap

Boot large Ruby/Rails apps faster
Ruby
2,614
star
11

graphql-design-tutorial

2,335
star
12

restyle

A type-enforced system for building UI components in React Native with TypeScript.
TypeScript
2,331
star
13

dawn

Shopify's first source available reference theme, with Online Store 2.0 features and performance built-in.
Liquid
2,279
star
14

identity_cache

IdentityCache is a blob level caching solution to plug into Active Record. Don't #find, #fetch!
Ruby
1,874
star
15

quilt

A loosely related set of packages for JavaScript/TypeScript projects at Shopify
TypeScript
1,703
star
16

shopify_app

A Rails Engine for building Shopify Apps
Ruby
1,649
star
17

kubeaudit

kubeaudit helps you audit your Kubernetes clusters against common security controls
Go
1,624
star
18

shipit-engine

Deployment coordination
Ruby
1,406
star
19

graphql-batch

A query batching executor for the graphql gem
Ruby
1,388
star
20

packwerk

Good things come in small packages.
Ruby
1,346
star
21

krane

A command-line tool that helps you ship changes to a Kubernetes namespace and understand the result
Ruby
1,309
star
22

semian

🐒 Resiliency toolkit for Ruby for failing fast
Ruby
1,286
star
23

slate

Slate is a toolkit for developing Shopify themes. It's designed to assist your workflow and speed up the process of developing, testing, and deploying themes.
JavaScript
1,283
star
24

ejson

EJSON is a small library to manage encrypted secrets using asymmetric encryption.
Go
1,246
star
25

superdb

The Super Debugger, a realtime wireless debugger for iOS
Objective-C
1,158
star
26

shopify_python_api

ShopifyAPI library allows Python developers to programmatically access the admin section of stores
Python
1,072
star
27

storefront-api-examples

Example custom storefront applications built on Shopify's Storefront API
JavaScript
1,069
star
28

themekit

Shopify theme development command line tool.
Go
1,068
star
29

Timber

The ultimate Shopify theme framework, built by Shopify.
Liquid
992
star
30

shopify-cli

Shopify CLI helps you build against the Shopify platform faster.
Ruby
987
star
31

shopify-api-ruby

ShopifyAPI is a lightweight gem for accessing the Shopify admin REST and GraphQL web services.
Ruby
982
star
32

hydrogen

Hydrogen is Shopify’s stack for headless commerce. It provides a set of tools, utilities, and best-in-class examples for building dynamic and performant commerce applications. Hydrogen is designed to dovetail with Remix, Shopify’s full stack web framework, but it also provides a React library portable to other supporting frameworks. Demo store 👇🏼
TypeScript
966
star
33

js-buy-sdk

The JS Buy SDK is a lightweight library that allows you to build ecommerce into any website. It is based on Shopify's API and provides the ability to retrieve products and collections from your shop, add products to a cart, and checkout.
JavaScript
932
star
34

cli-ui

Terminal user interface library
Ruby
869
star
35

react-native-performance

Performance monitoring for React Native apps
TypeScript
860
star
36

ruby-lsp

An opinionated language server for Ruby
Ruby
851
star
37

active_shipping

ActiveShipping is a simple shipping abstraction library extracted from Shopify
Ruby
809
star
38

shopify-api-js

Shopify Admin API Library for Node. Accelerate development with support for authentication, graphql proxy, webhooks
TypeScript
765
star
39

tapioca

The swiss army knife of RBI generation
Ruby
733
star
40

maintenance_tasks

A Rails engine for queueing and managing data migrations.
Ruby
705
star
41

shopify-app-template-node

JavaScript
701
star
42

remote-ui

TypeScript
701
star
43

erb_lint

Lint your ERB or HTML files
Ruby
651
star
44

shopify_theme

A console tool for interacting with Shopify Theme Assets.
Ruby
640
star
45

pitchfork

Ruby
630
star
46

ghostferry

The swiss army knife of live data migrations
Go
596
star
47

yjit

Optimizing JIT compiler built inside CRuby
593
star
48

statsd-instrument

A StatsD client for Ruby apps. Provides metaprogramming methods to inject StatsD instrumentation into your code.
Ruby
546
star
49

autotuner

Get suggestions to tune Ruby's garbage collector
Ruby
511
star
50

shopify.github.com

A collection of the open source projects by Shopify
CSS
505
star
51

ruby-style-guide

Shopify’s Ruby Style Guide
Ruby
475
star
52

theme-scripts

Theme Scripts is a collection of utility libraries which help theme developers with problems unique to Shopify Themes.
JavaScript
470
star
53

livedata-ktx

Kotlin extension for LiveData, chaining like RxJava
Kotlin
468
star
54

starter-theme

The Shopify Themes Team opinionated starting point for new a Slate project
Liquid
459
star
55

shopify-demo-app-node-react

JavaScript
444
star
56

web-configs

Common configurations for building web apps at Shopify
JavaScript
433
star
57

mobile-buy-sdk-ios

Shopify’s Mobile Buy SDK makes it simple to sell physical products inside your mobile app. With a few lines of code, you can connect your app with the Shopify platform and let your users buy your products using Apple Pay or their credit card.
Swift
433
star
58

shopify_django_app

Get a Shopify app up and running with Django and Python Shopify API
Python
425
star
59

deprecation_toolkit

⚒Eliminate deprecations from your codebase ⚒
Ruby
390
star
60

ruby-lsp-rails

A Ruby LSP extension for Rails
Ruby
388
star
61

bootboot

Dualboot your Ruby app made easy
Ruby
374
star
62

FunctionalTableData

Declarative UITableViewDataSource implementation
Swift
365
star
63

shadowenv

reversible directory-local environment variable manipulations
Rust
349
star
64

shopify-node-app

An example app that uses Polaris components and shopify-express
JavaScript
327
star
65

polaris-viz

A collection of React and React native components that compose Shopify's data visualization system
TypeScript
317
star
66

better-html

Better HTML for Rails
Ruby
311
star
67

theme-check

The Ultimate Shopify Theme Linter
Ruby
306
star
68

product-reviews-sample-app

A sample Shopify application that creates and stores product reviews for a store, written in Node.js
JavaScript
300
star
69

tracky

The easiest way to do motion tracking!
Swift
295
star
70

shopify-api-php

PHP
279
star
71

measured

Encapsulate measurements and their units in Ruby.
Ruby
275
star
72

cli

Build apps, themes, and hydrogen storefronts for Shopify
TypeScript
273
star
73

money

Manage money in Shopify with a class that won't lose pennies during division
Ruby
265
star
74

javascript

The home for all things JavaScript at Shopify.
253
star
75

ruvy

Rust
252
star
76

limiter

Simple Ruby rate limiting mechanism.
Ruby
244
star
77

vscode-ruby-lsp

VS Code plugin for connecting with the Ruby LSP
TypeScript
232
star
78

ruby_memcheck

Use Valgrind memcheck on your native gem without going crazy
Ruby
230
star
79

polaris-tokens

Design tokens for Polaris, Shopify’s design system
TypeScript
230
star
80

buy-button-js

BuyButton.js is a highly customizable UI library for adding ecommerce functionality to any website.
JavaScript
230
star
81

android-testify

Add screenshots to your Android tests
Kotlin
225
star
82

spoom

Useful tools for Sorbet enthusiasts
Ruby
220
star
83

turbograft

Hard fork of turbolinks, adding partial page replacement strategies, and utilities.
JavaScript
213
star
84

mobile-buy-sdk-android

Shopify’s Mobile Buy SDK makes it simple to sell physical products inside your mobile app. With a few lines of code, you can connect your app with the Shopify platform and let your users buy your products using their credit card.
Java
202
star
85

graphql-js-client

A Relay compliant GraphQL client.
JavaScript
187
star
86

shopify-app-template-php

PHP
186
star
87

skeleton-theme

A barebones ☠️starter theme with the required files needed to compile with Slate and upload to Shopify.
Liquid
185
star
88

sprockets-commoner

Use Babel in Sprockets to compile JavaScript modules for the browser
Ruby
182
star
89

rotoscope

High-performance logger of Ruby method invocations
Ruby
180
star
90

shopify-app-template-remix

TypeScript
178
star
91

git-chain

Tool to rebase multiple Git branches based on the previous one.
Ruby
176
star
92

verdict

Framework to define and implement A/B tests in your application, and collect data for analysis purposes.
Ruby
176
star
93

hydrogen-react

Reusable components and utilities for building Shopify-powered custom storefronts.
TypeScript
174
star
94

ui-extensions

TypeScript
173
star
95

storefront-api-learning-kit

JavaScript
171
star
96

heap-profiler

Ruby heap profiler
C++
159
star
97

autoload_reloader

Experimental implementation of code reloading using Ruby's autoload
Ruby
158
star
98

app_profiler

Collect performance profiles for your Rails application.
Ruby
157
star
99

graphql-metrics

Extract as much much detail as you want from GraphQL queries, served up from your Ruby app and the graphql gem.
Ruby
157
star
100

active_fulfillment

Active Merchant library for integration with order fulfillment services
Ruby
155
star