Makes your background jobs interruptible and resumable by design.

Job Iteration API

Meet Iteration, an extension for ActiveJob that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).

Background

Imagine the following job:

class SimpleJob < ApplicationJob
  def perform
    User.find_each do |user|
      user.notify_about_something
    end
  end
end

The job would run fairly quickly when you only have a hundred User records. But as the number of records grows, it will take longer for a job to iterate over all Users. Eventually, there will be millions of records to iterate and the job will end up taking hours or even days.

With frequent deploys and worker restarts, such a job will either be lost or restarted from the beginning. In the latter case, some records (especially those at the beginning of the relation) will be processed more than once.

Cloud environments are also unpredictable, and there's no way to guarantee that a single job will have reserved hardware to run for hours or days. What if AWS diagnoses the instance as unhealthy and restarts it in 5 minutes? What if a Kubernetes pod gets evicted? Again, all job progress will be lost. At Shopify, we also use Iteration to interrupt workloads safely when moving tenants between shards and moving shards between regions.

Software that is designed for high availability must be resilient to interruptions that come from the infrastructure. That's exactly what Iteration brings to ActiveJob. It was developed at Shopify to safely process long-running jobs in the cloud, and has been running in production since May 2017.

We recommend that you watch one of our conference talks about the ideas and history behind the Iteration API.

Getting started

Add this line to your application's Gemfile:

gem 'job-iteration'

And then execute:

$ bundle

In the job, include the JobIteration::Iteration module and describe the job with two methods (build_enumerator and each_iteration) instead of perform:

class NotifyUsersJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    enumerator_builder.active_record_on_records(
      User.all,
      cursor: cursor,
    )
  end

  def each_iteration(user)
    user.notify_about_something
  end
end

each_iteration will be called for each User record in the User.all relation. The relation will be ordered by primary key, exactly as find_each does.
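
The division of labour between the two methods can be illustrated without the gem. The following is a minimal sketch in plain Ruby, assuming a hypothetical TinyRunner module in place of JobIteration::Iteration and an in-memory array in place of User.all. It only shows that the runner, not the job, owns the loop; it does not model interruption.

```ruby
# Hypothetical stand-in for JobIteration::Iteration: the runner owns the loop
# and calls the two methods that the job defines.
module TinyRunner
  def perform(*args)
    build_enumerator(*args, cursor: nil).each do |element, _cursor|
      each_iteration(element, *args)
    end
  end
end

class NotifyUsersSketch
  include TinyRunner

  attr_reader :notified

  def initialize(users)
    @users = users
    @notified = []
  end

  # Returns an Enumerator of [user, cursor] pairs in primary-key order.
  def build_enumerator(cursor:)
    @users.sort_by { |user| user[:id] }.map { |user| [user, user[:id]] }.each
  end

  def each_iteration(user)
    @notified << user[:name] # stands in for user.notify_about_something
  end
end

job = NotifyUsersSketch.new([{ id: 2, name: "bo" }, { id: 1, name: "al" }])
job.perform
```

After perform returns, job.notified holds the names in primary-key order, regardless of the order in which the records were supplied.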

Check out more examples of Iterations:

class BatchesJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(product_id, cursor:)
    enumerator_builder.active_record_on_batches(
      Comment.where(product_id: product_id).select(:id),
      cursor: cursor,
      batch_size: 100,
    )
  end

  def each_iteration(batch_of_comments, product_id)
    comment_ids = batch_of_comments.map(&:id)
    CommentService.call(comment_ids: comment_ids)
  end
end

class BatchesAsRelationJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(product_id, cursor:)
    enumerator_builder.active_record_on_batch_relations(
      Product.find(product_id).comments,
      cursor: cursor,
      batch_size: 100,
    )
  end

  def each_iteration(batch_of_comments, product_id)
    # batch_of_comments will be a Comment::ActiveRecord_Relation
    batch_of_comments.update_all(deleted: true)
  end
end

class ArrayJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    enumerator_builder.array(['build', 'enumerator', 'from', 'any', 'array'], cursor: cursor)
  end

  def each_iteration(array_element)
    # use array_element
  end
end

class CsvJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(import_id, cursor:)
    import = Import.find(import_id)
    enumerator_builder.csv(import.csv, cursor: cursor)
  end

  def each_iteration(csv_row, import_id)
    # insert csv_row to database
  end
end

class NestedIterationJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    enumerator_builder.nested(
      [
        ->(cursor) { enumerator_builder.active_record_on_records(Shop.all, cursor: cursor) },
        ->(shop, cursor) { enumerator_builder.active_record_on_records(shop.products, cursor: cursor) },
        ->(_shop, product, cursor) { enumerator_builder.active_record_on_batch_relations(product.product_variants, cursor: cursor) }
      ],
      cursor: cursor
    )
  end

  def each_iteration(product_variants_relation)
    # do something
  end
end

Iteration hooks into Sidekiq and Resque out of the box to support graceful interruption. No extra configuration is required.

Guides

For more detailed documentation, see rubydoc.

Requirements

ActiveJob is the primary requirement for Iteration. Although nothing prevents it in principle, Iteration is not yet compatible with the vanilla Sidekiq API.

API

An Iteration job must respond to the build_enumerator and each_iteration methods. build_enumerator must return an Enumerator object that respects the cursor value.
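
As an illustration of what "respects the cursor value" can mean, the sketch below builds an Enumerator by hand. It assumes that the enumerator yields each element together with the cursor to store for it, and that a resumed run should start after the stored cursor. The ids_after helper and the sample ids are hypothetical.

```ruby
SAMPLE_IDS = [10, 20, 30, 40, 50].freeze

# Returns an Enumerator of [id, cursor] pairs that starts after `cursor`.
# A nil cursor means "start from the beginning".
def ids_after(cursor)
  Enumerator.new do |yielder|
    SAMPLE_IDS.each do |id|
      next if cursor && id <= cursor
      yielder.yield(id, id)
    end
  end
end

fresh   = ids_after(nil).map { |id, _cursor| id } # first run
resumed = ids_after(30).map { |id, _cursor| id }  # run resumed with cursor 30
```

Here the cursor is simply the id of the last processed element, which works because the ids are iterated in ascending order.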

Sidekiq adapter

Unless you are running on Heroku, we recommend raising Sidekiq's timeout option from the default 8 seconds to 25-30 seconds, so that the last each_iteration can complete and the worker can shut down gracefully.

Resque adapter

There are a few configuration assumptions that are required for Iteration to work with Resque. GRACEFUL_TERM must be enabled (giving the job the ability to interrupt gracefully), and it is recommended to disable FORK_PER_JOB (set it to false).

FAQ

Why can't I just iterate in the #perform method and do whatever I want? You can, but then your job has to comply with a long list of requirements, such as the ones above. This makes leaky abstractions more likely, when instead we can expose a more powerful abstraction for developers without exposing the underlying infrastructure.

What happens when my job is interrupted? A checkpoint will be persisted to Redis after the current each_iteration, and the job will be re-enqueued. Once it's popped off the queue, the worker will work off from the next iteration.
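
The interruption flow can be modelled in plain Ruby. This is a simplified sketch under stated assumptions, not the gem's implementation: a Hash stands in for the persisted checkpoint, an Array for the queue, and shutdown_after is a hypothetical knob that simulates when the worker receives its shutdown notice.

```ruby
WORK_ITEMS = %w[a b c d e].freeze

# Runs one attempt of the job. `state` holds the persisted cursor (an index).
# Returns :interrupted or :completed.
def run_attempt(state, queue, done, shutdown_after:)
  start = state[:cursor] ? state[:cursor] + 1 : 0
  iterations = 0
  (start...WORK_ITEMS.size).each do |index|
    if iterations == shutdown_after # notice is checked between iterations
      queue << :job                 # re-enqueue the job
      return :interrupted
    end
    done << WORK_ITEMS[index]       # stands in for each_iteration
    state[:cursor] = index          # checkpoint after the iteration finishes
    iterations += 1
  end
  :completed
end

state = {}
queue = [:job]
done = []
results = []
# The first worker is told to shut down after 2 iterations; the second never is.
[2, nil].each do |limit|
  queue.shift
  results << run_attempt(state, queue, done, shutdown_after: limit)
end
```

The second attempt starts from the element after the checkpoint, so every element is processed exactly once across both attempts.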

What happens with retries? An interruption of a job does not count as a retry. If an exception occurs, the job will retry or be discarded as normal, using the Active Job configuration for the job. If the job retries, it processes the iteration that originally failed, and progress continues from there if it is successful.
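
The retry behaviour can be modelled in a few lines of plain Ruby. In this sketch, which assumes the cursor is advanced only after an iteration succeeds, the element "c" raises a hypothetical TransientError on its first attempt. The begin/retry loop stands in for Active Job's retry configuration.

```ruby
class TransientError < StandardError; end

ELEMENTS = %w[a b c d].freeze

# Processes elements after `state[:cursor]`; raises once when it reaches "c".
def attempt(state, log, failures)
  start = state[:cursor] ? state[:cursor] + 1 : 0
  (start...ELEMENTS.size).each do |index|
    element = ELEMENTS[index]
    if element == "c" && failures.empty?
      failures << element
      raise TransientError, "could not process #{element}"
    end
    log << element
    state[:cursor] = index # advance only after the iteration succeeded
  end
end

state = {}
log = []
failures = []
attempts = 0
begin
  attempts += 1
  attempt(state, log, failures)
rescue TransientError
  retry if attempts < 3 # stands in for the job's retry configuration
  raise
end
```

The retried attempt begins at "c", the iteration that failed, rather than at "a".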

What happens if my iteration takes a long time? We recommend that a single each_iteration should take no longer than 30 seconds. In the future, this may raise an exception.

Why is it important that each_iteration takes less than 30 seconds? When the job worker is scheduled for restart or shutdown, it gets a notice to finish the remaining unit of work. To guarantee that no progress is lost, we need to make sure that each_iteration completes within a reasonable amount of time.
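
A small arithmetic model shows why iteration length matters. It assumes that the shutdown notice can only be acted on between iterations; shutdown_delay is a hypothetical helper, and the durations are made-up numbers in seconds.

```ruby
# Returns how many seconds pass between the shutdown notice and the moment
# the job can stop, given each iteration's duration and the notice time.
def shutdown_delay(iteration_durations, notice_at)
  clock = 0
  iteration_durations.each do |duration|
    return clock - notice_at if clock >= notice_at # checked between iterations
    clock += duration
  end
  [clock - notice_at, 0].max
end

short_delay = shutdown_delay([5, 5, 5, 5], 7) # 5-second iterations
long_delay  = shutdown_delay([60, 60], 7)     # 60-second iterations
```

With 5-second iterations the job stops 3 seconds after the notice. With 60-second iterations it needs 53 seconds, which is longer than a 25-30 second shutdown timeout would allow.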

Why do I have to use this ugly helper in build_enumerator? Why can't you automatically infer it? This is how the first version of the API worked. We checked the type of object returned by build_enumerable, and depending on whether it was an ActiveRecord Relation or an Array, we used the matching adapter. This caused opaque type branching in Iteration internals and it didn’t allow developers to craft their own Enumerators and control the cursor value. We made a decision to always return an Enumerator instance from build_enumerator. Now we provide explicit helpers to convert an ActiveRecord Relation or an Array to an Enumerator, and for more complex iteration flows developers can build their own Enumerator objects.

What is the difference between Enumerable and Enumerator? We recommend this post to learn more about Enumerators in Ruby.

My job has a complex flow. How do I write my own Enumerator? Iteration API takes care of persisting the cursor (that you may use to calculate an offset) and controlling the job state. The power of Enumerator object is that you can use the cursor in any way you want. One example is a cursorless job that pops records from a datastore until the job is interrupted:

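class MyJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    # The job is cursorless, so the cursor argument is ignored and nil is
    # yielded as the cursor alongside each element.
    Enumerator.new do |yielder|
      # Stop once the list is empty (lpop returns nil).
      while (element = Redis.lpop("mylist")) # or: Kafka.poll(timeout: 10.seconds)
        yielder.yield(element, nil)
      end
    end
  end

  def each_iteration(element_from_redis)
    # ...
  end
end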
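
The same idea can be tried out without Redis. The sketch below uses an in-memory Array in place of the Redis list and assumes that the enumerator should yield each element together with a cursor value (nil here, since there is nothing to resume from). The drain helper is hypothetical.

```ruby
# Hypothetical helper: builds an Enumerator that pops from `list` until empty.
def drain(list)
  Enumerator.new do |yielder|
    while (element = list.shift)
      yielder.yield(element, nil) # cursorless: nothing to checkpoint
    end
  end
end

backlog = %w[x y z]
seen = []
drain(backlog).each do |element, _cursor|
  seen << element
  break if element == "y" # simulate an interruption after the second element
end
```

Because popped elements are removed from the list, an interrupted job needs no cursor: a later run simply continues with whatever remains in the backlog.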

Credits

This project would not be possible without these individuals (in alphabetical order):

  • Daniella Niyonkuru
  • Emil Stolarsky
  • Florian Weingarten
  • Guillaume Malette
  • Hormoz Kheradmand
  • Mohamed-Adam Chaieb
  • Simon Eskildsen

Development

After checking out the repo, run bin/setup to install dependencies and create the MySQL database. Then, run bundle exec rake test to run the tests.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/Shopify/job-iteration. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Job::Iteration project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
