• This repository has been archived on 14/Feb/2018
  • Language: Java
  • License: Apache License 2.0
  • Created about 9 years ago
  • Updated over 7 years ago


Note: This project is no longer actively maintained by Pinterest.


PinLater

PinLater is a Thrift service to manage scheduling and execution of asynchronous jobs.

Key features

  • reliable job execution: explicit acks and automatic retries with a configurable delay.
  • job scheduling: schedule jobs to be executed at a specific time in the future.
  • rate limiting: ability to rate-limit the execution of a particular queue in the system.
  • language agnostic: both job enqueuers and workers can be written in any Thrift-supported language.
  • horizontal scalability: both the service and the storage are horizontally scalable; scaling the system up to handle more load is as easy as launching more PinLater hosts.
  • multiple storage backends: MySQL and Redis implementations are currently provided for different use cases.
  • observability: visibility into individual jobs and their job queues; metrics tracking various runtime properties are exposed through Ostrich.

QuickStart

Build

mvn clean compile

Create and install jars

mvn clean package
mkdir ${PINLATER_INSTALL_DIR}
tar -zxvf target/pinlater-0.1-SNAPSHOT-bin.tar.gz -C ${PINLATER_INSTALL_DIR}

Run Server Locally with MySQL backend

cd ${PINLATER_INSTALL_DIR}
# Make sure you have a MySQL instance running locally
java -server -cp .:./*:./lib/* -Dserver_config=pinlater.local.properties -Dbackend_config=mysql.local.json -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.PinLaterServer

Run Server Locally with Redis backend

# Make sure you have a Redis instance running locally
cd ${PINLATER_INSTALL_DIR}
java -server -cp .:./*:./lib/* -Dserver_config=pinlater.redis.local.properties -Dbackend_config=redis.local.json -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.PinLaterServer

Client Tool

A PinLater client tool for correctness and performance testing.

Create Queue

java -cp .:./*:./lib/* -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.client.PinLaterClientTool --host localhost --port 9010 --mode create  --queue test_queue

Enqueue jobs

java -cp .:./*:./lib/* -Dlog4j.configuration=log4j.local.properties  com.pinterest.pinlater.client.PinLaterClientTool --host localhost --port 9010 --mode enqueue --queue test_queue --num_queries -1 --batch_size 1 --concurrency 5

Dequeue/Ack jobs

java -cp .:./*:./lib/* -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.client.PinLaterClientTool --host localhost --port 9010 --mode dequeue --queue test_queue --num_queries -1 --batch_size 50 --concurrency 5 --dequeue_success_percent 95

Look up job

java -cp .:./*:./lib/* -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.client.PinLaterClientTool --host localhost --port 9010 --mode lookup --queue test_queue --job_descriptor test_queue:s1d1:p1:1

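A PinLater job descriptor is formatted as [queue_name]:[shard_id]:[priority]:[local_id]. In the example above, test_queue:s1d1:p1:1 refers to local job 1 with priority 1 on shard s1d1 of test_queue.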
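
To make the descriptor format concrete, here is a minimal parsing sketch. JobDescriptor is a hypothetical type, not PinLater's own class, and it assumes exactly what the example shows: four colon-separated fields, a priority field prefixed with "p", a numeric local id, and no colons inside the queue name.

```java
import java.util.Objects;

/** Hypothetical value type for descriptors such as "test_queue:s1d1:p1:1". */
final class JobDescriptor {
  final String queueName;
  final String shardId;   // treated as an opaque string, e.g. "s1d1"
  final int priority;
  final long localId;

  JobDescriptor(String queueName, String shardId, int priority, long localId) {
    this.queueName = queueName;
    this.shardId = shardId;
    this.priority = priority;
    this.localId = localId;
  }

  /** Parses "[queue_name]:[shard_id]:p[priority]:[local_id]" (assumed layout). */
  static JobDescriptor parse(String descriptor) {
    Objects.requireNonNull(descriptor, "descriptor");
    String[] parts = descriptor.split(":");
    if (parts.length != 4 || !parts[2].startsWith("p")) {
      throw new IllegalArgumentException("Malformed job descriptor: " + descriptor);
    }
    // NumberFormatException is a subclass of IllegalArgumentException.
    int priority = Integer.parseInt(parts[2].substring(1));
    long localId = Long.parseLong(parts[3]);
    return new JobDescriptor(parts[0], parts[1], priority, localId);
  }

  @Override
  public String toString() {
    return queueName + ":" + shardId + ":p" + priority + ":" + localId;
  }
}
```

A descriptor produced this way can be passed back to the client tool's --job_descriptor flag via toString().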

Get queue names

java -cp .:./*:./lib/* -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.client.PinLaterClientTool --host localhost --port 9010 --mode get_queue_names

Get job count

java -cp .:./*:./lib/* -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.client.PinLaterClientTool --host localhost --port 9010 --mode get_job_count --queue test_queue --job_state 0 --priority 1 --count_future_jobs false

The job state can be 0, 1, 2, or 3, corresponding to pending, running, succeeded, and failed, respectively.
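
When scripting against the client tool, it can help to keep the numeric --job_state codes in one place. The enum below is a hypothetical mapping that mirrors the list above; it is not PinLater's generated Thrift type, whose constant names may differ.

```java
/** Hypothetical mapping of the client tool's numeric --job_state values. */
enum JobState {
  PENDING(0), RUNNING(1), SUCCEEDED(2), FAILED(3);

  final int code;

  JobState(int code) {
    this.code = code;
  }

  /** Maps a --job_state value back to its state, rejecting unknown codes. */
  static JobState fromCode(int code) {
    for (JobState state : values()) {
      if (state.code == code) {
        return state;
      }
    }
    throw new IllegalArgumentException("Unknown job state: " + code);
  }
}
```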

Customizing your setup

PinLater comes with sample run scripts, properties files, and backend configs, which can be customized for your setup. Each run script (run_server_local_mysql.sh, run_server_local_redis.sh) defines environment variables that you can modify to match your setup. You will most likely need to modify the following variables:

  • PINLATER_HOME_DIR: Directory containing extracted jars.
  • SERVER_CONFIG: Your cluster properties file
  • BACKEND_CONFIG: Your storage backend config file
  • RUN_DIR: Directory for writing PID files

Before running the PinLater service, you need to modify the backend JSON configs to point to the desired backend storage. Note that the MySQL and Redis implementations use config files with different JSON schemas; see mysql.local.json and redis.local.json for examples. You will also need to modify the log4j properties file to point to the desired directories for the controller, server, and Thrift server application logs.

There are several important properties that you might also want to modify:

# Timeout that prevents jobs from running forever. Timed-out jobs are
# retried or marked as failed, depending on the number of remaining attempts.
BACKEND_MONITOR_JOB_CLAIMED_TIMEOUT_SECONDS=600
# TTL of failed jobs before they are cleaned up
BACKEND_MONITOR_JOB_FAILED_GC_TTL_HOURS=168
# TTL of succeeded jobs before they are cleaned up
BACKEND_MONITOR_JOB_SUCCEEDED_GC_TTL_HOURS=24
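
As a rough illustration of how these three properties interact, the sketch below encodes the decisions described in the comments: a job claimed for longer than the timeout is retried if attempts remain and failed otherwise, and finished jobs are deleted once their TTL has elapsed. MonitorPolicy and its methods are hypothetical names, not PinLater's code, and whether a timeout consumes an attempt before this check is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the decisions driven by the monitor properties. */
final class MonitorPolicy {
  enum Action { NONE, RETRY, FAIL, DELETE }

  final Duration claimedTimeout;  // BACKEND_MONITOR_JOB_CLAIMED_TIMEOUT_SECONDS
  final Duration failedGcTtl;     // BACKEND_MONITOR_JOB_FAILED_GC_TTL_HOURS
  final Duration succeededGcTtl;  // BACKEND_MONITOR_JOB_SUCCEEDED_GC_TTL_HOURS

  MonitorPolicy(long claimedTimeoutSeconds, long failedGcTtlHours, long succeededGcTtlHours) {
    this.claimedTimeout = Duration.ofSeconds(claimedTimeoutSeconds);
    this.failedGcTtl = Duration.ofHours(failedGcTtlHours);
    this.succeededGcTtl = Duration.ofHours(succeededGcTtlHours);
  }

  /** A running job past the claim timeout is retried if attempts remain, else failed. */
  Action forRunningJob(Instant claimedAt, int attemptsRemaining, Instant now) {
    if (Duration.between(claimedAt, now).compareTo(claimedTimeout) <= 0) {
      return Action.NONE;
    }
    return attemptsRemaining > 0 ? Action.RETRY : Action.FAIL;
  }

  /** Succeeded and failed jobs are garbage-collected once their own TTL has passed. */
  Action forFinishedJob(boolean succeeded, Instant finishedAt, Instant now) {
    Duration ttl = succeeded ? succeededGcTtl : failedGcTtl;
    return Duration.between(finishedAt, now).compareTo(ttl) > 0 ? Action.DELETE : Action.NONE;
  }
}
```

With the values above (600 s, 168 h, 24 h), a succeeded job disappears after a day, while a failed job stays inspectable for a week.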

To start and stop the service:

# Start the PinLater service with MySQL
scripts/run_server_local_mysql.sh start

# Stop the PinLater service with MySQL
scripts/run_server_local_mysql.sh stop

# Start the PinLater service with Redis
scripts/run_server_local_redis.sh start

# Stop the PinLater service with Redis
scripts/run_server_local_redis.sh stop

Usage

Build a worker

A PinLater worker is responsible for continuously dequeuing jobs, executing them, and then replying to the PinLater server with a positive or negative ACK, depending on whether the execution succeeded or failed. PinLater allows the worker to optionally specify a retry delay when a job fails. This can be used to implement arbitrary per-job retry policies, e.g. constant-delay retry, exponential backoff, or a combination of the two.
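
The retry policies mentioned above can be computed entirely on the worker side before sending a negative ACK. This is a minimal sketch with hypothetical names (RetryDelays, exponential, constantThenExponential); constant-delay retry is simply returning the same Duration every time, so it is folded into the combined policy rather than given its own method.

```java
import java.time.Duration;

/** Hypothetical retry-delay policies a worker could attach to a negative ACK. */
final class RetryDelays {
  private RetryDelays() {}

  /**
   * Exponential backoff: base, 2*base, 4*base, ... for attempts 1, 2, 3, ...,
   * capped at max. Doubling stops once the cap is reached, so it cannot overflow.
   */
  static Duration exponential(Duration base, Duration max, int attempt) {
    if (attempt < 1) {
      throw new IllegalArgumentException("attempt must be >= 1, got " + attempt);
    }
    Duration delay = base;
    for (int i = 1; i < attempt && delay.compareTo(max) < 0; i++) {
      delay = delay.multipliedBy(2);
    }
    return delay.compareTo(max) > 0 ? max : delay;
  }

  /**
   * Combination: a constant delay for the first constantAttempts failures,
   * then exponential backoff starting from twice that delay, capped at max.
   */
  static Duration constantThenExponential(Duration delay, int constantAttempts,
                                          Duration max, int attempt) {
    if (attempt <= constantAttempts) {
      return delay;
    }
    return exponential(delay, max, attempt - constantAttempts + 1);
  }
}
```

For example, constantThenExponential(5 s, 2, 60 s, attempt) yields 5 s, 5 s, 10 s, 20 s, 40 s, and then 60 s for every later attempt.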

We provide an example worker as a reference implementation that handles all of the above. It executes an example PinLater job defined in PinLaterExampleJob.java; for more detail, see PinLaterExampleWorker.java. To run the example worker:

cd ${PINLATER_INSTALL_DIR}
# Use the client tool to create test_queue and enqueue jobs
java -server -cp .:./*:./lib/* -Dserverset_path=discovery.pinlater.local -Dlog4j.configuration=log4j.local.properties com.pinterest.pinlater.example.PinLaterExampleWorker
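
The dequeue, execute, ACK cycle a worker runs can be sketched as follows. Job, JobClient, and Worker are hypothetical types invented for illustration: JobClient is not the interface of PinLater's Java client or Thrift service, and a real worker would call the PinLater client in its place. The retry delay is injected as a function, so any retry policy can be plugged in.

```java
import java.util.List;
import java.util.function.IntToLongFunction;
import java.util.function.Predicate;

/** Hypothetical dequeued job: an opaque body plus what the worker needs to ACK it. */
final class Job {
  final String descriptor;
  final byte[] body;
  final int attempt; // 1-based; this execution included

  Job(String descriptor, byte[] body, int attempt) {
    this.descriptor = descriptor;
    this.body = body;
    this.attempt = attempt;
  }
}

/** Hypothetical client interface; NOT the API of PinLater's Java client. */
interface JobClient {
  List<Job> dequeue(String queue, int limit);
  void ackSuccess(Job job);
  void ackFailure(Job job, long retryDelayMillis);
}

/** One worker iteration: dequeue a batch, execute each job, then ACK it. */
final class Worker {
  private final JobClient client;
  private final String queue;
  private final Predicate<byte[]> executor;         // returns true if the job succeeded
  private final IntToLongFunction retryDelayMillis; // attempt -> retry delay in ms

  Worker(JobClient client, String queue, Predicate<byte[]> executor,
         IntToLongFunction retryDelayMillis) {
    this.client = client;
    this.queue = queue;
    this.executor = executor;
    this.retryDelayMillis = retryDelayMillis;
  }

  /** Processes at most batchSize jobs and returns how many were dequeued. */
  int runOnce(int batchSize) {
    List<Job> jobs = client.dequeue(queue, batchSize);
    for (Job job : jobs) {
      boolean succeeded;
      try {
        succeeded = executor.test(job.body);
      } catch (RuntimeException e) {
        succeeded = false; // an exception counts as a failed execution
      }
      if (succeeded) {
        client.ackSuccess(job); // positive ACK
      } else {
        // Negative ACK carrying a worker-chosen retry delay.
        client.ackFailure(job, retryDelayMillis.applyAsLong(job.attempt));
      }
    }
    return jobs.size();
  }
}
```

A long-running worker would call runOnce in a loop, sleeping briefly whenever a batch comes back empty.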

We also provide a Java client, which is used by the example worker and the client tool. Clients can also be implemented in any Thrift-supported language using the Thrift interface. Jobs are language agnostic: from the PinLater service's perspective, a job body is just a sequence of bytes. Therefore, jobs can even be enqueued and dequeued by clients written in different languages.

Dequeue rate limiting

PinLater provides per-queue rate limiting that allows an operator to limit the dequeue rate on any queue in the system, or even stop dequeues completely, which can help alleviate load quickly on a struggling backend system, or prevent a slow job from affecting other jobs.

Rate limiting is configured via a JSON config file, which PinLater automatically reloads whenever it changes. Here is an example queue config file:

{
    "queues": [
        {
            "name": "pinlater_test_queue",
            "queueConfig": {
                "maxJobsPerSecond": 100
            }
        },
        {
            "name": "pinlater_test_slow_queue",
            "queueConfig": {
                "maxJobsPerSecond": 0.1
            }
        },
        {
            "name": "pinlater_test_paused_queue",
            "queueConfig": {
                "maxJobsPerSecond": 0
            }
        }
    ]
}
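
To make the maxJobsPerSecond semantics concrete (100 allows about 100 dequeued jobs per second, 0.1 allows one job every 10 seconds, and 0 pauses the queue), here is a token-bucket sketch. Token bucket is a common rate-limiting technique chosen for illustration; DequeueRateLimiter is hypothetical and is not PinLater's implementation, and the one-second burst capacity is an assumption.

```java
/** Hypothetical token-bucket limiter illustrating maxJobsPerSecond. */
final class DequeueRateLimiter {
  private final double ratePerSecond;
  private final double capacity;
  private double tokens;
  private long lastRefillNanos;

  DequeueRateLimiter(double ratePerSecond, long nowNanos) {
    if (ratePerSecond < 0) {
      throw new IllegalArgumentException("rate must be >= 0");
    }
    this.ratePerSecond = ratePerSecond;
    // Allow bursts of up to one second's worth of jobs, but at least one job.
    this.capacity = Math.max(1.0, ratePerSecond);
    // A rate of 0 starts empty and never refills, i.e. the queue is paused.
    this.tokens = ratePerSecond == 0 ? 0 : capacity;
    this.lastRefillNanos = nowNanos;
  }

  /** Returns how many of the requested jobs may be dequeued at time nowNanos. */
  synchronized int tryAcquire(int requested, long nowNanos) {
    double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
    tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
    lastRefillNanos = nowNanos;
    int granted = (int) Math.min(requested, Math.floor(tokens));
    tokens -= granted;
    return granted;
  }
}
```

Callers would pass System.nanoTime() as the clock; taking the time as a parameter keeps the sketch deterministic to test.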

Rate limiting also depends on ServerSet to determine how many PinLater servers are active and to compute the per-server rate limit. As an example, we use a file-based ServerSet implementation backed by a local file that contains one [HOSTNAME]:[PORT] pair per line. ServerSet can be configured by setting SERVER_SET_ENABLED and SERVER_SET_PATH. If ServerSet is disabled, rate limits are applied per server.
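
The per-server computation can be sketched as reading the ServerSet file and splitting the queue-wide limit across the listed hosts. PerServerRateLimit is hypothetical, and two behaviors are assumptions rather than documented PinLater behavior: the limit is split evenly, and an enabled but empty ServerSet falls back to the unsplit limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-server rate-limit computation from a file-based ServerSet. */
final class PerServerRateLimit {
  private PerServerRateLimit() {}

  /** Parses "[HOSTNAME]:[PORT]" lines, ignoring blank lines. */
  static List<String> parseServerSet(List<String> lines) {
    List<String> servers = new ArrayList<>();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty()) {
        continue;
      }
      int colon = line.lastIndexOf(':');
      if (colon <= 0 || colon == line.length() - 1) {
        throw new IllegalArgumentException("Expected HOSTNAME:PORT, got: " + line);
      }
      Integer.parseInt(line.substring(colon + 1)); // rejects non-numeric ports
      servers.add(line);
    }
    return servers;
  }

  /** Splits a queue-wide maxJobsPerSecond evenly across the active servers. */
  static double perServerLimit(double maxJobsPerSecond, boolean serverSetEnabled,
                               List<String> activeServers) {
    if (!serverSetEnabled || activeServers.isEmpty()) {
      return maxJobsPerSecond; // the limit applies to each server independently
    }
    return maxJobsPerSecond / activeServers.size();
  }
}
```

The file lines would come from something like Files.readAllLines(Path.of(serverSetPath)), re-read whenever the file changes.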

Storage backends

PinLater has two implementations: one built on top of Redis and one built on top of MySQL. In general, services should default to the MySQL backend as long as the QPS is in the low-to-mid range (no more than 1000 QPS per shard). If the QPS is expected to be higher than this, the Redis implementation should be used. The main advantage of MySQL over Redis is that far more space is available to store jobs; Redis-backed services run the risk of data loss if backlogs of pending jobs are not dealt with within a few hours. Both the MySQL and Redis implementations use a JSON file for backend configuration. (Note: the MySQL backend supports automatic reloading of the backend configuration, which can help implement features such as automatic failover. This is not yet implemented in the Redis backend.)

PinLater also supports a dequeue-only mode, in which a shard receives only dequeue requests and no new jobs are enqueued to it. This can help relieve a struggling backend system or drain a shard before downsizing the cluster. See mysql.local.json and redis.local.json for examples.

Monitoring

PinLater provides a set of APIs through the Thrift interface for monitoring queue status and managing jobs, including getQueueNames, getJobCount, lookupJob, and scanJobs. For more details, see the Thrift interface definition.

You can also retrieve detailed metrics by running curl localhost:9999/stats.txt on the server. These metrics are exported using Twitter's Ostrich library and are easy to parse. The port can be modified by setting the ostrich_metrics_port property.
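
For scripted monitoring, the same endpoint can be fetched and parsed from Java instead of curl. This sketch assumes (not verified here) that stats.txt lists "name: value" lines under unindented section headers such as "counters:" and "gauges:"; check it against your server's real output. OstrichStats and its methods are hypothetical, and any metric names you test with are made up.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for scraping an Ostrich stats.txt endpoint. */
final class OstrichStats {
  private OstrichStats() {}

  /** Fetches the raw text, e.g. from "http://localhost:9999/stats.txt". */
  static String fetch(String url) throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
  }

  /** Returns numeric entries keyed as "section.name"; non-numeric values are skipped. */
  static Map<String, Double> parseNumeric(String statsText) {
    Map<String, Double> result = new LinkedHashMap<>();
    String section = "";
    for (String line : statsText.split("\n")) {
      if (line.isBlank()) {
        continue;
      }
      String trimmed = line.trim();
      boolean indented = Character.isWhitespace(line.charAt(0));
      if (!indented && trimmed.endsWith(":")) {
        section = trimmed.substring(0, trimmed.length() - 1); // e.g. "counters"
        continue;
      }
      int sep = trimmed.indexOf(": ");
      if (sep < 0) {
        continue;
      }
      String name = trimmed.substring(0, sep);
      String value = trimmed.substring(sep + 2).trim();
      try {
        result.put(section.isEmpty() ? name : section + "." + name, Double.parseDouble(value));
      } catch (NumberFormatException e) {
        // Skip summaries such as "(average=..., count=...)" and string labels.
      }
    }
    return result;
  }
}
```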

Design

If you are interested in the detailed design, check out our blog post about PinLater.
