  • Language
    Crystal
  • License
    MIT License
  • Created 11 months ago
  • Updated 5 months ago


A library to create and execute tasks with dependencies

Croupier

Croupier is a smart task definition and execution library, which can be used for dataflow programming.


What does it mean

You use Croupier to define tasks. Tasks have:

  • An id

  • Zero or more input files or k/v store keys

  • Zero or more output files or k/v store keys

  • A Proc that consumes the inputs and returns a string

  • After the Proc returns, its data is saved to the output(s), unless the task has the no_save flag set to true, in which case the Proc is expected to have already saved it.

    Note: the return value for procs depends on several factors, see below.

    Note: a reference to a k/v key is of the form kv://mykey
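As a small illustration of the kv:// convention, here is a sketch of how a reference could be told apart from a plain file path. The constant KV_SCHEME and the helper kv_key are hypothetical names made up for this example; they are not part of Croupier's API.

```crystal
# Hypothetical helper, not part of Croupier's API.
KV_SCHEME = "kv://"

# Returns the key for a k/v reference, or nil for a plain file path.
def kv_key(reference : String) : String?
  reference.starts_with?(KV_SCHEME) ? reference.lchop(KV_SCHEME) : nil
end

kv_key("kv://mykey") # => "mykey"
kv_key("input.txt")  # => nil
```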

And here is the fun part:

Croupier will examine the inputs and outputs for your tasks and use them to build a dependency graph. The graph expresses the connections between your tasks and the files on disk, and between the tasks themselves, and Croupier uses that information to decide what to run.

So, suppose you have task1 consuming input.txt producing fileA and task2 that has fileA as input and outputs fileB. That means your tasks look something like this:

  graph LR;
      id1(["📁 input.txt"])-->idt1["⚙️ task1"]-->id2(["📁 fileA"]);
      id2-->idt2["⚙️ task2"]-->id3(["📁 fileB"]);
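A graph like this can be derived from each task's inputs and outputs alone. The following is a minimal sketch of that idea using a depth-first topological sort. It is not Croupier's implementation: GraphTask, visit and run_order are hypothetical names, and each task is assumed to have exactly one output.

```crystal
# Hypothetical task description, not Croupier's API.
record GraphTask, id : String, inputs : Array(String), output : String

# Depth-first visit: schedule whatever produces our inputs first.
def visit(task : GraphTask, producer : Hash(String, GraphTask),
          done : Set(String), in_progress : Set(String),
          ordered : Array(String)) : Nil
  return if done.includes?(task.id)
  raise "dependency cycle at #{task.id}" if in_progress.includes?(task.id)
  in_progress << task.id
  task.inputs.each do |path|
    # Inputs nobody produces are plain files on disk: nothing to schedule.
    if upstream = producer[path]?
      visit(upstream, producer, done, in_progress, ordered)
    end
  end
  in_progress.delete(task.id)
  done << task.id
  ordered << task.id
end

# Returns task ids ordered so that every producer precedes its consumers.
def run_order(tasks : Array(GraphTask)) : Array(String)
  producer = tasks.to_h { |t| {t.output, t} } # file => task that writes it
  done = Set(String).new
  in_progress = Set(String).new
  ordered = [] of String
  tasks.each { |t| visit(t, producer, done, in_progress, ordered) }
  ordered
end

# Declared in the "wrong" order on purpose.
order = run_order([
  GraphTask.new("task2", ["fileA"], "fileB"),
  GraphTask.new("task1", ["input.txt"], "fileA"),
])
```

Here order comes out as task1 followed by task2, because task2 reads the file that task1 writes.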

Croupier guarantees the following:

  • If task1 has never run before, it will run and create fileA
  • If task1 has run before and input.txt has not changed, it will not run.
  • If task1 has run before and input.txt has changed, it will run
  • If task1 runs, task2 will run and create fileB
  • task1 will run before task2
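The guarantees above can be modelled with a content hash per input file. This is only a sketch under stated assumptions, not how Croupier is actually implemented: SketchTask and stale_tasks are hypothetical names, SHA1 is an arbitrary choice of hash, file contents are passed in as a Hash instead of being read from disk, and every task is assumed to have at least one input.

```crystal
require "digest/sha1"

# Hypothetical stand-in for a task, not Croupier's API.
record SketchTask, id : String, inputs : Array(String), output : String

# Decide which tasks must run. `tasks` must already be in dependency order,
# `contents` maps path => current data, and `last_seen` maps path => the
# hash remembered from the previous run.
def stale_tasks(tasks : Array(SketchTask),
                contents : Hash(String, String),
                last_seen : Hash(String, String)) : Array(String)
  rebuilt = Set(String).new # outputs regenerated during this run
  to_run = [] of String
  tasks.each do |task|
    changed = task.inputs.any? { |path|
      digest = contents[path]?.try { |data| Digest::SHA1.hexdigest(data) }
      # Stale if an upstream task reruns, the file is missing,
      # or its hash differs from (or was never) recorded.
      rebuilt.includes?(path) || digest.nil? || digest != last_seen[path]?
    }
    if changed
      to_run << task.id
      rebuilt << task.output
    end
  end
  to_run
end

tasks = [
  SketchTask.new("task1", ["input.txt"], "fileA"),
  SketchTask.new("task2", ["fileA"], "fileB"),
]

# First run: nothing has been recorded yet, so both tasks are stale.
contents = {"input.txt" => "hello"}
first = stale_tasks(tasks, contents, {} of String => String)

# Second run: every hash matches the recorded state, so nothing runs.
contents = {"input.txt" => "hello", "fileA" => "hello", "fileB" => "HELLO"}
seen = contents.transform_values { |data| Digest::SHA1.hexdigest(data) }
second = stale_tasks(tasks, contents, seen)

# Third run: input.txt changed, so task1 runs and drags task2 along.
contents["input.txt"] = "changed"
third = stale_tasks(tasks, contents, seen)
```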

That's a very long way to say: Croupier will run whatever needs running, based on the content of the dependency files and the dependencies between tasks. In this example it may look silly because it's simple, but it should work even for thousands of tasks and dependencies.

The state between runs is kept in .croupier so if you delete that file all tasks will run.

Further documentation at the doc pages

Notes

Notes about proc return types

  • Procs in Tasks without outputs can return nil or a string; the return value will be ignored.

  • Procs with one output and no_save==false should return a string which will be saved to that output.

    If no_save==true then the returned value is ignored.

  • Procs with multiple outputs and no_save==false should return an Array(String) which will be saved to those outputs.

    If no_save==true then the returned value is ignored.
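The rules above can be summarized as a small function that pairs a proc's return value with the task's outputs. This is a sketch, not Croupier's code: ProcResult and pair_outputs are hypothetical names, and raising ArgumentError on a mismatch is an assumption, since the rules above do not say what happens in that case.

```crystal
# Hypothetical helper, not part of Croupier's API.
alias ProcResult = String | Array(String) | Nil

# Returns output path => data to save, following the rules listed above.
def pair_outputs(outputs : Array(String), result : ProcResult,
                 no_save : Bool = false) : Hash(String, String)
  saved = {} of String => String
  # No outputs, or no_save set: the return value is ignored.
  return saved if no_save || outputs.empty?
  case result
  when String
    # Assumption: a single string only makes sense for a single output.
    raise ArgumentError.new("expected an Array(String)") unless outputs.size == 1
    saved[outputs[0]] = result
  when Array(String)
    # Assumption: one string per output, in the same order.
    raise ArgumentError.new("expected #{outputs.size} strings") unless result.size == outputs.size
    outputs.each_with_index { |path, i| saved[path] = result[i] }
  else
    raise ArgumentError.new("a task with outputs must return data")
  end
  saved
end

one = pair_outputs(["fileA"], "some text")
many = pair_outputs(["fileA", "fileB"], ["first", "second"])
skipped = pair_outputs(["fileA"], "already saved by the proc", no_save: true)
```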

No target conflicts

If there are two or more tasks with the same output they will be merged into the first task created. The resulting task will:

  • Depend on the combination of all dependencies of all merged tasks
  • Run the procs of all merged tasks in order of creation
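To make the merge rule concrete, here is a minimal sketch of a registry keyed by output. MergeSketch and register are hypothetical names used only for illustration; Croupier's own classes and method names may differ.

```crystal
# Hypothetical model of a task, not Croupier's actual class.
class MergeSketch
  getter output : String
  getter inputs : Array(String)
  getter procs : Array(Proc(String))

  def initialize(@output, @inputs, proc : Proc(String))
    @procs = [proc]
  end

  # Fold `other` into this task: combined inputs, procs in creation order.
  def merge(other : MergeSketch) : Nil
    @inputs = (@inputs + other.inputs).uniq
    @procs.concat(other.procs)
  end
end

# Register a task; a task with an already known output is merged
# into the first task created for that output.
def register(registry : Hash(String, MergeSketch), task : MergeSketch) : Nil
  if existing = registry[task.output]?
    existing.merge(task)
  else
    registry[task.output] = task
  end
end

registry = {} of String => MergeSketch
register(registry, MergeSketch.new("out", ["a"], ->{ "one" }))
register(registry, MergeSketch.new("out", ["a", "b"], ->{ "two" }))

merged = registry["out"]
merged.inputs             # => ["a", "b"]
merged.procs.map(&.call)  # => ["one", "two"]
```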

Tasks without output

A task with no output will be registered under its id and is not expected to create any output files. Other than that, it's just a regular task.

Tasks with multiple outputs

If a task expects the TaskManager to create multiple files, its proc should return an Array(String), which will be saved to those outputs.

Installation

  1. Add the dependency to your shard.yml:

    dependencies:
      croupier:
        github: ralsina/croupier
  2. Run shards install

Usage

This is the example described above, in actual code:

require "croupier"

Croupier::Task.new(
  output: "fileA",
  inputs: ["input.txt"],
) {
  puts "task1 running"
  File.read("input.txt").downcase
}

Croupier::Task.new(
  output: "fileB",
  inputs: ["fileA"],
) do
  puts "task2 running"
  File.read("fileA").upcase
end

Croupier::Task.run_tasks

If we create an input.txt file with some text in it and run this program, it will print that it is running task1 and task2, and produce fileA with that same text in lowercase and fileB with the text in uppercase.

The second time we run it, it will do nothing because all task dependencies are unchanged.

If we modify input.txt or fileA then one or both tasks will run, as needed.

Auto Mode

Besides run_tasks, there is another way to run your tasks: auto_run. It runs tasks as needed, when their input files change. This allows for a sort of "continuous build", which is useful for things like web development.

You start the auto mode with TaskManager.auto_run and stop it with TaskManager.auto_stop. It runs in a separate fiber so your main fiber needs to do something else and yield. For details on that, see Crystal's docs.

This feature is still under development and may change, but here is an example of how it works, taken from the specs:

# We create a proc that has a visible side effect
x = 0
counter = TaskProc.new { x += 1; x.to_s }
# This task depends on a file called "i" and produces "t1"
Task.new(output: "t1", inputs: ["i"], proc: counter)
# Launch in auto mode
TaskManager.auto_run

# We have to yield and/or do stuff in the main fiber
# so the auto_run fibers can run
Fiber.yield

# Trigger a build by creating the dependency
File.open("i", "w") << "foo"
Fiber.yield

# Stop the auto_run
TaskManager.auto_stop

# It should only have run once
x.should eq 1
File.exists?("t1").should eq true
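The reason the main fiber has to yield is that Crystal fibers are cooperative: a spawned fiber only gets to run when the current one gives up control. The sketch below shows this with plain Crystal and no Croupier calls. The variable names are made up for the example.

```crystal
ticks = 0
running = true

# Background fiber: does one unit of "work" per scheduling turn.
spawn do
  while running
    ticks += 1
    Fiber.yield # hand control back so other fibers can run
  end
end

# The spawned fiber has not run yet, because main has not yielded.
ticks_before = ticks

# Each yield gives the background fiber a chance to run.
3.times { Fiber.yield }
ticks_after = ticks

# Ask the background fiber to stop, then let it notice the flag.
running = false
Fiber.yield
```

ticks_before is 0, while ticks_after is greater than 0 because the background fiber only made progress during the calls to Fiber.yield.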

Development

Let's try to keep test coverage good :-)

  • To run tests: make test or crystal spec
  • To check coverage: make coverage
  • To run mutation testing: make mutation

Other than that, anything is fair game. In the TODO.md file there is a section for things that were considered and decided to be a bad idea, but that is conditional and can change when presented with a good argument.

Contributing

  1. Fork it (https://github.com/ralsina/croupier/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors
