• Stars
    star
    102
  • Rank 335,584 (Top 7 %)
  • Language
    Haskell
  • License
    Other
  • Created over 5 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Forward build system with speculation and caching

Rattle Hackage version Stackage version Build status

{Fast, Correct, Simple, Unfinished} - Choose four.

To write a Rattle build system you need to write a script that builds your code. No dependencies, no targets, just a straight-line script that builds your code. Rattle takes that script, and turns it into a build system:

  1. Rattle skips commands if their inputs/outputs haven't changed.
  2. Rattle copies outputs over if a command has been run before on the same inputs (potentially for a different user).
  3. Rattle guesses what commands might be coming next, and speculates of them, to increase parallelism.

The closest build system to Rattle is Fabricate, which follows a similar approach, but only provides step 1. Rattle extends that to steps 2 and 3.

Advantages

Rattle build systems are simple to write - just write commands to build your project. There is only one "Rattle" specific thing, and that's running a command, giving it a tiny API. Rattle build systems can delegate to other build systems quite happily -- if a dependency is built with make, just shell out to make -- no need to recreate the dependencies with Rattle. You don't need to think about dependencies, and because you aren't writing dependencies, there's no way to get them wrong.

Disadvantages

The disadvantages of Rattle are that only system commands are skipped, so the zero build time might not be as quick as it could. Rattle also has to guess at useful commands to get parallelism, and that might go wrong.

Rattle fundamentally relies on tracing the files used by system commands, and currently uses the fsatrace program to do so, which must be installed. As a consequence of the design, if fsatrace fails, then rattle won't work properly. Reasons that fsatrace might fail include:

  • If you use Go compiled binaries on Linux or Mac then it's unlikely to be traced, see Issue 24.
  • If you a Mac OS X 1.10 or higher then system binaries that themselves invoke system binaries they won't be traced, unless you disable System Integrity Protection (which you do at your own risk).
  • If you use statically linked binaries on Linux they probably won't be traced.

It is possible to write alternative tracing mechanisms that wouldn't have these limitations, for example:

  • Bigbro uses ptrace so works for Go binaries and statically linked binaries.
  • traced-fs is an unfinished file system tracer that implements a user file system that would also work for Go binaries and static linking.
  • BuildXL uses a Mac sandbox based on KAuth + TrustedBSD Mandatory Access Control (MAC).

Relationship to Shake

Rattle was designed by the same author as the Shake build system. Rattle is named as a follow-on from Shake, but not a replacement -- Shake continues to be actively developed and is a much more mature system.

Rattle User Manual / Paper

The rest of this README serves as a dumping ground for things that will either go into a user manual or an academic paper (if one ever gets written). As a result, it is a bit of a mismatch, and on top of that it's very definitely not complete.

Abstract

Most build systems are backwards build systems, users define rules to produce targets. These systems have a lot of advantages -- they encode parallelism, they are compositional and the user can easily build any subtree of dependencies. However, they require manually specifying dependencies and all the state of an object must be encoded in its filename. The alternative, and poorly studied, forwards build systems are structured very differently, which we argue produces simpler systems for the end user, at least at small to medium scale.

Introduction

In a backwards build system (e.g. Shake, Bazel, Make) a rule to build a C file will probably look something like:

"*.o" %> \out -> do
    let src = out -<.> "c"
    need [src]
    cmd "gcc -c" src "-o" out

Taking a look at this rule, it has 4 lines:

  1. First line, say how to build .o files.
  2. Compute, from the .o filename alone, the .c source it corresponds to.
  3. Add an explicit dependency on the .c file.
  4. Specify the command and run it.

Of these steps, only step 4 is truely building things, the other steps are about providing the metadata for a backwards build system to operate. And furthermore, in a practical build system, we're likely to have to go further -- encoding optimisation level in the .o file, and depending on any headers that were used.

To describe it as a backwards build systems is apt. In all likelihood we will have had a .c file in mind, then we generate the .o file, then we go back to the .c file -- all a little odd. In contrast, a forward build system can be expressed more directly:

buildCSrc src = cmd "gcc -c" src "-o" (src -<.> "o")

We can talk about the .c file directly, so don't need to extra information back from the output. We don't need to add dependencies. We just specify the command line. Simple.

In the rest of this paper we describe how to construct a forward build system which is as powerful as a modern fully featured backward build system. There are trade offs, particularly at scale, which we describe and in some cases mitigate.

The no-op forward build system

We believe forward build systems should be simple, so let's take the other extreme, and think about what a build system is in it's minimality. At an absolute minimum, it must list the commands to be run. For example, to build a C executable we might say:

gcc -c hello.c -o hello.o
gcc hello.o -o hello.exe

However, we've actually gone slightly further than just listing the commands, but supplied them in an order which satisfies the dependencies. We haven't mentioned the dependencies, but we have written the link command after the compile, which is an order that does satisfy them.

Some build systems are able to build without any ordering, by restarting and retrying commands. While an interesting area of research, we don't persue that path, but see David Roundry's system.

So, given this straight-line code that builds the program what are we missing relative to a build system?

  1. Build systems don't rebuild commands whose inputs haven't changed.
  2. Some build systems are able to download results from a shared store (e.g. cloud build systems).
  3. Backwards build systems can extract parallelism.

In the next three sections we add back each of these aspects.

Avoid rebuilds of things that haven't changed

Given a command, if you know what files it reads and writes, and none of those files have changed, you can skip running it providing that if it were deterministic it wouldn't be harmful. As an example, if the compiler produces a different object file each time, but it would be OK with not doing that, then you're fine. In practice this means that any command that isn't relied on to produce fresh entropy (the time, a GUID, a random number) is fine to skip subsequent times. Those commands that do produce fresh entropy are not well suited to a build system anyway, so aren't typically used in build systems.

This got solved by fabricate. You can use a system access tracer to watch what a process reads/writes, and have a database storing the command, previous reads/writes, and then skip it if the values haven't changed. This step is not complex.

However, it a build writes a file twice in a run, or reads it then writes to it, the result will be that a subsequent rebuild will have stuff to do, as it won't end at a quiescent state. To detect that, we introduce the concept of hazards.

Shared build cache

Given knowledge of the reads/writes of a command, it's not too hard to store them in a cloud, and satisfy commands not by rerunning them, but by copying their result from a shared cache. While this works beautifully in theory, in practice it leads to at least three separate problems which we solve.

Relative build dirs

Often the current directory, or users profile directory, will be accessed by commands. These change either if the user has two working directories, or if they use different machines. We solve this by having a substitution table.

Compound commands

Sometimes a command will produce something that is user specific, but the next step will make it not user specific. To fix that we add compound commands. You can do it with a shell script, using pipeline, or with TemplateHaskell blocks.

Non-deterministic builds

We solve this by having .rattle.hash files which we substitute for reading.

Generating parallelism

We use speculation and hazards to figure out if speculation went wrong.

Relative simplicity

Compare Rattle to Shake. The interface to Rattle is basically "run this command", providing a vastly simpler API. In contast, Shake has a lot of API to learn and use.

More Repositories

1

hlint

Haskell source code suggestions
Haskell
1,459
star
2

ghcid

Very low feature GHCi based IDE
Haskell
1,130
star
3

shake

Shake build system
Haskell
744
star
4

hoogle

Haskell API search engine
Haskell
689
star
5

tagsoup

Haskell library for parsing and extracting information from (possibly malformed) HTML/XML documents
Haskell
231
star
6

bake

UNMAINTAINED: Continuous integration server
Haskell
130
star
7

record-dot-preprocessor

A preprocessor for a Haskell record syntax using dot
Haskell
129
star
8

weeder

Detect dead exports or package imports
Haskell
124
star
9

debug

Haskell library for debugging
JavaScript
121
star
10

spaceleak

Notes on space leaks
Haskell
101
star
11

extra

Extra Haskell functions
Haskell
93
star
12

cmdargs

Haskell library for command line argument processing
Haskell
91
star
13

build-shootout

Comparison of build program expressive power
Haskell
88
star
14

uniplate

Haskell library for simple, concise and fast generic operations.
Haskell
74
star
15

safe

Haskell library for safe (pattern match free) functions
Haskell
45
star
16

ghc-make

An alternative to ghc --make which supports parallel compilation of modules and runs faster when nothing needs compiling.
Haskell
40
star
17

interpret

Rust
38
star
18

neil

General tools for Neil
Haskell
38
star
19

profiterole

GHC prof manipulation script
Haskell
30
star
20

nsis

Haskell DSL for producing Windows Installer using NSIS
Haskell
26
star
21

derive

A Haskell program and library to derive instances for data types
TeX
25
star
22

supero

Haskell optimisation tool based on supercompilation
Haskell
25
star
23

hexml

A bad XML parser
C
19
star
24

offline-stack

Install Stack without internet access
Haskell
18
star
25

catch

Haskell pattern match analsyis checker
Haskell
15
star
26

js-jquery

Haskell library to obtain minified jQuery code
Haskell
9
star
27

rexe

.exe forwarder, to allow replacing binaries on PATH
Haskell
8
star
28

office

Macros for Microsoft Office
VBA
7
star
29

VSHaskell

Visual Studio 2010 addin
C#
7
star
30

record-hasfield

A version of HasField that will be available in future GHC
Haskell
7
star
31

shake-paper

Paper on the new GHC Shake-based build system
HTML
6
star
32

lasagna

Checker for Haskell layering violations
Haskell
5
star
33

guihaskell

A graphical REPL and development environment for Haskell
Haskell
5
star
34

hwwg

Haskell Website Working Group
5
star
35

shake-bazel

Experimenting with Shake and Bazel combined
Haskell
4
star
36

blogs

TeX
4
star
37

haskell-parser

Haskell parser based on that from GHC
Haskell
4
star
38

filepattern

A file path matching library
Haskell
4
star
39

idris-playground

Playing around with Idris
Idris
4
star
40

proplang

A Haskell library for functional GUI development
Haskell
4
star
41

thesis

My PhD thesis - Transformation and Analysis of Functional Programs
Haskell
3
star
42

js-flot

Haskell library to obtain minified Flot code
Haskell
3
star
43

qed

Experiments writing a prover
Haskell
3
star
44

core-playground

Simple Core language for Haskell experiments
Haskell
3
star
45

js-dgtable

Haskell library to obtain minified jquery.dgtable code
JavaScript
3
star
46

neil-check

Run neil and hlint on all my projects
Haskell
3
star
47

shake-examples

3
star
48

proof

Haskell library for writing proofs
Haskell
2
star
49

ci-check

Test the various supported CI's
2
star
50

hogle-dead

This repo has been moved to the master branch of https://github.com/ndmitchell/hoogle.
Haskell
2
star
51

stack-3137

Reproduce https://github.com/commercialhaskell/stack/issues/3137
Haskell
2
star
52

winhaskell

Windows Haskell GUI interpretter
C++
1
star
53

fossilizer

Generate 3D images of fossil bedding surfaces
JavaScript
1
star
54

bug-i386-ghc84

1
star
55

proposition

Haskell library for manipulating propositions.
Haskell
1
star
56

tex2hs

A program to check for type errors in a Latex document
Haskell
1
star
57

ndmitchell

1
star
58

hlint-test

Haskell
1
star
59

ghc-process

Compiling and linking files in a single process using the GHC API
Haskell
1
star
60

shark

A bad replacement for cabal/stack
1
star
61

index-search

Searching compressed text indicies
C
1
star
62

awesomo

Prototype optimiser for Haskell programs
Haskell
1
star
63

ninjasmith

Ninja file generator and tester
Haskell
1
star
64

firstify

A Haskell library to transform Yhc Core programs to first-order
TeX
1
star