  • Stars: 114
  • Rank 307,278 (Top 7 %)
  • Language
    Haskell
  • License
    BSD 3-Clause "New...
  • Created over 6 years ago
  • Updated over 4 years ago

distributed-dataset

A distributed data processing framework in pure Haskell. Inspired by Apache Spark.

Packages

distributed-dataset

This package provides a Dataset type which lets you express and execute transformations on a distributed multiset. Its API is highly inspired by Apache Spark.
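
To make the idea of transformations on a partitioned multiset concrete, here is a minimal local sketch. It is not the library's API: `LocalDataset`, `fromListD`, `mapD`, `filterD` and `countByKey` are hypothetical names invented for illustration, and everything runs in memory in a single process.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- | Hypothetical local stand-in for a distributed multiset: a list of
-- partitions, each of which would live on a separate executor.
newtype LocalDataset a = LocalDataset [[a]]

-- | Split a list into (at least one) partitions, round-robin.
fromListD :: Int -> [a] -> LocalDataset a
fromListD n xs =
  LocalDataset
    [ [x | (i, x) <- zip [0 :: Int ..] xs, i `mod` k == p]
    | p <- [0 .. k - 1]
    ]
  where
    k = max 1 n

-- | Element-wise transformations touch each partition independently.
mapD :: (a -> b) -> LocalDataset a -> LocalDataset b
mapD f (LocalDataset ps) = LocalDataset (map (map f) ps)

filterD :: (a -> Bool) -> LocalDataset a -> LocalDataset a
filterD p (LocalDataset ps) = LocalDataset (map (filter p) ps)

-- | Grouped count: count inside each partition first, then merge the
-- partial results. The merge is the step where a distributed system
-- would have to exchange data between executors.
countByKey :: Ord k => (a -> k) -> LocalDataset a -> M.Map k Int
countByKey key (LocalDataset ps) = M.unionsWith (+) (map partial ps)
  where
    partial = foldl' (\m x -> M.insertWith (+) (key x) 1 m) M.empty
```

The per-partition functions need no coordination, while `countByKey` has to combine results across partitions, which is the kind of step a shuffle store exists for.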

It uses pluggable Backends for spawning executors and ShuffleStores for exchanging information. See 'distributed-dataset-aws' for an implementation using AWS Lambda and S3.
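
One way to picture pluggable components is as records of functions that can be swapped without changing the code that uses them. The sketch below is only an illustration of that design: `ToyBackend`, `ToyShuffleStore`, `inMemoryStore`, `localBackend` and `shuffleBy` are hypothetical names and do not correspond to the library's actual types or signatures.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- | Hypothetical backend: decides how a task gets executed.
newtype ToyBackend = ToyBackend {runTask :: IO () -> IO ()}

-- | Hypothetical shuffle store: keeps records grouped by partition id.
data ToyShuffleStore = ToyShuffleStore
  { putBlock :: Int -> String -> IO (),
    getBlocks :: Int -> IO [String]
  }

-- | Runs tasks in the current process. A remote backend would ship the
-- task elsewhere instead.
localBackend :: ToyBackend
localBackend = ToyBackend {runTask = id}

-- | Keeps everything in an in-memory map. A remote store would write
-- to shared storage instead.
inMemoryStore :: IO ToyShuffleStore
inMemoryStore = do
  ref <- newIORef M.empty
  pure
    ToyShuffleStore
      { -- flip (++) appends the new record after the existing ones
        putBlock = \k v -> modifyIORef' ref (M.insertWith (flip (++)) k [v]),
        getBlocks = \k -> M.findWithDefault [] k <$> readIORef ref
      }

-- | Route each record to one of n partitions (n must be positive) by
-- key, running the writes through whichever backend is supplied.
shuffleBy ::
  ToyBackend -> ToyShuffleStore -> Int -> (String -> Int) -> [String] -> IO ()
shuffleBy backend store n key xs =
  runTask backend (mapM_ (\x -> putBlock store (key x `mod` n) x) xs)
```

Because `shuffleBy` only depends on the two records, replacing `localBackend` or `inMemoryStore` with another implementation requires no change to it.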

It also exposes a more primitive Control.Distributed.Fork module which lets you run IO actions remotely. It is especially useful when your task is embarrassingly parallel.
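
An embarrassingly parallel task is one whose pieces never need to talk to each other. The sketch below shows that shape using local threads from `base` as a stand-in for remote executors. `runAllParallel` is a hypothetical helper, not part of Control.Distributed.Fork, and it does not attempt to ship actions to another machine.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Hypothetical helper: run every action on its own thread and
-- collect the results in input order. There is no exception handling,
-- so an action that throws leaves its result slot unfilled.
runAllParallel :: [IO a] -> IO [a]
runAllParallel actions = do
  vars <- mapM spawn actions
  mapM takeMVar vars
  where
    spawn act = do
      var <- newEmptyMVar
      _ <- forkIO (act >>= putMVar var)
      pure var
```

Each action writes to its own `MVar`, so the workers share no state and the only synchronisation point is the final collection of results.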

distributed-dataset-aws

This package provides a backend for 'distributed-dataset' using AWS services. Currently it supports running functions on AWS Lambda and using an S3 bucket as a shuffle store.

distributed-dataset-opendatasets

Provides Datasets that read from public open datasets. Currently it can fetch GitHub event data from GH Archive.

Running the example

  • Clone the repository.

    $ git clone https://github.com/utdemir/distributed-dataset
    $ cd distributed-dataset
  • Make sure that you have AWS credentials set up. The easiest way is to install the AWS command line interface and run:

    $ aws configure
  • Create an S3 bucket to put the deployment artifact in. You can use the console or the CLI:

    $ aws s3api create-bucket --bucket my-s3-bucket
  • Build and run the example:

    • If you use Nix on Linux:

      • (Recommended) Use my binary cache on Cachix to reduce compilation times:

        $ nix-env -i cachix # or your preferred installation method
        $ cachix use utdemir
      • Then:

        $ nix run -f ./default.nix example-gh -c example-gh my-s3-bucket
    • If you use stack (requires Docker, works on Linux and macOS):

      $ stack run --docker-mount $HOME/.aws/ --docker-env HOME=$HOME example-gh my-s3-bucket

Stability

Experimental. Expect lots of missing features, bugs, instability and API changes. You will probably need to modify the source if you want to do anything serious. See issues.

Contributing

I am open to contributions; any issue, PR or opinion is more than welcome.

  • To develop distributed-dataset, you can use:
    • On Linux: Nix, cabal-install or stack.
    • On macOS: stack with Docker.
  • Use ormolu to format source code.

Nix

  • You can use my binary cache on Cachix so that you don't have to recompile half of Hackage.
  • nix-shell will drop you into a shell with ormolu, cabal-install and steeloverseer, along with all required Haskell and system dependencies. You can use cabal new-* commands there.
  • The easiest way to get a development environment is to run sos in the top-level directory inside a nix-shell.

Stack

  • Make sure that you have Docker installed.
  • Use stack as usual; it will automatically use a Docker image.
  • Run ./make.sh stack-build before you send a PR to test different resolvers.
