

Connect processes into powerful data pipelines with a simple git-like filesystem interface

DataKit -- Orchestrate applications using a Git-like dataflow

DataKit is a tool to orchestrate applications using a Git-like dataflow. It revisits the UNIX pipeline concept, with a modern twist: streams of tree-structured data instead of raw text. DataKit allows you to define complex build pipelines over version-controlled data.

DataKit is currently used as the coordination layer for HyperKit, the hypervisor component of Docker for Mac and Windows, and for the DataKitCI continuous integration system.



There are several components in this repository:

  • src contains the main DataKit service. This is a Git-like database to which other services can connect.
  • ci contains DataKitCI, a continuous integration system that uses DataKit to monitor repositories and store build results.
  • ci/self-ci is the CI configuration for DataKitCI that tests DataKit itself.
  • bridge/github is a service that monitors repositories on GitHub and syncs their metadata with a DataKit database. For example, when a pull request is opened or updated, the bridge commits that information to DataKit. Conversely, if you commit a status message to DataKit, the bridge pushes it to GitHub.
  • bridge/local is a drop-in replacement for bridge/github that just monitors a local Git repository. This is useful for local testing.

Quick Start

The easiest way to use DataKit is to start both the server and the client in containers.

To expose a Git repository as a 9p endpoint on port 5640 on a private network, run:

$ docker network create datakit-net # create a private network
$ docker run -it --net datakit-net --name datakit -v <path/to/git/repo>:/data datakit/db

Note: The --name datakit option is mandatory. It will allow the client to connect to a known name on the private network.

You can then start a DataKit client, which will mount the 9p endpoint and expose the database as a filesystem API:

# In another terminal
$ docker run -it --privileged --net datakit-net datakit/client
$ ls /db
branch     remotes    snapshots  trees

Note: the --privileged option is needed because the container will have to mount the 9p endpoint into its local filesystem.

Now you can explore, edit and script /db. See the Filesystem API for more details.
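As a starting point for scripting, the sketch below lists the top-level entries of a mounted database and labels each one as a directory or a file. The default mount point /db comes from the client container above. The function name list_db is a hypothetical helper, not part of DataKit.

```shell
#!/bin/sh
# Sketch: list the top-level entries of a mounted DataKit database.
# list_db is a hypothetical helper name; /db is the mount point used
# by the datakit/client container in the Quick Start.
list_db() {
    root="${1:-/db}"
    if [ ! -d "$root" ]; then
        echo "list_db: $root is not a directory (is the 9p endpoint mounted?)" >&2
        return 1
    fi
    for entry in "$root"/*; do
        # Skip the unexpanded glob pattern when the directory is empty.
        [ -e "$entry" ] || continue
        if [ -d "$entry" ]; then
            kind=dir
        else
            kind=file
        fi
        # Print the entry type and its base name, separated by a tab.
        printf '%s\t%s\n' "$kind" "${entry##*/}"
    done
}
```

Inside the client container, calling list_db with no argument inspects /db. Passing another path lets you try the script against an ordinary directory first.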

Building

The easiest way to build the DataKit project is to use Docker (which is what the start-datakit.sh script does under the hood):

docker build -t datakit/db -f Dockerfile .
docker run -p 5640:5640 -it --rm datakit/db --listen-9p=tcp://0.0.0.0:5640

These commands will expose the database's 9p endpoint on port 5640.

If you want to build the project from source without Docker, you will need to install OCaml and opam. Then run:

$ make depends
$ make && make test

For information about command-line options:

$ datakit --help

Prometheus metric reporting

Run with --listen-prometheus 9090 to expose metrics at http://*:9090/metrics.

Note: there is no encryption and no access control. You are expected to run the database in a container and not to expose this port to the outside world. You can either collect the metrics by running a Prometheus service in a container on the same Docker network, or front the service with nginx or similar if you want to collect metrics remotely.
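For a quick manual check of the endpoint, the sketch below filters Prometheus text-format output down to its sample lines. In that format, lines starting with '#' carry HELP and TYPE metadata or comments. The function name metric_samples is a hypothetical helper, not part of DataKit.

```shell
#!/bin/sh
# Sketch: keep only the sample lines from Prometheus text-format output.
# metric_samples is a hypothetical helper name, not part of DataKit.
metric_samples() {
    # Delete '#' lines (HELP, TYPE, comments) and blank lines;
    # pass every other line from stdin through unchanged.
    sed -e '/^#/d' -e '/^[[:space:]]*$/d'
}
```

From a container on the same Docker network, and assuming curl is installed there and the server was started with --name datakit and --listen-prometheus 9090, one way to use it is: curl -s http://datakit:9090/metrics | metric_samples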

Language bindings

  • Go bindings are in the api/go directory.
  • OCaml bindings are in the api/ocaml directory. See examples/ocaml-client for an example.

Licensing

DataKit is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Contributions are welcome under the terms of this license. You may wish to browse the weekly reports to read about overall activity in the repository.
