

Docker images for dask

Dask docker images


Image                       Description
ghcr.io/dask/dask           Base image to use for Dask scheduler and workers
ghcr.io/dask/dask-notebook  Jupyter Notebook image to use as helper entrypoint

Example

An example docker-compose.yml file is included for starting a small cluster.

docker-compose up

Open the notebook using the URL printed in the output, since that URL includes the access token.

In a new notebook, run:

from dask.distributed import Client
client = Client()  # The address is automatically set by the DASK_SCHEDULER_ADDRESS environment variable
client.ncores()

It should output something like this:

{'tcp://172.23.0.4:41269': 4}
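
The mapping shown above pairs each worker address with its thread count. As a minimal sketch, the result can be summarised in plain Python without a running cluster. The helper name `total_threads` is hypothetical and not part of Dask, and the sample value is copied from the example output; real addresses will differ.

```python
def total_threads(ncores_by_worker):
    """Sum thread counts from a mapping of worker address -> thread count."""
    return sum(ncores_by_worker.values())


# Sample value copied from the example output above; real addresses will differ.
sample = {"tcp://172.23.0.4:41269": 4}
print(len(sample), "worker(s),", total_threads(sample), "thread(s)")
```

In a notebook attached to the cluster, the same helper could be called as total_threads(client.ncores()).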

Environment Variables

The following environment variables are supported for both the base and notebook images:

  • $EXTRA_APT_PACKAGES - Space-separated list of additional system packages to install with apt.
  • $EXTRA_CONDA_PACKAGES - Space-separated list of additional packages to install with conda. This variable can also be used to specify custom conda channels; for example, to install the latest Dask conda nightly packages:
docker run -e EXTRA_CONDA_PACKAGES="-c dask/label/dev dask" daskdev/dask:latest
  • $EXTRA_PIP_PACKAGES - Space-separated list of additional Python packages to install with pip.
  • $USE_MAMBA - Boolean controlling whether mamba is used instead of conda to install $EXTRA_CONDA_PACKAGES.

The notebook image supports the following additional environment variables:

  • $JUPYTERLAB_ARGS - Extra arguments to pass to the jupyter lab command.
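
Because several of these variables hold space-separated lists, each value has to reach docker run as a single argument. The following is a rough sketch of assembling such a command from Python. The function name `docker_run_command` is hypothetical, the package names are illustrative only, and the value "true" for USE_MAMBA is an assumption about the accepted boolean spelling.

```python
import shlex


def docker_run_command(image, env):
    """Build a `docker run` argv that passes each variable as -e NAME=VALUE."""
    argv = ["docker", "run"]
    for name, value in env.items():
        # Each NAME=VALUE pair stays one argv element, spaces included.
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    return argv


cmd = docker_run_command(
    "ghcr.io/dask/dask-notebook",
    {
        # Package names are illustrative only.
        "EXTRA_PIP_PACKAGES": "s3fs gcsfs",
        # Assumed spelling of the boolean value.
        "USE_MAMBA": "true",
    },
)
# shlex.join renders the argv as a shell-safe command line for copy and paste.
print(shlex.join(cmd))
```

Passing the list directly to subprocess.run(cmd) avoids any shell quoting concerns.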

Building images

Docker Compose provides an easy way to build all the images with the right context:

cd build

# Use legacy builder as buildkit still doesn't support subdirectories when building from git repos
export DOCKER_BUILDKIT=0
export COMPOSE_DOCKER_CLI_BUILD=0

docker-compose build

# Just build one image e.g. notebook
docker-compose build notebook
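
When the build is driven from a script rather than an interactive shell, the two exported variables above can be set on the child process only. This is a minimal sketch under that assumption; the function name `legacy_build_invocation` is hypothetical, and nothing is executed unless the commented subprocess call is enabled.

```python
import os


def legacy_build_invocation(service=None, base_env=None):
    """Return (argv, env) for docker-compose build using the legacy builder."""
    env = dict(os.environ if base_env is None else base_env)
    # Same settings as the export lines above.
    env["DOCKER_BUILDKIT"] = "0"
    env["COMPOSE_DOCKER_CLI_BUILD"] = "0"
    argv = ["docker-compose", "build"]
    if service:
        # Build a single image, e.g. "notebook".
        argv.append(service)
    return argv, env


argv, env = legacy_build_invocation("notebook", base_env={})
print(argv, env)
# To actually build, run from the repository root:
# import subprocess; subprocess.run(argv, env=env, cwd="build", check=True)
```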

Cross building

The images can be cross-built using docker buildx bake. However, buildx bake does not honor depends_on, since in theory that is a runtime constraint rather than a build-time one (see docker/buildx#447). To work around this, first build the "docker-stacks-foundation" image.

cd build

# If you have permission to push to daskdev/
docker buildx bake --progress=plain --set *.platform=linux/arm64,linux/amd64 --push docker-stacks-foundation
docker buildx bake --progress=plain --set *.platform=linux/arm64,linux/amd64 --push

# If you don't, set DOCKERUSER to your Docker Hub username.
export DOCKERUSER=holdenk
docker buildx bake --progress=plain --set *.platform=linux/arm64,linux/amd64 --set docker-stacks-foundation.tags.image=${DOCKERUSER}/docker-stacks-foundation:lab-py38 --push docker-stacks-foundation
docker buildx bake --progress=plain --set *.platform=linux/arm64,linux/amd64 --set scheduler.tags=${DOCKERUSER}/dask --set worker.tags=${DOCKERUSER}/dask --set notebook.tags=${DOCKERUSER}/dask-notebook --set docker-stacks-foundation.tags=${DOCKERUSER}/docker-stacks-foundation:lab-py38 --set notebook.args.base=${DOCKERUSER} --push
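
The last command above carries many --set overrides that all depend on DOCKERUSER. As a sketch, the same argument list can be generated from Python so the username is substituted in one place. The function name `bake_overrides` is hypothetical; the override keys and values are copied from that command.

```python
def bake_overrides(docker_user, platforms=("linux/arm64", "linux/amd64")):
    """Return the argv of the final bake command above for a given user."""
    settings = [
        "*.platform=" + ",".join(platforms),
        f"scheduler.tags={docker_user}/dask",
        f"worker.tags={docker_user}/dask",
        f"notebook.tags={docker_user}/dask-notebook",
        f"docker-stacks-foundation.tags={docker_user}/docker-stacks-foundation:lab-py38",
        f"notebook.args.base={docker_user}",
    ]
    argv = ["docker", "buildx", "bake", "--progress=plain"]
    for setting in settings:
        argv += ["--set", setting]
    argv.append("--push")
    return argv


bake_argv = bake_overrides("holdenk")
print(" ".join(bake_argv))
```

Handing the list to subprocess.run without a shell also keeps the `*` in `*.platform` from being expanded as a glob.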

Releasing

Building and releasing new image versions is done automatically.

  • When a new Dask version is released, the watch-conda-forge action triggers and opens a PR to update the latest release version in this repo.
  • If the images build successfully, that PR is merged automatically by the automerge action.
  • When such a PR is merged, updating the pinned release version, the autotag action automatically creates a tag matching that version.
  • When a tag is created, a new image is built and pushed using docker/build-push-action.
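
The first step of this pipeline amounts to comparing the pinned release with the latest published one. The sketch below illustrates that comparison only; it is not the implementation of the watch-conda-forge action. The function names are hypothetical, the version strings are illustrative, and purely numeric dotted versions are assumed.

```python
def parse_version(text):
    """Split a dotted release string such as '2024.1.0' into integers."""
    return tuple(int(part) for part in text.split("."))


def needs_update(pinned, latest):
    """Return True when the latest release is newer than the pinned one."""
    # Comparing integer tuples avoids string ordering, where "10" sorts before "9".
    return parse_version(latest) > parse_version(pinned)


# Version strings are illustrative only.
print(needs_update("2024.1.0", "2024.1.1"))
```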
