JupyterLab extension for Dask

Dask JupyterLab Extension

This package provides a JupyterLab extension to manage Dask clusters, as well as embed Dask's dashboard plots directly into JupyterLab panes.

Requirements

  • JupyterLab >= 1.0
  • distributed >= 1.24.1

Installation

To install the Dask JupyterLab extension you will need to have JupyterLab installed. For JupyterLab < 3.0, you will also need Node.js version >= 12. These are available through a variety of sources. One source common to Python users is the conda package manager.

conda install jupyterlab
conda install -c conda-forge nodejs

JupyterLab 3.0 or greater

You should be able to install this extension with pip or conda, and start using it immediately, e.g.

pip install dask-labextension

JupyterLab 2.x

This extension includes both client-side and server-side components. Prior to JupyterLab 3.0 these needed to be installed separately, with node available on the machine.

The server-side component can be installed via pip or conda-forge:

pip install dask_labextension
conda install -c conda-forge dask-labextension

You then build the client-side extension into JupyterLab with:

jupyter labextension install dask-labextension

If you are running Notebook 5.2 or earlier, enable the server extension by running:

jupyter serverextension enable --py --sys-prefix dask_labextension

Configuration of Dask cluster management

This extension can launch and manage several kinds of Dask clusters, including local clusters and Kubernetes clusters. Options for how to launch these clusters are set via the Dask configuration system, typically a .yml file on disk.

By default the extension launches a LocalCluster, for which the configuration is:

labextension:
  factory:
    module: 'dask.distributed'
    class: 'LocalCluster'
    args: []
    kwargs: {}
  default:
    workers: null
    adapt: null
    #   minimum: 0
    #   maximum: 10
  initial: []
  # - name: "My Big Cluster"
  #   workers: 100
  # - name: "Adaptive Cluster"
  #   adapt:
  #     minimum: 0
  #     maximum: 50

In this configuration, factory gives the module, class name, and arguments needed to create the cluster. The default key describes the initial number of workers for the cluster, as well as whether it is adaptive. The initial key gives a list of initial clusters to start upon launch of the notebook server.
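
As an illustration only, the following Python sketch shows one way a factory entry and a default entry like those above could be turned into a sized cluster. It is not the extension's actual implementation: the helper names build_from_factory and apply_sizing are hypothetical, the RecordingCluster stand-in exists only so the example runs without Dask installed, and the rule that adapt takes precedence over workers is an assumption. The scale and adapt calls mirror the methods of the same name on Dask cluster objects.

```python
import importlib
from typing import Any, Mapping


def build_from_factory(factory: Mapping[str, Any]) -> Any:
    """Import factory['module'], look up factory['class'], and call it."""
    module = importlib.import_module(factory["module"])
    cluster_class = getattr(module, factory["class"])
    return cluster_class(*factory.get("args", []), **factory.get("kwargs", {}))


def apply_sizing(cluster: Any, sizing: Mapping[str, Any]) -> str:
    """Apply a 'default'-style entry to a cluster.

    Assumption: a non-null 'adapt' wins over 'workers'.
    """
    adapt = sizing.get("adapt")
    workers = sizing.get("workers")
    if adapt is not None:
        cluster.adapt(**adapt)
        return "adaptive"
    if workers is not None:
        cluster.scale(workers)
        return "fixed"
    return "unchanged"


class RecordingCluster:
    """Stand-in for a Dask cluster that only records the calls made on it."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.calls = []

    def scale(self, n: int) -> None:
        self.calls.append(("scale", n))

    def adapt(self, minimum: int = 0, maximum: Any = None) -> None:
        self.calls.append(("adapt", minimum, maximum))


# The lookup works for any importable class; a standard-library class is used
# here so the sketch runs anywhere. With Dask installed, the module and class
# would be 'dask.distributed' and 'LocalCluster' as in the configuration above.
example = build_from_factory(
    {"module": "collections", "class": "OrderedDict", "args": [], "kwargs": {"a": 1}}
)

# An entry shaped like the commented-out "Adaptive Cluster" example.
cluster = RecordingCluster()
mode = apply_sizing(cluster, {"adapt": {"minimum": 0, "maximum": 50}})
```

Keeping the import-and-call step separate from the sizing step means the same sizing logic could be reused for any cluster class named in factory.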

In addition to LocalCluster, this extension has been used to launch several other Dask cluster objects, a few examples of which are:

  • A SLURM cluster, using
labextension:
  factory:
    module: 'dask_jobqueue'
    class: 'SLURMCluster'
    args: []
    kwargs: {}
  • A PBS cluster, using
labextension:
  factory:
    module: 'dask_jobqueue'
    class: 'PBSCluster'
    args: []
    kwargs: {}
  • A Kubernetes cluster, using
labextension:
  factory:
    module: 'dask_kubernetes'
    class: 'KubeCluster'
    args: []
    kwargs: {}

Configuring a default layout

This extension can store a default layout for the Dask dashboard panes, which is useful if you find yourself reaching for the same dashboard charts over and over. You can launch the default layout via the command palette, or by going to the File menu and choosing "Launch Dask Dashboard Layout".

Default layouts can be configured via the JupyterLab config system (either using the JSON editor or the user interface). Specify a layout by writing a JSON object keyed by the individual charts you would like to open. Each chart is opened with a mode and a ref. mode specifies how the chart is added to the workspace: for example, to split a panel and add the new one to the right, choose split-right. The other options are split-top, split-bottom, split-left, tab-after, and tab-before. ref names the panel to which mode is applied, and may be the name of another dashboard panel. If ref is null, the panel in question is added at the top of the layout hierarchy.

A concrete example of a default layout is

{
  "individual-task-stream": {
    "mode": "split-right",
    "ref": null
  },
  "individual-workers-memory": {
    "mode": "split-bottom",
    "ref": "individual-task-stream"
  },
  "individual-progress": {
    "mode": "split-right",
    "ref": "individual-workers-memory"
  }
}

which adds the task stream to the right of the workspace, then adds the worker memory chart below the task stream, then adds the progress chart to the right of the worker memory chart.
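
Before saving a layout, it can help to check it for typos. The Python sketch below is a hypothetical helper, not part of the extension: check_layout flags modes outside the six listed above and refs that do not name another chart in the same layout (a conservative assumption, since the extension may accept other panel names), and opening_order lists the charts so that each one follows the chart it refers to. Whether the extension itself opens charts in this order is not stated here.

```python
import json

ALLOWED_MODES = {
    "split-right", "split-top", "split-bottom", "split-left",
    "tab-after", "tab-before",
}


def check_layout(layout):
    """Return a list of problems found in a layout dict (empty if none)."""
    problems = []
    for chart, spec in layout.items():
        mode = spec.get("mode")
        ref = spec.get("ref")
        if mode not in ALLOWED_MODES:
            problems.append(f"{chart}: unknown mode {mode!r}")
        if ref is not None and ref not in layout:
            problems.append(f"{chart}: ref {ref!r} is not a chart in this layout")
    return problems


def opening_order(layout):
    """Order charts so that each one comes after the chart its ref names."""
    ordered = []
    remaining = dict(layout)
    while remaining:
        ready = [
            name for name, spec in remaining.items()
            if spec.get("ref") is None or spec["ref"] in ordered
        ]
        if not ready:
            raise ValueError(
                "circular or dangling ref among: " + ", ".join(sorted(remaining))
            )
        for name in ready:
            ordered.append(name)
            del remaining[name]
    return ordered


# The example layout from above, parsed from JSON (null becomes None).
layout = json.loads("""
{
  "individual-task-stream": {"mode": "split-right", "ref": null},
  "individual-workers-memory": {"mode": "split-bottom", "ref": "individual-task-stream"},
  "individual-progress": {"mode": "split-right", "ref": "individual-workers-memory"}
}
""")

problems = check_layout(layout)
order = opening_order(layout)
```

For the example layout, problems is empty and order lists the task stream first, then the worker memory chart, then the progress chart.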

Development install

As described in the JupyterLab documentation, for a development install of the labextension you can run the following in this directory:

jlpm  # Install npm package dependencies
jlpm build  # Compile the TypeScript sources to JavaScript
jupyter labextension develop . --overwrite  # Install the current directory as an extension

To rebuild the extension:

jlpm build

You should then be able to refresh the JupyterLab page and it will pick up the changes to the extension.

To run an editable install of the server extension, run

pip install -e .
jupyter serverextension enable --sys-prefix dask_labextension

Publishing

This application is distributed as two subpackages.

The JupyterLab frontend part is published to npm, and the server-side part to PyPI.

Releases for both packages are done with the jlpm tool, git and Travis CI.

Note: Package versions are not prefixed with the letter v. jlpm adds this prefix to version tags by default, so you will need to disable it:

$ jlpm config set version-tag-prefix ""

Making a release

$ jlpm version [--major|--minor|--patch]  # updates package.json and creates git commit and tag
$ git push upstream main && git push upstream main --tags  # pushes tags to GitHub which triggers Travis CI to build and deploy
