  • Stars: 104
  • Rank: 329,598 (Top 7%)
  • Language: Jupyter Notebook
  • License: Apache License 2.0
  • Created: over 4 years ago
  • Updated: over 2 years ago


Repository Details

Jupyter notebooks that analyze COVID-19 time series data

Analyzing COVID-19 time series data

This repository provides a set of Jupyter Notebooks that augment and analyze COVID-19 time series data.
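The kind of augmentation these notebooks perform can be illustrated with a minimal sketch (illustrative only, not code from this repository): a common first step with COVID-19 time series is deriving daily new cases from a cumulative case count.

```python
# Illustrative sketch with hypothetical numbers, not data from this repository.
cumulative = [10, 15, 25, 40, 40, 52]  # cumulative confirmed cases per day

# Daily delta: difference between consecutive days (day 0 keeps its value).
daily_new = [cumulative[0]] + [
    today - yesterday
    for yesterday, today in zip(cumulative, cumulative[1:])
]
print(daily_new)  # [10, 5, 10, 15, 0, 12]
```

The same idea extends to the other derived columns the notebooks add, such as rolling averages over the daily series.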

While working on this scenario, we found that building a pipeline helps organize the notebooks and simplifies running the full workflow to process and analyze new data. For this, we leveraged Elyra's ability to build notebook pipelines and orchestrate the full scenario on a Kubeflow Pipelines runtime.

COVID-19 Analytics Pipeline

Configuring the local development environment

WARNING: Do not run these notebooks from your system Python environment.

Use the following steps to create a consistent Python environment for running the notebooks in this repository:

  1. Install Anaconda or Miniconda
  2. Navigate to your local copy of this repository.
  3. Run the script env.sh to create an Anaconda environment in the directory ./env:
    $ bash env.sh
    Note: This script takes a while to run.
  4. Activate the new environment and start JupyterLab:
    $ conda activate ./env
    $ jupyter lab --debug

Configuring a local Kubeflow Pipeline runtime

Elyra's notebook pipeline visual editor currently supports running these pipelines on a Kubeflow Pipelines runtime. If required, follow the Kubeflow Pipelines documentation for the steps to install a local KFP deployment.

After installing your Kubeflow Pipelines runtime, use the command below (updating the placeholder values in brackets) to register the new KFP runtime with Elyra.

elyra-metadata install runtimes --replace=true \
       --schema_name=kfp \
       --name=kfp-local \
       --display_name="Kubeflow Pipeline (local)" \
       --api_endpoint=http://[host]:[api port]/pipeline \
       --cos_endpoint=http://[host]:[cos port] \
       --cos_username=[cos username] \
       --cos_password=[cos password] \
       --cos_bucket=covid

Note: The cloud object storage above is a local MinIO instance, but other cloud-based object storage services could be configured and used in this scenario.
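For reference, the command above persists the runtime definition as a JSON metadata file. The sketch below shows roughly what that file contains; the exact layout varies across Elyra versions, and the endpoint and credential values here are hypothetical placeholders, not values from this repository.

```json
{
  "display_name": "Kubeflow Pipeline (local)",
  "schema_name": "kfp",
  "metadata": {
    "api_endpoint": "http://localhost:31380/pipeline",
    "cos_endpoint": "http://localhost:9000",
    "cos_username": "minio",
    "cos_password": "minio123",
    "cos_bucket": "covid"
  }
}
```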

Elyra Notebook pipelines

Elyra provides a visual editor for building notebook-based AI pipelines, simplifying the conversion of multiple notebooks into batch jobs or workflows. By leveraging cloud-based resources to run experiments faster, data scientists, machine learning engineers, and AI developers become more productive and can spend more of their time applying their technical skills.

Notebook pipeline

Running the Elyra pipeline

The Elyra pipeline us_data.pipeline, located in the pipeline directory, can be run by clicking the play button, as shown in the image above. The submit dialog requests two inputs from the user: a name for the pipeline and a runtime to use while executing it. The list of available runtimes comes from the registered Kubeflow Pipelines runtimes documented above. After submission, Elyra shows a dialog with a direct link to where the experiment is being executed on Kubeflow Pipelines.

The user can access the pipelines, and their respective experiment runs, via the api_endpoint of the Kubeflow Pipelines runtime (e.g. http://[host]:[port]/pipeline).

Pipeline experiment run

The output from the executed experiments is then available in the associated object storage, and the executed notebooks are stored both as native .ipynb files and in HTML format to facilitate visualizing and sharing the results.
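Since each executed notebook lands in the bucket in both formats, a listing of the bucket can be grouped by notebook name to pair the .ipynb file with its HTML rendering. A minimal sketch (the object keys below are hypothetical, not actual keys from this pipeline):

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical object keys as they might appear in the configured bucket.
objects = [
    "us_data/load_data.ipynb",
    "us_data/load_data.html",
    "us_data/analyze_data.ipynb",
    "us_data/analyze_data.html",
]

# Group the available formats under each notebook's base name.
artifacts = defaultdict(list)
for key in objects:
    path = PurePosixPath(key)
    artifacts[path.stem].append(path.suffix)

print(dict(artifacts))
# {'load_data': ['.ipynb', '.html'], 'analyze_data': ['.ipynb', '.html']}
```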

Pipeline experiment results in object storage

References

Find more project details on Elyra's GitHub repository or by watching the Elyra demo.

More Repositories

  1. spark-bench: Benchmark Suite for Apache Spark (Scala, 238 stars)
  2. text-extensions-for-pandas: Natural language processing support for Pandas dataframes (Jupyter Notebook, 217 stars)
  3. deep-histopath: A deep learning approach to predicting breast tumor proliferation scores for the TUPAC16 challenge (Jupyter Notebook, 203 stars)
  4. stocator: A high-performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics (Java, 111 stars)
  5. max-central-repo: Central repository of the Model Asset Exchange project, with information about the available models, current project status, contribution guidelines, and supporting assets (78 stars)
  6. aardpfark: A library for exporting Spark ML models and pipelines to PFA (Scala, 54 stars)
  7. presentations: Talks & workshops by the CODAIT team (Jupyter Notebook, 52 stars)
  8. r4ml: Scalable R for Machine Learning (R, 42 stars)
  9. spark-ref-architecture: Reference architectures for Apache Spark (Scala, 38 stars)
  10. graph_def_editor: GraphDef Editor, a port of the TensorFlow contrib.graph_editor package that operates over serialized graphs (Python, 31 stars)
  11. magicat: 🧙😺 magicat - deep learning magic, with the convenience of cat! (JavaScript, 26 stars)
  12. node-red-contrib-model-asset-exchange: Node-RED nodes for the Model Asset Exchange on IBM Developer (JavaScript, 20 stars)
  13. max-tfjs-models: Pre-trained TensorFlow.js models for the Model Asset Exchange (JavaScript, 18 stars)
  14. pardata (Python, 17 stars)
  15. nlp-editor: Visual editor for Natural Language Processing pipelines (JavaScript, 15 stars)
  16. flight-delay-notebooks: Analyzing flight delay and weather data using Elyra, IBM Data Asset Exchange, Kubeflow Pipelines and KFServing (Jupyter Notebook, 15 stars)
  17. spark-db2: DB2/DashDB connector for Apache Spark (Scala, 14 stars)
  18. redrock: RedRock, a mobile application prototype using Apache Spark, Twitter and Elasticsearch (Scala, 14 stars)
  19. spark-netezza: Netezza connector for Apache Spark (Scala, 13 stars)
  20. Identifying-Incorrect-Labels-In-CoNLL-2003: Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus (Jupyter Notebook, 12 stars)
  21. max-vis: Image annotation library and command-line utility for MAX image models (JavaScript, 9 stars)
  22. fae-tfjs (JavaScript, 9 stars)
  23. WELCOME-TO-CODAIT: Welcome to the Center for Open-Source Data & AI Technologies (CODAIT) organization on GitHub (8 stars)
  24. spark-tracing: A flexible instrumentation package for visualizing the internal operation of Apache Spark and related tools (Scala, 8 stars)
  25. redrock-v2: RedRock v2 repository (Jupyter Notebook, 8 stars)
  26. max-node-red-docker-image: Demo Docker image for the Model Asset Exchange Node-RED module (Dockerfile, 8 stars)
  27. max-workshop-oscon-2019 (7 stars)
  28. notebook-exporter: One-click deployment of notebooks, bringing notebooks to production (Scala, 6 stars)
  29. redrock-ios: RedRock, a mobile application prototype (JavaScript, 4 stars)
  30. max-base: This repo has been moved (Python, 4 stars)
  31. max-status: Current status of the Model Asset Exchange ecosystem (4 stars)
  32. project-codenet-notebooks (Jupyter Notebook, 3 stars)
  33. MAX-Web-App-skeleton: A fully functioning skeleton for MAX model web apps (JavaScript, 3 stars)
  34. development-guidelines: Development guidelines and related resources for IBM Spark Technology Center (3 stars)
  35. codait.github.io: CODAIT homepage (HTML, 3 stars)
  36. dax-schemata (Python, 2 stars)
  37. redrock-v2-ios: RedRock v2 iPad application (JavaScript, 2 stars)
  38. max-pytorch-mnist (Jupyter Notebook, 2 stars)
  39. teach-nao-robot-a-new-skill: Teach your NAO robot a new skill using deep learning microservices (2 stars)
  40. max-fashion-mnist-tutorial-app (Python, 1 star)
  41. MAX-cloud-deployment-cheatsheets: Work in progress (1 star)
  42. ddc-data-and-ai-2021-automate-using-open-source (Jupyter Notebook, 1 star)
  43. exchange-metadata-converter: Basic conversion utility for YAML-based metadata descriptors (Python, 1 star)
  44. streaming-integration-sample (Scala, 1 star)
  45. covid-trusted-ai-pipeline (Jupyter Notebook, 1 star)