  • Stars: 185
  • Language: Java
  • License: Apache License 2.0
  • Created over 11 years ago
  • Updated 7 days ago


Repository Details

Pegasus Workflow Management System - Automate, recover, and debug scientific computations.


Pegasus Workflow Management System

Pegasus WMS is a configurable system for mapping and executing scientific workflows over a wide range of computational infrastructures including laptops, campus clusters, supercomputers, grids, and commercial and academic clouds. Pegasus has been used to run workflows with up to 1 million tasks that process tens of terabytes of data at a time.

Pegasus WMS bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. It locates the input data and computational resources a workflow needs, and plans all of the data transfer and job submission operations required to execute it. Pegasus enables scientists to construct workflows in abstract terms without worrying about the details of the underlying execution environment or the low-level specifications required by the middleware (Condor, Globus, Amazon EC2, etc.). In the process, Pegasus can plan and optimize the workflow to enable efficient, high-performance execution of large workflows on complex, distributed infrastructures.
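The following minimal Python sketch makes the idea of mapping more concrete. It is a toy model built on stated assumptions and is not Pegasus's actual planner. The names AbstractJob and plan, the site name condorpool and the example URL are all hypothetical. In the sketch, jobs refer only to logical file names. The planner looks up unproduced inputs in a dictionary that stands in for a replica catalog, binds every job to a single execution site, and adds auxiliary stage-in and stage-out transfer jobs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AbstractJob:
    """Hypothetical abstract job: names only a transformation and logical files."""
    name: str
    transformation: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def plan(jobs: List[AbstractJob], replica_catalog: Dict[str, str], site: str) -> List[Tuple[str, str]]:
    """Toy planner: bind jobs to one site and add auxiliary transfer jobs.

    Assumes `jobs` is already in dependency order. `replica_catalog` maps a
    logical file name (LFN) to the physical URL of an existing copy.
    """
    produced = {lfn for job in jobs for lfn in job.outputs}
    consumed = {lfn for job in jobs for lfn in job.inputs}
    steps: List[Tuple[str, str]] = []

    # Inputs that no job produces must already exist somewhere: stage them in.
    for lfn in sorted(consumed - produced):
        if lfn not in replica_catalog:
            raise LookupError(f"no replica registered for input {lfn!r}")
        steps.append((f"stage_in_{lfn}", f"transfer {replica_catalog[lfn]} -> {site}"))

    # Bind each abstract job to the concrete execution site.
    for job in jobs:
        steps.append((job.name, f"run {job.transformation} on {site}"))

    # Final outputs (produced but never consumed) are staged out to storage.
    for lfn in sorted(produced - consumed):
        steps.append((f"stage_out_{lfn}", f"transfer {lfn} from {site} -> output storage"))
    return steps


if __name__ == "__main__":
    workflow = [
        AbstractJob("preprocess", "preprocess", inputs=["f.a"], outputs=["f.b"]),
        AbstractJob("analyze", "analyze", inputs=["f.b"], outputs=["f.c"]),
    ]
    catalog = {"f.a": "https://data.example.org/f.a"}
    for step in plan(workflow, catalog, site="condorpool"):
        print(step)
```

Running the example prints four steps in order: a stage-in job for f.a, the two compute jobs, and a stage-out job for f.c. The intermediate file f.b is never transferred because it is produced and consumed inside the workflow.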

Pegasus has a number of features that contribute to its usability and effectiveness:

  • Portability / Reuse – User-created workflows can easily be run in different environments without alteration. Pegasus currently runs workflows on top of Condor pools, Grid infrastructures such as Open Science Grid and XSEDE, Amazon EC2, Google Cloud, and HPC clusters. The same workflow can run on a single system or across a heterogeneous set of resources.
  • Performance – The Pegasus mapper can reorder, group, and prioritize tasks in order to increase overall workflow performance.
  • Scalability – Pegasus scales both in the size of the workflow and in the resources that the workflow is distributed over. Pegasus runs workflows ranging from just a few computational tasks up to 1 million. The number of resources involved in executing a workflow can scale as needed without impeding performance.
  • Provenance – By default, all jobs in Pegasus are launched using the Kickstart wrapper that captures runtime provenance of the job and helps in debugging. Provenance data is collected in a database, and the data can be queried with tools such as pegasus-statistics, pegasus-plots, or directly using SQL.
  • Data Management – Pegasus handles replica selection, data transfers, and output registration in data catalogs. The Pegasus planner adds these tasks to a workflow as auxiliary jobs.
  • Reliability – Jobs and data transfers are automatically retried in case of failures. Debugging tools such as pegasus-analyzer help the user debug the workflow in case of non-recoverable failures.
  • Error Recovery – When errors occur, Pegasus tries to recover when possible by retrying tasks, by retrying the entire workflow, by providing workflow-level checkpointing, by re-mapping portions of the workflow, by trying alternative data sources for staging data, and, when all else fails, by providing a rescue workflow containing a description of only the work that remains to be done. It cleans up storage as the workflow is executed so that data-intensive workflows have enough space to execute on storage-constrained resources. Pegasus keeps track of what has been done (provenance) including the locations of data used and produced, and which software was used with which parameters.
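The retry and rescue-workflow behaviour described above can be modelled in a few lines of Python. This is a hedged sketch only. Pegasus relies on HTCondor DAGMan for execution, and neither Pegasus nor DAGMan is implemented this way. The function run_with_retries, the run_job callback and the retry count are hypothetical. In the model, each job is retried a fixed number of times, and a job runs only if all of its parents succeeded. The returned rescue workflow keeps only the unfinished jobs, with dependencies on already completed jobs removed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Set, Tuple

Dag = Dict[str, Set[str]]  # job -> set of parent jobs; every job must appear as a key


def run_with_retries(dag: Dag, run_job: Callable[[str], bool], retries: int = 2) -> Tuple[Set[str], Dag]:
    """Run jobs in dependency order, retrying each failure up to `retries` times.

    Returns (done, rescue): the set of completed jobs and a rescue DAG that
    describes only the work that remains to be done.
    """
    done: Set[str] = set()
    for job in TopologicalSorter(dag).static_order():
        # A job can only run once all of its parents have completed.
        if not dag[job] <= done:
            continue
        # One initial attempt plus up to `retries` retries.
        for _ in range(1 + retries):
            if run_job(job):
                done.add(job)
                break
    # Rescue DAG: unfinished jobs, keeping only edges to other unfinished jobs.
    rescue = {job: parents - done for job, parents in dag.items() if job not in done}
    return done, rescue


if __name__ == "__main__":
    attempts: Dict[str, int] = {}

    def flaky(job: str) -> bool:
        """Simulated executor: b fails once, c always fails, others succeed."""
        attempts[job] = attempts.get(job, 0) + 1
        if job == "b":
            return attempts[job] >= 2
        return job != "c"

    dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    done, rescue = run_with_retries(dag, flaky, retries=2)
    print(sorted(done))  # ['a', 'b']
    print(rescue)        # {'c': set(), 'd': {'c'}}
```

In this run, b recovers on its second attempt. c exhausts its retries, so d never starts. Passing the returned rescue DAG back into run_with_retries resumes from the point of failure without re-running a or b.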

Getting Started

You can find more information about Pegasus on the Pegasus website.

Pegasus has an extensive User Guide that documents how to create, plan, and monitor workflows.

We recommend you start by completing the Pegasus Tutorial from Chapter 3 of the Pegasus User Guide.

The easiest way to install Pegasus is to use one of the binary packages available on the Pegasus downloads page. Consult Chapter 2 of the Pegasus User Guide for more information about installing Pegasus from binary packages.

There is documentation on the Pegasus website for the Python, Java, and R Abstract Workflow Generator APIs. We strongly recommend the Python API, which is feature-complete and also allows you to invoke all of the Pegasus command-line tools.

You can use the pegasus-init command-line tool to run several examples on your local machine. Consult Chapter 4 of the Pegasus User Guide for more information.

There are also examples of how to Configure Pegasus for Different Execution Environments in the Pegasus User Guide.

If you need help using Pegasus, please contact us. See the contact page (http://pegasus.isi.edu/contact) on the Pegasus website for more information.

Building from Source

Pegasus can be compiled on any recent Linux or Mac OS X system.

Source Dependencies

In order to build Pegasus from source, make sure you have the following installed:

  • Git
  • Java 8 or higher
  • Python 3.5 or higher
  • R
  • Ant
  • gcc
  • g++
  • make
  • tox 3.14.5 or higher
  • mysql (optional, required to access MySQL databases)
  • postgresql (optional, required to access PostgreSQL databases)
  • PyYAML (Python package)
  • GitPython (Python package)

Other packages may be required to run the unit tests and to build the MPI tools.
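Before running ant, it can help to confirm that the required tools listed above are installed. The Python sketch below looks up each executable on PATH with shutil.which and checks the two Python packages with importlib.util.find_spec. The executable names and helper names are assumptions. The sketch does not check the minimum versions (Java 8, Python 3.5, tox 3.14.5) or the optional database support, so those still need to be verified separately.

```python
import importlib.util
import shutil
from typing import List

# Executable names assumed for the required tools listed above; adjust if your
# system installs them under different names.
REQUIRED_TOOLS = ["git", "java", "python3", "R", "ant", "gcc", "g++", "make", "tox"]
# Import names of the required Python packages: PyYAML -> yaml, GitPython -> git.
REQUIRED_MODULES = ["yaml", "git"]


def missing_tools(names: List[str]) -> List[str]:
    """Return the executables that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level Python modules that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    problems = missing_tools(REQUIRED_TOOLS) + missing_modules(REQUIRED_MODULES)
    if problems:
        print("missing:", ", ".join(problems))
    else:
        print("all required build tools and Python packages were found")
```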

Compiling

Ant is used to compile Pegasus.

To get a list of build targets run:

$ ant -p

The targets that begin with "dist" are what you want to use.

To build a basic binary tarball (excluding documentation), run:

$ ant dist

To build the release tarball (including documentation), run:

$ ant dist-release

The resulting packages will be created in the dist subdirectory.

More Repositories

  • WorkflowGenerator (Java, 40 stars): Synthetic workflow generators
  • 1000genome-workflow (Jupyter Notebook, 11 stars): Bioinformatics workflow that identifies mutational overlaps using data from the 1000 genomes project
  • montage-workflow-v3 (Python, 9 stars): A new Python DAX generator version of the classic Montage workflow. This workflow uses the Montage toolkit to re-project, background correct and add astronomical images into custom mosaics.
  • precip (Python, 8 stars): Pegasus Repeatable Experiments for the Cloud in Python
  • montage-workflow-v2 (Python, 7 stars): A new Python DAX generator version of the classic Montage workflow
  • craft (Python, 6 stars): JSON schema for design flows (DARPA CRAFT Program)
  • Soybean-Workflow (Python, 4 stars)
  • pegasus-olcf-kubernetes (Dockerfile, 3 stars)
  • dipa-workflow (Python, 3 stars): Pegasus workflow for the DIPA Pipeline at Waisman Center
  • darpa_population_modeling (Python, 3 stars): Sample workflow to demonstrate how Pegasus can be used to manage the population modeling tools in the MINT project
  • SAGA-Sample-Workflow (Python, 3 stars): Example on how to run Pegasus workflows on the ISI SAGA cluster
  • ACME-Workflow (Shell, 3 stars): Pegasus workflow for ACME climate models
  • SNS-Workflow (Python, 2 stars): SNS Refinement Workflow
  • lung-instance-segmentation-workflow (Python, 2 stars): Instance segmentation with U-Net/Mask R-CNN workflow using Keras & Ray Tune
  • homebrew-tools (Ruby, 2 stars): Homebrew Formulas for Pegasus
  • pegasus-docker-build (Dockerfile, 2 stars): Dockerfiles to build containers to build and test Pegasus on variety of platforms.
  • ACCESS-Pegasus-Examples (Jupyter Notebook, 2 stars): Pegasus Workflows examples including the Pegasus tutorial, to run on ACCESS resources.
  • pegasus-llnl (Shell, 2 stars): Repository for work with LLNL
  • PGen-GenomicVariations-Workflow (Python, 2 stars)
  • time-driven-data-placement (Jupyter Notebook, 2 stars): Code to recreate algorithm from the paper "A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing"
  • pegasus-service (Python, 2 stars): Pegasus as a Service
  • freesurfer-osg-workflow (Python, 2 stars): A Pegasus workflow for running FreeSurfer on the Open Science Grid
  • BLAST-Workflow (Python, 2 stars): Pegasus workflow for NCBI BLAST - Basic Local Alignment Search Tool
  • edge-synthetic-workflow (Python, 1 star)
  • jetstream-pegasus (SaltStack, 1 star): Example setup for HTCondor/Pegasus VMs on Jetstream
  • seismology-workflow (Python, 1 star)
  • pegasus-glidein (Shell, 1 star): A simple HTCondor glidein for HPC systems
  • spark-workflow (Shell, 1 star)
  • pegasus-gtfar (Python, 1 star): ISI & Keck School collaboration to implement a Pegasus workflow with a GUI front-end for the GTFAR workflow.
  • GlideinLite (Shell, 1 star)
  • SPLINTER-Workflow (Python, 1 star): Pegasus workflow for SPLINTER
  • pegasus-split-example (Python, 1 star): Simple hierarchical workflow demonstrating dynamic split/process/merge
  • TACC-Wrangler-Pegasus-Example (Python, 1 star)
  • process-workflow (Python, 1 star): Process workflow example
  • dibbs-data-collection-setup (1 star): Quickly setup RabbitMQ, an ELK stack, and Grafana to start collecting and visualizing data from Pegasus workflow runs.
  • pegasus-docker-deploy (Shell, 1 star): Set of scripts for deploying Pegasus into Docker containers using overlay network
  • pegasus-wlpipe (Python, 1 star): For running Weak Lensing pipelines through Pegasus
  • pegasus-sage (Jupyter Notebook, 1 star): Pegasus Sage Workflows
  • pegasus-metrics (Python, 1 star): Anonymous usage metrics collection and reporting for Pegasus
  • AutoDock-Vina-Workflow (Python, 1 star): AutoDock Vina Pegasus workflow for the OSG Connect infrastructure
  • single-cell-rna-bioconductor (R, 1 star)
  • data-placement-algo-example (Python, 1 star): Pegasus example to recreate example from the paper "A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing"
  • page-imputation (Shell, 1 star): Pegasus Imputation Workflows for PAGE2
  • pegasus-fb-nlp (Python, 1 star): Example of a Facebook NLP pipeline
  • casa-containers (Dockerfile, 1 star)
  • pegasus-cycles-ui (JavaScript, 1 star)
  • cwl-to-dax-reference (Common Workflow Language, 1 star): example CWL workflows, corresponding DAX workflows, and test scripts to convert CWL to DAX
  • galaxy-classification-workflow (Python, 1 star)
  • orcasound-workflow (Python, 1 star): Pegasus workflow for the orca github actions workflow
  • tutorials (HTML, 1 star): Repository for tutorial materials, that are specific to a conference, or an organization. This is not the repo for the main pegasus tutorial that is in the pegasus documentation.
  • SNS-MCViNE (Python, 1 star)