

Repository Details

Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.

Magpie

Magpie contains a number of scripts for running Big Data software in HPC environments. Thus far, Hadoop, Spark, Hbase, Storm, Pig, Phoenix, Kafka, Zeppelin, Zookeeper, and Alluxio are supported. It currently supports running over the parallel file system Lustre and running over any generic network filesystem. There is scheduler/resource manager support for Slurm, Moab, Torque, and LSF.

Some of the features presently supported:

  • Run jobs interactively or via scripts.
  • Run against a number of filesystem options, such as HDFS, HDFS over Lustre, HDFS over a generic network filesystem, Lustre directly, or a generic network filesystem.
  • Take advantage of SSDs/NVRAM for local caching if available.
  • Make decent optimizations for your hardware.

Experimental support for several distributed machine learning frameworks has also been added. Presently TensorFlow, TensorFlow with Horovod, and Ray are supported.

Basic Idea

The basic idea behind these scripts is to:

  1. Submit a Magpie batch script to allocate nodes on a cluster using your HPC scheduler/resource manager. Slurm, Slurm+mpirun, Moab+Slurm, Moab+Torque and LSF+mpirun are currently supported.

  2. The batch script will create configuration files for all appropriate projects (Hadoop, Spark, etc.). The configuration files will be set up so the rank 0 node is the "master". All compute nodes will have configuration files created that point to the node designated as the master server.

    The configuration files will be populated with values for your filesystem choice and the hardware that exists in your cluster. Reasonable attempts are made to determine optimal values for your system and hardware (they are almost certainly better than the default values). A number of options exist in the batch scripts to adjust these values for individual jobs.

  3. Launch daemons on all nodes. The rank 0 node will run master daemons, such as the Hadoop Namenode. All remaining nodes will run appropriate worker daemons, such as the Hadoop Datanodes.

  4. You now have a mini big data cluster. You can log into the master node and interact with it however you want, or you can have Magpie run a script that executes your big data calculation instead.

  5. When your job completes or your allocation time has run out, Magpie will clean up your job by tearing down daemons. When appropriate, Magpie may also do some additional cleanup work to make re-execution on later runs cleaner and faster.
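
As a rough illustration of steps 2 and 3, the sketch below turns a list of nodes into per-node configuration files that all point at the rank 0 node. It is not Magpie's code: the function names (node_role, write_node_configs), the node-N.conf file layout, and the hostnames are hypothetical, chosen only to make the "rank 0 is the master" convention concrete.

```sh
#!/bin/sh
# Illustrative sketch only. None of these names or file formats come from Magpie.

# Print the role of a node given its rank: rank 0 is the master, all others are workers.
node_role() {
    if [ "$1" -eq 0 ]; then
        echo "master"
    else
        echo "worker"
    fi
}

# write_node_configs OUTDIR NODE...
# Write one small config file per node. The first node listed is treated as
# rank 0, and every file records that node as the master.
write_node_configs() {
    outdir="$1"
    shift
    master="$1"
    mkdir -p "$outdir"
    rank=0
    for node in "$@"; do
        {
            echo "role=$(node_role "$rank")"
            echo "hostname=$node"
            echo "master=$master"
        } > "$outdir/node-$rank.conf"
        rank=$((rank + 1))
    done
}

# Example with a made-up three-node allocation.
demo_dir="$(mktemp -d)"
write_node_configs "$demo_dir" node01 node02 node03
cat "$demo_dir"/node-*.conf
rm -rf "$demo_dir"
```

In a real allocation the node list would come from the scheduler rather than being typed by hand. This sketch does not attempt to reproduce how Magpie obtains it or which values it writes into each project's configuration.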

Supported Packages & Versions

For a complete list of supported package versions and dependencies, please see doc/README. The following can be considered a summary of support.

Hadoop - 2.2.0, 2.3.0, 2.4.X, 2.5.X, 2.6.X, 2.7.X, 2.8.X, 2.9.X, 3.0.X, 3.1.X, 3.2.X, 3.3.X

Spark - 1.1.X, 1.2.X, 1.3.X, 1.4.X, 1.5.X, 1.6.X, 2.0.X, 2.1.X, 2.2.X, 2.3.X, 2.4.X, 3.0.X, 3.1.X, 3.2.X, 3.3.X

Hbase - 1.0.X, 1.1.X, 1.2.X, 1.3.X, 1.4.X, 1.5.X, 1.6.X

Hive - 2.3.0

Pig - 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0

Zookeeper - 3.4.X

Storm - 0.9.X, 0.10.X, 1.0.X, 1.1.X, 1.2.X

Phoenix - 4.5.X, 4.6.0, 4.7.0, 4.8.X, 4.9.0, 4.10.1, 4.11.0, 4.12.0, 4.13.X, 4.14.0

Kafka - 2.11-0.9.0.0

Zeppelin - 0.6.X, 0.7.X, 0.8.X

Alluxio - 2.3.0

TensorFlow - 1.9, 1.12

Ray - 0.7.0
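
The lists above mix exact versions (such as 2.2.0) with wildcard series (such as 2.7.X). A small shell helper can check a candidate version against such a list. This is a sketch under the assumption that a trailing X stands for any release in that series. The name version_supported is hypothetical and not part of Magpie, and doc/README remains the authoritative list.

```sh
#!/bin/sh
# Illustrative sketch only. version_supported is a hypothetical helper, not a Magpie function.

# version_supported VERSION PATTERN...
# Succeeds if VERSION equals an exact pattern, or falls in a series written
# with a trailing "X" (assumed here to mean "any release in this series").
version_supported() {
    version="$1"
    shift
    for pattern in "$@"; do
        case "$pattern" in
            *.X)
                prefix="${pattern%X}"    # "2.7.X" becomes "2.7."
                case "$version" in
                    "$prefix"*) return 0 ;;
                esac
                ;;
            *)
                [ "$version" = "$pattern" ] && return 0
                ;;
        esac
    done
    return 1
}

# Example using the Hadoop summary line from this README.
hadoop_versions="2.2.0 2.3.0 2.4.X 2.5.X 2.6.X 2.7.X 2.8.X 2.9.X 3.0.X 3.1.X 3.2.X 3.3.X"
# The list is intentionally unquoted so it splits into separate patterns.
if version_supported "3.3.6" $hadoop_versions; then
    echo "Hadoop 3.3.6 matches the summary list"
else
    echo "Hadoop 3.3.6 is not in the summary list"
fi
```

Matching on the prefix including the final dot keeps 2.70.1 from being accepted by the 2.7.X entry.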

Older Supported Packages & Features

Some packages and features were dropped due to lack of interest, the software becoming old/deprecated, and/or their initial experimental addition into Magpie. If you are interested in them, please look at older versions for supported versions and documentation. If you are very interested in support in current versions of Magpie beyond an experimental nature, please submit a support request and we can reconsider adding it back in.

Removed in Magpie 2.0

  • Hadoop 1.X support
  • Tachyon
  • UDA/uda-plugin for Hadoop
  • HDFS Federation in Hadoop
  • IntelLustre option for a Hadoop Filesystem
  • MagpieNetworkFS option for a Hadoop Filesystem

Removed in Magpie 3.0

  • Spark 0.9.X support
  • Hbase 0.98.X and 0.99.X support
  • Mahout

Documentation

All documentation is in the 'doc' subdirectory. Please see the doc/README file as a starting point. It provides general instructions as well as pointers to documentation for each project, setup requirements, ability to do local configurations, tips & tricks, and more information.

Release

Magpie is released under a GPL license. For more information, see the COPYING file.

LLNL-CODE-644248
