  • Stars: 119
  • Rank: 297,930 (Top 6 %)
  • Language: HCL
  • License: MIT License
  • Created: over 5 years ago
  • Updated: 2 months ago

Repository Details

Terraform modules to replicate the HPC user experience in the cloud

Magic Castle

The Digital Research Alliance of Canada provides HPC infrastructure and support to every academic research institution in Canada. The Alliance uses CVMFS, a software distribution system developed at CERN, to make its research software stack available on its HPC clusters and anywhere else with internet access. This makes it possible to replicate the user experience outside of the Alliance's physical infrastructure.

From these new possibilities emerged an open-source software project named Magic Castle, which aims to recreate the HPC user experience in public clouds. Magic Castle uses the open-source software Terraform and the HashiCorp Configuration Language (HCL) to define the virtual machines, volumes, and networks required to replicate a virtual HPC infrastructure. The infrastructure definition is packaged as a Terraform module that users can customize as needed. After deployment, the user is provided with a complete HPC cluster software environment including a Slurm scheduler, a Globus Endpoint, JupyterHub, LDAP, DNS, and over 3000 research software applications compiled by experts with EasyBuild. Magic Castle is compatible with AWS, Microsoft Azure, Google Cloud, OpenStack, and OVH.
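
To give a sense of what customizing the module might look like, here is a minimal sketch of a main configuration file for an OpenStack deployment. It is not taken from this page: the argument names are recalled from the project's published examples and may differ between releases, and every value (cluster name, domain, image, instance types, key path) is a placeholder. It also assumes the release archive provides an `openstack` directory next to this file. Check the documentation of the release you download before relying on any of it.

```hcl
# Illustrative fragment only: argument names and values are assumptions
# and must be checked against the release being used.
module "openstack" {
  source         = "./openstack" # assumed path inside the release archive
  config_git_url = "https://github.com/ComputeCanada/puppet-magic_castle.git"
  config_version = "main" # placeholder; a release tag would normally go here

  cluster_name = "example"        # placeholder
  domain       = "example.org"    # placeholder
  image        = "Rocky-8"        # placeholder image name

  # Each group of virtual machines gets a type, a count, and a list of tags.
  # The tags determine the role the machines play in the cluster.
  instances = {
    mgmt  = { type = "placeholder-flavor", tags = ["puppet", "mgmt", "nfs"], count = 1 }
    login = { type = "placeholder-flavor", tags = ["login", "public", "proxy"], count = 1 }
    node  = { type = "placeholder-flavor", tags = ["node"], count = 2 }
  }

  # Volumes attached to the instances carrying the matching tag.
  volumes = {
    nfs = {
      home    = { size = 100 }
      project = { size = 50 }
      scratch = { size = 50 }
    }
  }

  public_keys = [file("~/.ssh/id_rsa.pub")] # placeholder key path
  nb_users    = 10
}
```

With a file like this in place, the operator would typically run `terraform init` followed by `terraform apply` and review the proposed plan before accepting it.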

Setup

How Magic Castle Works

This software project integrates multiple parts that come into play at different stages of spawning a cluster. The following list enumerates the steps involved, so that users can better grasp what happens when they create a cluster.

We will refer to the user of Magic Castle as the operator.

  1. After downloading the latest release for the cloud provider of choice and adapting the main configuration file, the operator launches Terraform and accepts the proposed plan.
  2. Terraform communicates with the cloud provider REST API and requests the creation of the virtual machines.
  3. For each virtual machine creation request, Magic Castle provides a cloud-init file. This file initializes the virtual machine's base configuration and installs the Puppet agent. The cloud-init file of the virtual machine tagged puppet installs and configures a Puppet primary server.
  4. Terraform uploads to the Puppet primary server instance a YAML file describing the roles of each instance, specified as tags.
  5. The Puppet agents communicate with the Puppet primary server to retrieve and apply their configuration based on the tags defined in the preceding YAML file.
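
The relationship between tags, the uploaded YAML file, and the roles applied by Puppet can be sketched in plain Terraform. The sketch below is not Magic Castle's implementation: the instance names, the tag values, and the layout of the YAML document are hypothetical, chosen only to illustrate the idea. It uses no provider and only built-in Terraform functions (`yamlencode`, `contains`, `flatten`, `distinct`), so it can be tried in an empty directory.

```hcl
# Hypothetical inventory: instance names and tags are illustrative only.
locals {
  instances = {
    mgmt1  = { tags = ["puppet", "mgmt", "nfs"] }
    login1 = { tags = ["login", "public", "proxy"] }
    node1  = { tags = ["node"] }
    node2  = { tags = ["node"] }
  }

  # Step 3: the instance tagged "puppet" is the one whose cloud-init
  # would set up the Puppet primary server.
  puppet_servers = [
    for name, inst in local.instances : name
    if contains(inst.tags, "puppet")
  ]

  # Step 4: serialize the instance-to-tags mapping as YAML, the kind of
  # document that would be uploaded to the Puppet primary server.
  inventory_yaml = yamlencode({ instances = local.instances })

  # Step 5: invert the mapping so that each tag lists the instances that
  # would apply the corresponding configuration.
  instances_by_tag = {
    for tag in distinct(flatten([for inst in local.instances : inst.tags])) :
    tag => [for name, inst in local.instances : name if contains(inst.tags, tag)]
  }
}

output "puppet_servers" { value = local.puppet_servers }
output "inventory_yaml" { value = local.inventory_yaml }
output "instances_by_tag" { value = local.instances_by_tag }
```

Running `terraform init` and `terraform apply` on this file prints the three outputs without creating any cloud resources, which makes it a convenient way to see how a change in tags would alter the roles assigned to each instance.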

Talks, slides and videos

List of other cloud HPC cluster open-source projects

When I think about the DevOps landscape, we have so many people just like chefs in a restaurant that are experimenting with different ways of doing things. Once they get it, then they create those recipes. Those recipes in our world is source code. [...] That's why we will always have duplicates and similar projects, because there's going to be one ingredient that's going to be slightly different to make you preferred over something else

Kelsey Hightower, Sourcegraph Podcast, Episode 16, 2020

Contributing / Customizing

Refer to the reference design and the developer documentation.
