NYU HPC

A quick reference to access NYU's High Performance Computing Prince Cluster.

The official wiki is here; this is an unofficial document created as a quick-start guide for first-time users, with a focus on Python.

Get an account

You need to be affiliated with NYU and have a sponsor.

To get an account approved, follow these steps.

Log in

Once you have been approved, you can access HPC from:

  1. Within the NYU network, log in to the Prince cluster directly:

ssh NYUNetID@prince.hpc.nyu.edu

Once logged in, your home directory should be /home/NYUNetID, so running pwd should print:

[NYUNetID@log-0 ~]$ pwd
/home/NYUNetID

  2. From an off-campus location:

First, log in to the bastion host (gw.hpc.nyu.edu, the same host used by the tunnel config below):

ssh NYUNetID@gw.hpc.nyu.edu

Then log in to the cluster:

ssh prince.hpc.nyu.edu

File Systems

You have access to three filesystems: /home, /scratch, and /archive.

/scratch is a filesystem mounted on Prince and connected to the compute nodes, where you can read and write files faster. Note that its contents are periodically flushed.

[NYUNetID@log-0 ~]$ cd /scratch/NYUNetID
[NYUNetID@log-0 ~]$ pwd
/scratch/NYUNetID

/home and /scratch are separate filesystems in separate places, but you should use /scratch to store your files.
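
For example, before running a job you might copy a dataset from /home into /scratch; a minimal sketch (the my_dataset path is just illustrative):

# copy a dataset from /home into /scratch (illustrative paths)
cp -r /home/NYUNetID/my_dataset /scratch/NYUNetID/my_dataset

# confirm the copy arrived
ls /scratch/NYUNetID/my_dataset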

Loading Modules

The module system allows you to load and manage multiple versions and configurations of software packages.

To see available package environments:

module avail
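
You can also pass a name to narrow the listing, e.g. to see only the CUDA-related modules (this relies on the standard behavior of the module command):

module avail cuda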

To load a module:

module load [package name]

For example, if you want to use TensorFlow with GPU support:

module load cudnn/8.0v6.0
module load cuda/8.0.44
module load tensorflow/python3.6/1.3.0

To check what is currently loaded:

module list

To remove all packages:

module purge

To get helpful information about the package:

module show torch/gnu/20170504

This will print something like:

--------------------------------------------------------------------------------------------------------------------------------------------------
   /share/apps/modulefiles/torch/gnu/20170504.lua:
--------------------------------------------------------------------------------------------------------------------------------------------------
whatis("Torch: a scientific computing framework with wide support for machine learning algorithms that puts GPUs first")
whatis("Name: torch version: 20170504 compilers: gnu")
load("cmake/intel/3.7.1")
load("cuda/8.0.44")
load("cudnn/8.0v5.1")
load("magma/intel/2.2.0")
...

The load(...) lines list the dependencies that are also loaded when you load the package.
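
If you only want to drop a single package instead of purging everything, module unload removes one module at a time (here using one of the modules loaded in the TensorFlow example above):

module unload tensorflow/python3.6/1.3.0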

Interactive Mode: Request CPU

You can submit batch jobs on Prince to schedule work; this requires writing custom bash scripts. Batch jobs are great for longer runs. You can also work in interactive mode, which is great for short jobs and troubleshooting.

To run in interactive mode:

[NYUNetID@log-0 ~]$ srun --pty /bin/bash

This will run the default mode: a single CPU core and 2GB memory for 1 hour.

To request more CPUs, time, and memory:

[NYUNetID@log-0 ~]$ srun -n4 -t2:00:00 --mem=4000 --pty /bin/bash
[NYUNetID@c26-16 ~]$ 

This requests 4 CPU cores (tasks) for 2 hours with 4 GB of memory.
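
Inside an interactive session you can double-check what Slurm actually allocated by printing its environment variables (a quick sketch; the exact variables available depend on the Slurm version):

# show the job id and the CPUs allocated on this node
echo $SLURM_JOB_ID
echo $SLURM_CPUS_ON_NODE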

To exit a request:

[NYUNetID@c26-16 ~]$ exit
[NYUNetID@log-0 ~]$

Interactive Mode: Request GPU

[NYUNetID@log-0 ~]$ srun --gres=gpu:1 --pty /bin/bash
[NYUNetID@gpu-25 ~]$ nvidia-smi
Mon Oct 23 17:49:19 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:12:00.0     Off |                    0 |
| N/A   37C    P8    29W / 149W |      0MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
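
If you need a specific GPU model rather than whatever is free, Slurm's --gres syntax also accepts a type, e.g. a single K80 like the one shown above (the exact type label is an assumption; check the cluster documentation):

srun --gres=gpu:k80:1 --pty /bin/bash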

Submit a job

You can write a script that will be executed when the resources you requested become available.

A simple CPU demo:

#!/bin/bash

## 1) Job settings
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=5:00:00
#SBATCH --mem=2GB
#SBATCH --job-name=CPUDemo
#SBATCH --mail-type=END
#SBATCH [email protected]
#SBATCH --output=slurm_%j.out
  
## 2) Everything from here on is going to run:

cd /scratch/NYUNetID/demos
python demo.py

A GPU demo:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:4
#SBATCH --time=10:00:00
#SBATCH --mem=3GB
#SBATCH --job-name=GPUDemo
#SBATCH --mail-type=END
#SBATCH [email protected]
#SBATCH --output=slurm_%j.out

cd /scratch/NYUNetID/trainSomething
source activate ML
python train.py
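
The script above assumes a conda environment named ML. If you rely on environment modules instead, it is common to load them inside the script before calling python; a sketch reusing the module versions from the earlier example:

cd /scratch/NYUNetID/trainSomething
module purge
module load cudnn/8.0v6.0 cuda/8.0.44 tensorflow/python3.6/1.3.0
python train.py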

Submit your job with:

sbatch myscript.s

Monitor the job:

squeue -u $USER
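
To cancel a job, pass the job ID reported by squeue to scancel (the ID below is just an example):

scancel 1234567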

More info here

Setting up a tunnel

To copy data between your workstation and the NYU HPC clusters, you must set up and start an SSH tunnel.

What is a tunnel?

"A tunnel is a mechanism used to ship a foreign protocol across a network that normally wouldn't support it."1

  1. On your local computer, if you don't already have it, create a folder called ~/.ssh in your home directory:
mkdir ~/.ssh
  2. Set the permissions on that folder:
chmod 700 ~/.ssh
  3. Inside that folder, create a new file called config:
touch ~/.ssh/config
  4. Open that file in any text editor and add this:
# first we create the tunnel, with instructions to pass incoming
# packets on ports 8024, 8025 and 8026 through it and to specific
# locations
Host hpcgwtunnel
   HostName gw.hpc.nyu.edu
   ForwardX11 no
   LocalForward 8025 dumbo.hpc.nyu.edu:22
   LocalForward 8026 prince.hpc.nyu.edu:22
   User NetID 
# next we create an alias for incoming packets on the port. The
# alias corresponds to where the tunnel forwards these packets
Host dumbo
  HostName localhost
  Port 8025
  ForwardX11 yes
  User NetID

Host prince
  HostName localhost
  Port 8026
  ForwardX11 yes
  User NetID

Be sure to replace NetID with your NYU NetID.
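
With this config in place and the tunnel from the next section running, the aliases let you reach the cluster from off campus directly:

ssh prince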

Transfer Files

To copy data between your workstation and the NYU HPC clusters, you must set up and start an SSH tunnel. (See previous step)

  1. Create a tunnel
ssh hpcgwtunnel

Once executed you'll see something like this:

Last login: Wed Nov  8 12:15:48 2017 from 74.65.201.238
cv965@hpc-bastion1~>$

This will use the settings in ~/.ssh/config to create a tunnel. You need to leave this session open while transferring files, so keep this terminal tab open and open a new tab to continue.

  2. Transfer files

Between your computer and the HPC

  • A File:
scp /Users/local/data.txt NYUNetID@prince:/scratch/NYUNetID/path/
  • A Folder:
scp -r /Users/local/path NYUNetID@prince:/scratch/NYUNetID/path/

Between the HPC and your computer

  • A File:
scp NYUNetID@prince:/scratch/NYUNetID/path/data.txt /Users/local/path/
  • A Folder:
scp -r NYUNetID@prince:/scratch/NYUNetID/path/ /Users/local/path/
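
As an alternative to scp, rsync works through the same aliases and only re-sends files that changed, which helps with large folders (a sketch; the flags shown are one common choice):

rsync -avz /Users/local/path/ NYUNetID@prince:/scratch/NYUNetID/path/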

Screen

Create a ~/.screenrc file and append this gist.
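
A minimal screen workflow, in case you have not used it before (standard screen commands, not specific to the HPC):

screen -S train      # start a named session
# press Ctrl-a then d to detach while your process keeps running
screen -ls           # list sessions
screen -r train      # reattach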
