Multi-GPU Training with PyTorch and TensorFlow
About
This workshop provides demonstrations of multi-GPU training using PyTorch Distributed Data Parallel (DDP) and PyTorch Lightning. Multi-GPU training in TensorFlow is demonstrated using MirroredStrategy.
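To give a feel for the DDP API before the workshop material, here is a minimal sketch. It is an illustration only, not the workshop's code: the function name `train`, the toy linear model, the random data, and the `"gloo"` CPU backend are all assumptions made for this example (a real multi-GPU job would use the `"nccl"` backend and launch one process per GPU).

```python
# Minimal single-process sketch of PyTorch DistributedDataParallel (DDP).
# Assumptions: "gloo" backend so it runs on CPU; world_size=1 to illustrate
# the API without requiring multiple GPUs. Use "nccl" on GPU nodes.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(port="29500", steps=5):
    # Rendezvous info for the process group (one process here).
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = port
    dist.init_process_group("gloo", rank=0, world_size=1)
    torch.manual_seed(0)
    # Wrapping the model in DDP makes backward() all-reduce gradients
    # across ranks so every replica takes the same optimizer step.
    model = DDP(torch.nn.Linear(10, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()  # gradient synchronization happens here
        opt.step()
        losses.append(loss.item())
    dist.destroy_process_group()
    return losses

if __name__ == "__main__":
    print(train())
```

In an actual cluster job each GPU gets its own process (e.g. via `torchrun` or `torch.multiprocessing.spawn`), and each process passes its own rank and the full world size to `init_process_group`; the training loop itself is unchanged.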
Setup
Make sure you can run Python on Adroit:
$ ssh <YourNetID>@adroit.princeton.edu # VPN required if off-campus
$ git clone https://github.com/PrincetonUniversity/multi_gpu_training.git
$ cd multi_gpu_training
$ module load anaconda3/2021.11
(base) $ python --version
Python 3.9.7
Getting Help
If you encounter any difficulties with the material in this guide, please send an email to [email protected] or attend a help session.
Authorship
This guide was created by Jonathan Halverson and members of PICSciE and Research Computing.