# Distributed Deep Learning Reads
A compilation of literature on distributed deep learning. Pull requests welcome :)
- 100-epoch ImageNet Training with AlexNet in 24 Minutes
- Accumulated Gradient Normalization
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
- Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
- Asynchrony begets Momentum, with an Application to Deep Learning
- Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations
- Bringing HPC Techniques to Deep Learning (ring all-reduce; see the first sketch after this list)
- Deep learning with Elastic Averaging SGD
- Distributed Delayed Stochastic Optimization
- Don't Decay the Learning Rate, Increase the Batch Size
- FireCaffe: near-linear acceleration of deep neural network training on compute clusters
- Heterogeneity-aware Distributed Parameter Servers
- Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- How to scale distributed deep learning?
- ImageNet Training in Minutes
- Joeri Hermans' ADAG Blog
- Large Scale Distributed Deep Networks
- Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow
- More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
- Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
- On Parallelizability of Stochastic Gradient Descent for Speech DNNs
- On Scalable Deep Learning and Parallelizing Gradient Descent
- One weird trick for parallelizing convolutional neural networks
- Parallel training of DNNs with Natural Gradient and Parameter Averaging
- Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
- PowerAI DDL
- Revisiting Distributed Synchronous SGD
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing
- SparkNet: Training Deep Networks in Spark
- Staleness-aware Async-SGD for Distributed Deep Learning (see the second sketch after this list)
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
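
Several entries above ("Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations", "Bringing HPC Techniques to Deep Learning", the Horovod post) revolve around ring all-reduce: each worker exchanges gradient segments only with its ring neighbours, so per-worker traffic stays roughly constant as workers are added. Below is a minimal single-process NumPy simulation of the idea; the function name `ring_allreduce` and the toy setup are illustrative, not taken from any of the papers.

```python
import numpy as np

def ring_allreduce(grads):
    """Average one gradient vector per worker via a simulated ring all-reduce.

    Scatter-reduce: in n-1 steps each worker passes one segment to its right
    neighbour, so worker w ends up holding the fully reduced segment (w+1) % n.
    All-gather: in n-1 more steps the reduced segments circulate until every
    worker holds the complete sum, which is then divided by n.
    """
    n = len(grads)
    chunks = [list(np.array_split(np.asarray(g, dtype=float).copy(), n))
              for g in grads]

    # Scatter-reduce: buffer all "sends" first so each step is simultaneous.
    for step in range(n - 1):
        sends = [((w + 1) % n, (w - step) % n, chunks[w][(w - step) % n].copy())
                 for w in range(n)]
        for dst, seg, data in sends:
            chunks[dst][seg] = chunks[dst][seg] + data

    # All-gather: circulate the fully reduced segments, overwriting old ones.
    for step in range(n - 1):
        sends = [((w + 1) % n, (w + 1 - step) % n, chunks[w][(w + 1 - step) % n].copy())
                 for w in range(n)]
        for dst, seg, data in sends:
            chunks[dst][seg] = data

    # Every worker now holds the full sum; divide by n for the average.
    return [np.concatenate(c) / n for c in chunks]

# Sanity check: every worker's result equals the plain mean of the gradients.
worker_grads = [np.random.default_rng(s).standard_normal(10) for s in range(4)]
averaged = ring_allreduce(worker_grads)
assert all(np.allclose(a, np.mean(worker_grads, axis=0)) for a in averaged)
```

Each worker sends and receives about 2(n-1)/n of the gradient size in total, independent of the number of workers, which is why the references above describe the ring algorithm as bandwidth optimal.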
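
The asynchronous strand of the list (Hogwild!, the parameter-server papers, "Staleness-aware Async-SGD for Distributed Deep Learning") deals with gradients computed on stale parameter snapshots. The sketch below simulates that in one process and applies a staleness-dependent step size of the lr/tau form discussed in the staleness-aware paper; the function name and the quadratic toy objective are made up for illustration.

```python
import numpy as np

def staleness_aware_asgd(grad_fn, w0, num_workers=4, num_updates=500,
                         lr=0.05, seed=0):
    """Toy single-process simulation of staleness-aware asynchronous SGD.

    Each worker computes a gradient against a parameter snapshot; by the time
    its update reaches the server, tau other updates may already have been
    applied. Dividing the step size by tau damps very stale gradients.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    clock = 0
    # One in-flight gradient per worker, tagged with its snapshot clock.
    pending = [(0, grad_fn(w, rng)) for _ in range(num_workers)]
    for _ in range(num_updates):
        i = int(rng.integers(num_workers))      # a random worker finishes next
        snap_clock, g = pending[i]
        tau = max(1, clock - snap_clock)        # staleness of this gradient
        w -= (lr / tau) * g                     # staleness-modulated step
        clock += 1
        pending[i] = (clock, grad_fn(w, rng))   # recompute on fresh parameters
    return w

# Noisy quadratic with minimum at `target` (illustrative only).
target = np.array([1.0, -2.0])
grad = lambda w, rng: 2.0 * (w - target) + 0.01 * rng.standard_normal(w.shape)
print(staleness_aware_asgd(grad, np.zeros(2)))  # converges near [1.0, -2.0]
```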