mnist-distributed-problem
This is a very small repository demonstrating a problem with DistributedDataParallel. A three-layer feed-forward neural network is trained on MNIST with and without DistributedDataParallel, using the same hyperparameters in both cases. If you configure DistributedDataParallel to use only one node, the model is noticeably less accurate. If you have any suggestions for making the two runs match, besides tuning the learning rate, please comment or send a PR!
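As a starting point for narrowing down the discrepancy, here is a minimal sketch of a single-step sanity check. It is not the code from this repository: the layer sizes, the synthetic batch, the `gloo` backend, the port number, and the helper names `make_model`, `train_step`, and `compare_single_step` are all assumptions made for illustration. It trains one SGD step on a plain model and on an identically initialized model wrapped in DistributedDataParallel with a world size of 1, then compares the results.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def make_model(seed: int = 0) -> nn.Module:
    """Three-layer feed-forward net for 28x28 inputs (layer sizes are assumed)."""
    torch.manual_seed(seed)  # same seed -> identical initial weights
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )


def train_step(model: nn.Module, x: torch.Tensor, y: torch.Tensor, lr: float = 0.1) -> float:
    """Run one plain SGD step and return the loss before the update."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()


def compare_single_step():
    """Compare one step with and without DDP in a single CPU process."""
    # Hypothetical rendezvous settings for a one-process group.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    try:
        # Synthetic stand-in for one MNIST batch (not real data).
        torch.manual_seed(1)
        x = torch.randn(32, 1, 28, 28)
        y = torch.randint(0, 10, (32,))

        plain = make_model()
        wrapped = DDP(make_model())

        loss_plain = train_step(plain, x, y)
        loss_ddp = train_step(wrapped, x, y)

        # Largest absolute difference between corresponding parameters.
        max_diff = max(
            (p - q).abs().max().item()
            for p, q in zip(plain.parameters(), wrapped.module.parameters())
        )
        return loss_plain, loss_ddp, max_diff
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(compare_single_step())
```

If the two losses and the updated parameters agree in a check like this, the accuracy gap is more likely to come from another part of the training setup (for example, how batches are sampled or sized per process) than from the DistributedDataParallel wrapper itself.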