Deep learning is a sub-field of machine learning that uses large multi-layer artificial neural networks (referred to as networks henceforth) as the main mechanism for feature extraction and inference. What differentiates deep learning from earlier applications of multi-layer networks is the exceptionally large number of layers in the network architectures used. Deep learning based solutions consist of three main development phases, namely model design or selection, model training, and inference. In the model design or selection phase, a network architecture is either designed from scratch or selected from a set of proven architectures that have been applied to similar problem domains and are known to perform well. Once a model architecture is decided, problem-specific data is used to train this model on the problem at hand. Finally, in the inference phase, the trained network is applied to new data.

The training phase at its core consists of two stages, namely loss function design or selection and network parameter optimization. Loss function design or selection mathematically defines what the chosen network is expected to do well. Once a loss function is decided, the network parameter optimization stage uses an optimization technique (such as gradient descent driven by back-propagation) to obtain a set of network parameters that minimize the chosen loss function on the available problem-specific data. A minimal code sketch of this standard workflow is given at the end of this section.

Putting it in its simplest form, we have invented a new way of training deep networks that improves inference accuracy on test data. This new training approach is applicable to various types of deep neural network architectures. To use the Deep Optimizer Framework, users do not need to change anything in their network design. They design their own network architecture without any structural or optimization limitations and simply activate the Deep Optimizer Framework to get better test accuracies for their deep learning applications. Any regularizer and any loss function can be used. In fact, the Deep Optimizer Framework is invisible to the user; it only changes the training mechanism for better test accuracy.

The proposed approach is NOT:
- A new loss function such as the hinge loss
- A new optimization technique such as the Adam optimizer
- A new data augmentation technique such as affine image warps, noise injection, or GAN-based data creation
- A network structure modification such as the residual blocks used in ResNet or random dropout for regularization
- A way of modifying network coefficients such as increasing/decreasing coefficient bit-depths or zeroing out selected network parameters
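For concreteness, the following is a minimal sketch of the conventional three-phase workflow described above (model design or selection, training against a chosen loss, and inference), written against the PyTorch API. The specific architecture, loss function, optimizer, and random stand-in data are illustrative assumptions only; the proposed Deep Optimizer Framework leaves all of these user choices untouched and intervenes only in the training mechanism.

```python
import torch
import torch.nn as nn

# Phase 1: model design or selection. A small illustrative network;
# any architecture could stand in its place.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Training stage 1: loss function design or selection.
loss_fn = nn.CrossEntropyLoss()

# Training stage 2: network parameter optimization via gradient descent,
# with back-propagation supplying the gradients.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Stand-in for problem-specific training data (random, for illustration).
inputs = torch.randn(64, 784)
targets = torch.randint(0, 10, (64,))

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()   # back-propagation computes the gradients
    optimizer.step()  # update the parameters to reduce the loss

# Phase 3: inference. The trained network is applied to unseen data.
model.eval()
with torch.no_grad():
    predictions = model(torch.randn(8, 784)).argmax(dim=1)
```

Under the claims above, only the parameter-update loop in this sketch would behave differently when the Deep Optimizer Framework is activated; the model definition, loss, and inference code remain exactly as the user wrote them.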