AdamW optimizer for Keras

Fixing Weight Decay Regularization in Adam - For Keras ⚡ 😃

Implementation of the AdamW optimizer (Ilya Loshchilov, Frank Hutter) for Keras.

Tested on this system

  • python 3.6
  • Keras 2.1.6
  • tensorflow(-gpu) 1.8.0

Usage

In addition to the usual Keras setup for building neural networks (see the Keras documentation for details), import and instantiate the optimizer:

from AdamW import AdamW

adamw = AdamW(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.,
              weight_decay=0.025,  # normalized weight decay (see note below)
              batch_size=1, samples_per_epoch=1, epochs=1)

After the model architecture is defined, nothing changes compared to the usual use of an optimizer in Keras:

model = Sequential()
# <definition of the model architecture>
model.compile(loss="mse", optimizer=adamw, metrics=[metrics.mse], ...)

Note that the batch size (batch_size), the number of training samples per epoch (samples_per_epoch), and the number of epochs (epochs) are necessary for the normalization of the weight decay (paper, Section 4); a complete sketch follows below.
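Assuming the implementation follows Section 4 of the paper, the effective decay applied per step is the normalized value weight_decay * sqrt(batch_size / (samples_per_epoch * epochs)). A minimal end-to-end sketch (the model, data, and hyperparameter values are illustrative, not part of this repository):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras import metrics
from AdamW import AdamW

# toy regression data; shapes and sizes are placeholders
x, y = np.random.rand(256, 10), np.random.rand(256, 1)
batch_size, epochs = 32, 10

adamw = AdamW(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.,
              weight_decay=0.025,
              batch_size=batch_size, samples_per_epoch=len(x), epochs=epochs)

model = Sequential()
model.add(Dense(32, activation="relu", input_dim=10))
model.add(Dense(1))
model.compile(loss="mse", optimizer=adamw, metrics=[metrics.mse])
model.fit(x, y, batch_size=batch_size, epochs=epochs)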

Done

  • Weight decay added to the parameter optimization (see the update sketch after this list)
  • Normalized weight decay added
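For reference, a sketch of the decoupled update these items implement, as given in the paper (m_hat and v_hat are the bias-corrected Adam moment estimates, eta the schedule multiplier; the names are illustrative, not this repository's variables):

# the decay term acts directly on the parameters,
# outside the Adam gradient step
p = p - eta * (lr * m_hat / (np.sqrt(v_hat) + epsilon) + weight_decay * p)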

To be done (eventually - help is welcome)

  • Cosine annealing (a possible starting point is sketched after this list)
  • Warm restarts
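A possible starting point for cosine annealing, built on the standard Keras LearningRateScheduler callback; the schedule is the per-epoch warm-restart formula from the SGDR paper by the same authors, and eta_min, eta_max, and the restart period t_i are illustrative values:

import math
from keras.callbacks import LearningRateScheduler

def sgdr_schedule(eta_min=1e-6, eta_max=1e-3, t_i=10):
    # learning rate follows a half cosine from eta_max down to
    # eta_min over t_i epochs, then restarts (no period doubling)
    def schedule(epoch):
        t_cur = epoch % t_i
        return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
    return schedule

# model.fit(..., callbacks=[LearningRateScheduler(sgdr_schedule())])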

Source

Adam: A Method for Stochastic Optimization, D. P. Kingma, J. Lei Ba

Fixing Weight Decay Regularization in Adam, I. Loshchilov, F. Hutter