AMSGrad-Tensorflow
Simple TensorFlow implementation of the AMSGrad optimizer from the paper "On the Convergence of Adam and Beyond"
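
AMSGrad differs from Adam in one step: it keeps a running maximum of the second-moment estimate and normalizes the update by that maximum instead of the current estimate. The NumPy sketch below only illustrates that update rule from the paper; the variable names are mine, bias correction is omitted for brevity, and this is not the repository's TensorFlow code.

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat,
                 learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8):
    # First- and second-moment estimates, exactly as in Adam
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # The AMSGrad change: keep the running maximum of v and divide by it
    v_hat = np.maximum(v_hat, v)
    param = param - learning_rate * m / (np.sqrt(v_hat) + epsilon)
    return param, m, v, v_hat
```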
Hyperparameters
- The default hyperparameters are set to the values that worked best in the experiments:
  - learning_rate = 0.01
  - beta1 = 0.9
  - beta2 = 0.99
- Depending on which network you are using, performance may be good at beta2 = 0.99 (the default).
Usage
```python
from AMSGrad import AMSGrad

train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8).minimize(loss)
```
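
The optimizer is used like the built-in tf.train optimizers in TensorFlow 1.x graph mode. Below is a minimal end-to-end sketch on a toy quadratic loss, assuming only the constructor and minimize() call shown above; the toy problem itself is not from the repository.

```python
import tensorflow as tf
from AMSGrad import AMSGrad

# Toy problem (illustrative only): minimize (w - 2)^2 with AMSGrad
w = tf.Variable(5.0)
loss = tf.square(w - 2.0)

train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_op)
    print(sess.run(w))  # should end up close to 2.0
```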
Network Architecture
```python
x = fully_connected(inputs=images, units=100)
x = relu(x)
logits = fully_connected(inputs=x, units=10)
```
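
A runnable TensorFlow 1.x version of that architecture might look like the following; the use of tf.layers.dense, the placeholder shapes, and the cross-entropy loss are my assumptions rather than the repository's exact code.

```python
import tensorflow as tf
from AMSGrad import AMSGrad

images = tf.placeholder(tf.float32, [None, 784])  # flattened 28x28 MNIST images
labels = tf.placeholder(tf.int64, [None])

x = tf.layers.dense(images, units=100)   # fully_connected(inputs=images, units=100)
x = tf.nn.relu(x)                        # relu(x)
logits = tf.layers.dense(x, units=10)    # fully_connected(inputs=x, units=10)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8).minimize(loss)
```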
MNIST Results (iteration = 30K)
- Result plot: lr=0.1, beta1=0.9, beta2=various
- Result plot: lr=0.01, beta1=0.9, beta2=various
Reference
- [On the Convergence of Adam and Beyond](https://openreview.net/forum?id=ryQu7f-RZ) (Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, ICLR 2018)
Author
Junho Kim