# K-FAC_pytorch
PyTorch implementation of K-FAC and E-KFAC. Only single-GPU training is supported; multi-GPU training would require modifications.
## Requirements
- pytorch 0.4.0
- torchvision
- python 3.6.0
- tqdm
- tensorboardX
- tensorflow
## How to run
```bash
python main.py --dataset cifar10 --optimizer kfac --network vgg16_bn --epoch 100 --milestone 40,80 --learning_rate 0.01 --damping 0.03 --weight_decay 0.003
```
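Beyond the CLI, the optimizer can be used like a standard PyTorch optimizer inside your own training loop. The sketch below is a minimal illustration, not the repo's actual training script (see main.py for that): it assumes the K-FAC implementation is exposed as `KFACOptimizer` with `lr`, `damping`, and `weight_decay` keyword arguments, and it omits the Fisher-statistics bookkeeping the full script performs. The small CNN stands in for `vgg16_bn` just to keep the example self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T

# Assumed import path; check the repo's optimizers/ package for the real one.
from optimizers.kfac import KFACOptimizer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class SmallCNN(nn.Module):
    """A tiny CIFAR-sized CNN stand-in for vgg16_bn."""
    def __init__(self, num_classes=10):
        super(SmallCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 32x32 -> 16x16
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 16x16 -> 8x8
        return self.fc(x.view(x.size(0), -1))

model = SmallCNN().to(device)

# Hyperparameters mirror the CLI flags in the command above.
optimizer = KFACOptimizer(model, lr=0.01, damping=0.03, weight_decay=0.003)

train_set = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
```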
## Performance
Note: for better hyperparameters for K-FAC, please refer to the weight_decay repo. (The hyperparameters below are not good enough! In particular, the weight decay is too small!)

For K-FAC and E-KFAC, the search ranges for learning rate, weight decay, and damping were (a sweep sketch follows the lists below):
1. learning rate = [3e-2, 1e-2, 3e-3]
2. weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
3. damping = [3e-2, 1e-3, 3e-3]
For SGD:

1. learning rate = [3e-1, 1e-1, 3e-2]
2. weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
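To reproduce such a sweep, one straightforward option is to loop over the Cartesian product of these ranges and launch main.py once per configuration. A minimal sketch, assuming main.py accepts the flags shown in "How to run" (result parsing and bookkeeping are left out):

```python
# Sweep over the K-FAC/E-KFAC search ranges listed above by launching
# one main.py run per configuration.
import itertools
import subprocess

learning_rates = [3e-2, 1e-2, 3e-3]
weight_decays = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
dampings = [3e-2, 1e-3, 3e-3]

for lr, wd, damping in itertools.product(learning_rates, weight_decays, dampings):
    subprocess.run(
        ['python', 'main.py',
         '--dataset', 'cifar10',
         '--optimizer', 'kfac',
         '--network', 'vgg16_bn',
         '--epoch', '100',
         '--milestone', '40,80',
         '--learning_rate', str(lr),
         '--damping', str(damping),
         '--weight_decay', str(wd)],
        check=True)
```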
### CIFAR10
| Optimizer | Model | Acc. | Learning rate | Weight decay | Damping |
|---|---|---|---|---|---|
| KFAC | VGG16_BN | 93.86% | 0.01 | 0.003 | 0.03 |
| E-KFAC | VGG16_BN | 94.00% | 0.003 | 0.01 | 0.03 |
| SGD | VGG16_BN | 94.03% | 0.03 | 0.001 | - |
| KFAC | ResNet110 | 93.59% | 0.01 | 0.003 | 0.03 |
| E-KFAC | ResNet110 | 93.37% | 0.003 | 0.01 | 0.03 |
| SGD | ResNet110 | 94.14% | 0.03 | 0.001 | - |
### CIFAR100
| Optimizer | Model | Acc. | Learning rate | Weight decay | Damping |
|---|---|---|---|---|---|
| KFAC | VGG16_BN | 74.09% | 0.003 | 0.01 | 0.03 |
| E-KFAC | VGG16_BN | 73.20% | 0.01 | 0.01 | 0.03 |
| SGD | VGG16_BN | 74.56% | 0.03 | 0.003 | - |
| KFAC | ResNet110 | 72.71% | 0.003 | 0.01 | 0.003 |
| E-KFAC | ResNet110 | 72.32% | 0.03 | 0.001 | 0.03 |
| SGD | ResNet110 | 72.60% | 0.1 | 0.0003 | - |
## Others
Please consider citing the following papers for K-FAC:
```bibtex
@inproceedings{martens2015optimizing,
  title={Optimizing neural networks with kronecker-factored approximate curvature},
  author={Martens, James and Grosse, Roger},
  booktitle={International Conference on Machine Learning},
  pages={2408--2417},
  year={2015}
}

@inproceedings{grosse2016kronecker,
  title={A kronecker-factored approximate fisher matrix for convolution layers},
  author={Grosse, Roger and Martens, James},
  booktitle={International Conference on Machine Learning},
  pages={573--582},
  year={2016}
}
```
and for E-KFAC:
```bibtex
@inproceedings{george2018fast,
  title={Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis},
  author={George, Thomas and Laurent, C{\'e}sar and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal},
  booktitle={Advances in Neural Information Processing Systems},
  pages={9550--9560},
  year={2018}
}
```
If you have any questions or suggestions, please feel free to contact me at alecwangcq at gmail dot com!