K-FAC_pytorch

PyTorch implementation of K-FAC and E-KFAC. (Only single-GPU training is supported; multi-GPU training requires modifications.)

Requirements

pytorch 0.4.0
torchvision
python 3.6.0
tqdm
tensorboardX
tensorflow

How to run

python main.py --dataset cifar10 --optimizer kfac --network vgg16_bn --epoch 100 --milestone 40,80 --learning_rate 0.01 --damping 0.03 --weight_decay 0.003
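
The `--epoch 100 --milestone 40,80` pair suggests a step schedule in which the learning rate is decayed at epochs 40 and 80. This README does not state the decay factor, so the sketch below assumes a factor of 0.1; `lr_at_epoch` and `gamma` are hypothetical names used only for illustration and are not taken from the repository.

```python
def lr_at_epoch(epoch, base_lr=0.01, milestones=(40, 80), gamma=0.1):
    """Step-decay schedule: multiply base_lr by gamma once per milestone reached.

    gamma=0.1 is an assumption; the README does not state the decay factor.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Print the learning rate on either side of each milestone.
for epoch in (0, 39, 40, 79, 80, 99):
    print(epoch, lr_at_epoch(epoch))
```

Under this assumption, training would run at the base rate for epochs 0 to 39, at one tenth of it for epochs 40 to 79, and at one hundredth for the remaining epochs.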

Performance

Note: for better K-FAC hyperparameters, please refer to the weight_decay repo. (The hyperparameters below are not good enough; in particular, the weight decay is too small.)

For K-FAC and E-KFAC, the search ranges for learning rate, weight decay, and damping are:
(1) learning rate = [3e-2, 1e-2, 3e-3]
(2) weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
(3) damping = [3e-2, 1e-3, 3e-3]

For SGD:
(1) learning rate = [3e-1, 1e-1, 3e-2]
(2) weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]

CIFAR10

Optimizer  Model      Accuracy  Learning rate  Weight decay  Damping
KFAC       VGG16_BN   93.86%    0.01           0.003         0.03
E-KFAC     VGG16_BN   94.00%    0.003          0.01          0.03
SGD        VGG16_BN   94.03%    0.03           0.001         -
KFAC       ResNet110  93.59%    0.01           0.003         0.03
E-KFAC     ResNet110  93.37%    0.003          0.01          0.03
SGD        ResNet110  94.14%    0.03           0.001         -

CIFAR100

Optimizer  Model      Accuracy  Learning rate  Weight decay  Damping
KFAC       VGG16_BN   74.09%    0.003          0.01          0.03
E-KFAC     VGG16_BN   73.20%    0.01           0.01          0.03
SGD        VGG16_BN   74.56%    0.03           0.003         -
KFAC       ResNet110  72.71%    0.003          0.01          0.003
E-KFAC     ResNet110  72.32%    0.03           0.001         0.03
SGD        ResNet110  72.60%    0.1            0.0003        -

Others

Please consider citing the following papers for K-FAC:

@inproceedings{martens2015optimizing,
  title={Optimizing neural networks with kronecker-factored approximate curvature},
  author={Martens, James and Grosse, Roger},
  booktitle={International conference on machine learning},
  pages={2408--2417},
  year={2015}
}

@inproceedings{grosse2016kronecker,
  title={A kronecker-factored approximate fisher matrix for convolution layers},
  author={Grosse, Roger and Martens, James},
  booktitle={International Conference on Machine Learning},
  pages={573--582},
  year={2016}
}

and for E-KFAC:

@inproceedings{george2018fast,
  title={Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis},
  author={George, Thomas and Laurent, C{\'e}sar and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal},
  booktitle={Advances in Neural Information Processing Systems},
  pages={9550--9560},
  year={2018}
}

If you have any questions or suggestions, please feel free to contact me via alecwangcq at gmail , com!
