• Stars: 201
• Rank: 193,288 (Top 4%)
• Language: Jupyter Notebook
• License: Apache License 2.0
• Created: over 4 years ago
• Updated: over 3 years ago

Repository Details

Best-Deep-Learning-Optimizers

Collection of the latest, greatest, deep learning optimizers (for Pytorch) - CNN, Transformer, NLP suitable

Current top performers: I have not run benchmarks lately and a lot has changed. Quick recommendations: for transformers or CNNs, use madgrad or adahessian; for CNN-only workloads, use Ranger.

Updates -

April 2021: Meet Madgrad!
I have added Madgrad with an improvement to weight decay. Madgrad is a new optimizer released by FB AI in February. In testing with transformers for image classification, madgrad blew away the various Adam variants.
However, as spotted by @nestordemeure, the original weight decay implementation was Adam style rather than AdamW (decoupled) style.
In testing, AdamW-style weight decay was the winner, so the implementation here includes my modification to use AdamW-style weight decay.
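
For context, the two weight decay styles differ in where the decay term enters the update. A minimal sketch on a generic parameter (illustrative only, not the actual madgrad internals):

```python
import torch

def adam_style_wd_step(p: torch.Tensor, lr: float, wd: float) -> None:
    # Adam-style (coupled) weight decay: the decay term is folded into the
    # gradient, so it gets rescaled by the optimizer's adaptive statistics.
    grad = p.grad + wd * p.detach()
    p.data.add_(grad, alpha=-lr)  # stand-in for the real optimizer update

def adamw_style_wd_step(p: torch.Tensor, lr: float, wd: float) -> None:
    # AdamW-style (decoupled) weight decay: shrink the weights directly,
    # independently of the gradient-based update.
    p.data.mul_(1.0 - lr * wd)
    p.data.add_(p.grad, alpha=-lr)  # stand-in for the real optimizer update
```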

Recommendations - test with:
a) no weight decay, as recommended by the Madgrad authors, and
b) weight decay at the same level you would use for AdamW, with this madgrad_wd version.
Important: madgrad is very different from the Adam variants, so start with madgrad's default lr and run a quick lr range test (see the usage sketch below). Do not simply reuse the lr that worked on your dataset with Adam(ish) optimizers.

Modified madgrad is here: https://github.com/lessw2020/Best-Deep-Learning-Optimizers/tree/master/madgrad

And original madgrad is here: https://github.com/facebookresearch/madgrad
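
A minimal usage sketch for the modified madgrad_wd (the import path and constructor arguments below are assumptions based on the original madgrad signature; check the source in the links above):

```python
import torch
from madgrad import madgrad_wd  # module path and class name assumed; see repo links above

model = torch.nn.Linear(128, 10)

# Start from madgrad's default lr and sweep a small range rather than
# reusing an lr that worked with an Adam(ish) optimizer.
for lr in (1e-3, 1e-2, 1e-1):
    optimizer = madgrad_wd(
        model.parameters(),
        lr=lr,
        momentum=0.9,        # assumed default-style value
        weight_decay=1e-2,   # AdamW-level decay for this madgrad_wd version, or 0.0 per the authors
    )
    # ... run a short training trial at this lr and compare validation results ...
```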

Pending work: there is a new paper arguing that Stable Weight Decay is the best form of weight decay. I plan to implement and test it with madgrad soon.

August 2020 - AdaHessian, the first 'it really works and works really well' second-order optimizer, added: I tested AdaHessian last month on work datasets and it performed extremely well. It's like training with a guided missile compared to most other optimizers. The big caveat is that you will need about 2x the normal GPU memory to run it versus a first-order optimizer. I am currently trying to get a Titan GPU with 24GB of memory just for this purpose.
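
Because AdaHessian is second order, the backward pass must retain the autograd graph so the Hessian diagonal can be estimated, which is where the roughly 2x memory cost comes from. A rough sketch of a training step, assuming an AdaHessian implementation whose step() works from graph-enabled gradients (import path and class name are assumptions):

```python
import torch
from adahessian import Adahessian  # import path and class name assumed

model = torch.nn.Linear(128, 10)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = Adahessian(model.parameters(), lr=0.15)  # placeholder lr; tune per dataset

# dummy data standing in for a real DataLoader
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(10)]

for x, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # create_graph=True keeps the autograd graph alive so step() can form the
    # Hutchinson estimate of the Hessian diagonal; this is the ~2x memory cost.
    loss.backward(create_graph=True)
    optimizer.step()
```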

April 11, 2020 - new version of Ranger released (20.4.11), with the highest accuracy to date among all optimizers tested here.

Ranger has been upgraded to use Gradient Centralization. See: https://arxiv.org/abs/2004.01461 and github: https://github.com/Yonghongwei/Gradient-Centralization

It will now use GC by default, running it for both conv layers and fc layers. You can turn it on or off with the "use_gc" flag at init to test the difference on your datasets.

The summary of gradient centralization: "GC can be viewed as a projected gradient descent method with a constrained loss function. The Lipschitzness of the constrained loss function and its gradient is better so that the training process becomes more efficient and stable."
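
Concretely, the GC operation is just mean subtraction of each gradient over its non-output dimensions; a minimal sketch for a conv weight gradient:

```python
import torch

# Gradient of a conv weight, shape (out_channels, in_channels, kH, kW)
grad = torch.randn(64, 32, 3, 3)

# Gradient Centralization: subtract, per output filter, the mean of the
# gradient over all remaining dimensions (for an fc weight, mean over dim=1).
grad_gc = grad - grad.mean(dim=(1, 2, 3), keepdim=True)
```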

Note - for optimal accuracy, run with a flat lr for roughly the first 72% of training and then cosine-decay the lr over the remaining 28%. If you don't have an lr scheduler framework, you can get very comparable results by running at one rate for the first ~72%, then stopping, decreasing the lr, and running the remaining ~28%.
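
If you want that schedule without a full lr framework, a simple option is a LambdaLR that holds the rate for the first ~72% of steps and cosine-decays the rest (a sketch; the split point, base lr, and floor are placeholders to tune):

```python
import math
import torch

def flat_then_cosine(total_steps: int, flat_frac: float = 0.72, min_scale: float = 0.0):
    """LambdaLR multiplier: hold the lr for the flat phase, cosine-decay afterwards."""
    flat_steps = int(total_steps * flat_frac)

    def scale(step: int) -> float:
        if step < flat_steps:
            return 1.0
        progress = (step - flat_steps) / max(1, total_steps - flat_steps)
        return min_scale + (1.0 - min_scale) * 0.5 * (1.0 + math.cos(math.pi * progress))

    return scale

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=4e-3)  # or Ranger, etc.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=flat_then_cosine(1000))
# call scheduler.step() once per training step (not once per epoch)
```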

Usage - GC is on by default, but you can control all aspects at init:
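
For example (a sketch; use_gc is the flag mentioned above, while the import path and other arguments are assumptions to check against the current source):

```python
import torch
from ranger import Ranger  # import path assumed; see the Ranger repo linked earlier

model = torch.nn.Linear(128, 10)

# GC is on by default; turn it off with use_gc=False to A/B test on your data.
opt_with_gc = Ranger(model.parameters(), lr=4e-3, use_gc=True)
opt_without_gc = Ranger(model.parameters(), lr=4e-3, use_gc=False)
```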


Ranger will print its settings at first init so you can confirm the optimizer is configured the way you want:


Future work: MARTHE, HyperAdam and other optimizers will be tested and posted if they look good.


12/27 - added DiffGrad, with unofficial version 1 support (coded from the paper).
12/28 - added Diff_RGrad (diffGrad + Rectified Adam) to start off; it seems to work quite well.

Medium article (summary and FastAI example usage): https://medium.com/@lessw/meet-diffgrad-new-deep-learning-optimizer-that-solves-adams-overshoot-issue-ec63e28e01b2

Official diffGrad paper: https://arxiv.org/abs/1909.11015v2

12/31 - AdaMod and DiffMod added. Initial SLS files added (but more work needed).

In Progress:

A - Parabolic Approximation Line Search: https://arxiv.org/abs/1903.11991v2

B - Stochastic Line Search (SLS): pending (needs param group support)

C - AvaGrad

General papers of relevance:

Does Adam stick close to the optimal point? https://arxiv.org/abs/1911.00289v1

Probabilistic line searches for stochastic optimization (2017, MATLAB only but good theory work): https://arxiv.org/abs/1703.10034v2

More Repositories

1. Ranger-Deep-Learning-Optimizer (Python, 1,154 stars): Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
2. Ranger21 (Python, 320 stars): Ranger deep learning optimizer rewrite to use newest components
3. mish (Jupyter Notebook, 161 stars): Mish Deep Learning Activation Function for PyTorch / FastAI
4. res2net-plus (Python, 136 stars): Res2Net architecture with improved stem and Mish activation function
5. Ranger-Mish-ImageWoof-5 (Jupyter Notebook, 116 stars): Repo to build on / reproduce the record breaking Ranger-Mish-SelfAttention setup on FastAI ImageWoof dataset 5 epochs
6. training-detr (Jupyter Notebook, 40 stars): Unofficial Colab on how to train DETR, the intelligent object detector, with your own dataset. DETR = Detection Transformer
7. transformer_central (Jupyter Notebook, 30 stars): Various transformers for FSDP research
8. Ranger22 (Python, 18 stars): Testing various improvements to Ranger21 for 2022
9. mrnet-fastai (Jupyter Notebook, 16 stars): Deep Learning CNN using FastAI for the Stanford MRNet Knee MRI diagnosis challenge
10. FAdam_PyTorch (Python, 16 stars): an implementation of FAdam (Fisher Adam) in PyTorch
11. Thunder-Detr (Jupyter Notebook, 12 stars): (unofficial) - customized fork of DETR, optimized for intelligent obj detection on 'real world' custom datasets
12. triton_kernels_for_fun_and_profit (Python, 8 stars): Custom kernels in Triton language for accelerating LLMs
13. fsdp_llm (Python, 6 stars): FSDP optimizations for LLM training
14. t5_11 (Python, 6 stars): housing our model example of fine tuning an 11B t5 with FSDP
15. transformer_framework (Python, 6 stars): framework for plug and play of various transformers (vision and nlp) with FSDP
16. FTSwishPlus (Python, 6 stars): FTSwish with mean shifting added to increase performance
17. LightRelu (Python, 5 stars): Customized PyTorch implementation of LiSHT (linear scaled hyperbolic tangent) activation function for deep learning
18. hyper_efficient_optimizers (Python, 5 stars): Development of hyper efficient optimizers that can match/exceed AdamW, while using reduced memory
19. fsdp_review (4 stars): Some eval and profile routines for fsdp
20. auto-adaptive-ai (Jupyter Notebook, 4 stars): auto adaptive framework for intrinsic hyperparameter selection, adaptive padding, normalized weights
21. TRelu (Python, 4 stars): An improved activation function for deep learning - Threshold Relu, or TRelu
22. sigma_reparam (Python, 3 stars): Sigma Reparam for Transformers (based on Apple's paper)
23. EfficientNet-PyTorch (Jupyter Notebook, 3 stars): Unofficial port of Google's new EfficientNet to Pytorch and FastAI
24. RangerQH-Testing (Jupyter Notebook, 3 stars): Repo for running RangerQH + Res2NetPlus with LIP Pooling
25. facial-keypoint-detection (Jupyter Notebook, 3 stars): Facial keypoint detection CNN - custom architecture using partial convolution padding
26. AutoOpt-for-FastAI (3 stars): Integrate Ebay's AutoOpt Deep Learning Optimizer into the FastAI framework
27. skycraft2 (Python, 2 stars): Minecraft in the sky, written in Python
28. perception_tools (Jupyter Notebook, 2 stars): additional utils for working with Unity perception package
29. QuantFour_AdamW_Cuda (Python, 2 stars): Fused 4bit AdamW in Cuda
30. PolarBearLLM (Python, 2 stars): testing new TransFormer, MoE, and TransNormer features
31. unet-seg (Jupyter Notebook, 2 stars)
32. FTSwish (Python, 2 stars): Flattened Threshold Swish Activation function - PyTorch implementation
33. coordinate_clipped_Optimizers (Python, 2 stars): coordinate wise clipped Optimizers in PyTorch
34. snowfall (Python, 2 stars): helpful image handling utils - abstracts various file and opencv and pil features into result oriented functions
35. style-transfer-vgg (Jupyter Notebook, 2 stars): Artistic Style transfer using VGG19
36. cuda-kernel-dev (Cuda, 2 stars): in progress cuda kernels
37. Curriculum-Learning-Dropout (Jupyter Notebook, 2 stars): Implementation of Curriculum Learning Dropout for FastAI framework
38. medExam (1 star): Training an AI with FSDP to take the US medical exam
39. 5D-Compiler (Python, 1 star): Auto-Parallelization Compiler using 4D Parallel + Checkpointing (5D)
40. aot_fsdp (1 star): When AOT Autograd meets FSDP = large models train faster
41. alibi_positional_embeddings (Python, 1 star): Alibi in PyTorch
42. optimal-lr-finder (1 star): Automated optimal learning rate finder for PyTorch deep learning with FastAI
43. ft_linen (Python, 1 star): experiments with flax re-design to interop with pytorch
44. linear-graph-slam (Jupyter Notebook, 1 star): Linear Graph SLAM
45. bfloat_optimizer (Python, 1 star): Pure bfloat AdamW+ tweaks
46. snake-id (1 star): FastAI deep learning classifier for snakes
47. Thunder (1 star): AI framework for flexible training and results review (pytorch, vision and tabular)
48. t5_finetuning (Jupyter Notebook, 1 star): T5 and ExT5 fine tuning
49. pretrainer (Python, 1 star): FSDP codebase for pretraining large language models (LLM)
50. Fusion (Python, 1 star): Advanced yet low code framework for fully sharded distributed training
51. hsdp_demo (Python, 1 star): Tutorial repo for PyTorch FSDP running HSDP on single node
52. image-captioning-cnn-lstm (Jupyter Notebook, 1 star): Image captioning system combining CNN + LSTM for caption generation
53. self-tuning-ai (1 star): implementation of self tuning networks in pytorch, based on https://arxiv.org/pdf/1903.03088v1.pdf
54. triton_flashv2_alibi (Python, 1 star): working repo for Triton based Flash2 supporting alibi pos embeddings
55. Pytorch_train_test_split (Python, 1 star): Function to randomize and split training data into train/test, from same directory