Evaluation of the CNN design choices performance on ImageNet-2012.

Welcome to the evaluation of CNN design choices on ImageNet-2012. Here you can find the prototxt files of the tested nets and the full training logs.

upd.: Here is the technical report version of this benchmark.

If you use results from this benchmark, please cite

@Article{CaffeNetBench2017,
  Title                    = {Systematic evaluation of convolution neural network advances on the Imagenet },
  Author                   = {Dmytro Mishkin and Nikolay Sergievskiy and Jiri Matas},
  Journal                  = {Computer Vision and Image Understanding },
  Year                     = {2017},
  Doi                      = {https://doi.org/10.1016/j.cviu.2017.05.007},
  ISSN                     = {1077-3142},
  Keywords                 = {CNN},
  Url                      = {http://www.sciencedirect.com/science/article/pii/S1077314217300814}
}

upd2.: Some of the pretrained models are in the Releases section. They are licensed for unrestricted use.

upd3.: A nice paper on noise sensitivity: Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches

The basic architecture is similar to CaffeNet, but has several differences:

  1. Images are resized to small side = 128 for speed reasons. Therefore pool5 spatial size is 3x3 instead of 6x6.
  2. fc6 and fc7 layers have 2048 neurons instead of 4096.
  3. Networks are initialized with LSUV-init (code)
  4. Because LRN layers add nothing to accuracy (validated here), they were removed for speed reasons in most experiments.
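The pool5 sizes quoted in item 1 can be checked with a little arithmetic. The sketch below is not taken from the repository: it assumes Caffe's output-size rules (convolution rounds down, pooling rounds up) and the standard BVLC CaffeNet layer shapes for everything after conv1 (conv2 = 5x5 pad 2, conv3 to conv5 = 3x3 pad 1, pooling 3x3 stride 2). The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a Caffe convolution layer (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=3, stride=2, pad=0):
    """Output size of a Caffe pooling layer (rounds up)."""
    out = ceil_div(size + 2 * pad - kernel, stride) + 1
    # With padding, Caffe drops a last window that would start inside the pad.
    if pad > 0 and (out - 1) * stride >= size + pad:
        out -= 1
    return out

def pool5_size(input_size, pool_kernel=3, pool_pad=0):
    """Spatial size after pool5 for an assumed CaffeNet-style stack."""
    s = conv_out(input_size, kernel=11, stride=4)      # conv1
    s = pool_out(s, pool_kernel, 2, pool_pad)          # pool1
    s = conv_out(s, kernel=5, pad=2)                   # conv2 (assumed shape)
    s = pool_out(s, pool_kernel, 2, pool_pad)          # pool2
    for _ in range(3):                                 # conv3, conv4, conv5
        s = conv_out(s, kernel=3, pad=1)
    return pool_out(s, pool_kernel, 2, pool_pad)       # pool5

print(pool5_size(227), pool5_size(128))
```

Under these assumptions `pool5_size(227)` gives 6 and `pool5_size(128)` gives 3, matching the 6x6 and 3x3 sizes above. The same function with `pool_kernel=2` or `pool_pad=1` gives 4 and 5, which agrees with the pooling window/stride table further down.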

Taking into account Neural Network Training Variations in Speech and Subsequent Performance Evaluation, results can vary from run to run (the data order is the same, but the random seeds are different). However, I haven't observed any difference in results across several CaffeNet-ReLU training runs.

On-going evaluations with graphs:

Activations

Name Accuracy LogLoss Comments
ReLU 0.470 2.36 With LRN layers
ReLU 0.471 2.36 No LRN, as in rest
TanH 0.401 2.78
1.73TanH(2x/3) 0.423 2.66 As recommended in Efficient BackProp, LeCun98
ArcSinH 0.417 2.71
VLReLU 0.469 2.40 y=max(x,x/3)
RReLU 0.478 2.32
Maxout 0.482 2.30 sqrt(2) narrower layers, 2 pieces. Same complexity, as for ReLU
Maxout 0.517 2.12 same width layers, 2 pieces
PReLU 0.485 2.29
ELU 0.488 2.28 alpha=1, as in paper
ELU 0.485 2.29 alpha=0.5
(ELU+LReLU) / 2 0.486 2.28 alpha=1, slope=0.05
SELU = Scaled ELU 0.470 2.38 1.05070 * ELU(x,alpha = 1.6732)
FReLU = ReLU + (learned) bias 0.488 2.27
[FELU = ELU + (learned) bias] 0.489 2.28
Shifted Softplus 0.486 2.29 Shifted BNLL aka softplus, y = log(1 + exp(x)) - log(2). Same as ELU, as expected
No, with max pooling 0.389 2.93 No non-linearity
No, no max pooling 0.035 6.28 No non-linearity, strided convolution
APL2 0.471 2.38 2 linear pieces. Unlike other activations, the current author's implementation leads to different parameters for each x,y position of a neuron
APL5 0.465 2.39 5 linear pieces. Unlike other activations, the current author's implementation leads to different parameters for each x,y position of a neuron
ConvReLU,FCMaxout2 0.490 2.26 ReLU in convolution, Maxout (sqrt(2) narrower) 2 pieces in FC. Inspired by kaggle and INVESTIGATION OF MAXOUT NETWORKS FOR SPEECH RECOGNITION*
ConvELU,FCMaxout2 0.499 2.22 ELU in convolution, Maxout (sqrt(2) narrower) 2 pieces in FC.

* "The above analyses show that the bottom layers seem to waste a large portion of the additional parametrisation (figure 2 (a,e)) thus could be replaced, for example, by smaller ReLU layers. Similarly, maxout units in higher layers seem to use piecewise-linear components in a more active way suggesting the use of larger pools."
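Several rows of the activations table give a closed-form definition in the comments column. As a rough illustration, those formulas can be written out for scalar inputs as below. This is a sketch using only the Python standard library, not the Caffe layers used in the experiments; the function names are hypothetical and the code is not hardened against overflow for very large inputs.

```python
import math

def relu(x):
    return max(x, 0.0)

def vlrelu(x):
    """Very leaky ReLU: y = max(x, x/3)."""
    return max(x, x / 3.0)

def scaled_tanh(x):
    """1.73 * tanh(2x/3), the scaling recommended in Efficient BackProp."""
    return 1.73 * math.tanh(2.0 * x / 3.0)

def arcsinh(x):
    return math.asinh(x)

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)

def selu(x):
    """Scaled ELU with the constants from the table."""
    return 1.05070 * elu(x, alpha=1.6732)

def shifted_softplus(x):
    """y = log(1 + exp(x)) - log(2), so that y(0) = 0."""
    return math.log1p(math.exp(x)) - math.log(2.0)

for f in (relu, vlrelu, scaled_tanh, arcsinh, elu, selu, shifted_softplus):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```

Learned variants (PReLU, FReLU, APL) and Maxout carry trainable parameters or combine several channels, so they are left out of this scalar sketch.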

Prototxt, logs

Pooling type

Name Accuracy LogLoss Comments
MaxPool 0.471 2.36
Stochastic 0.438 2.54 Underfitting; maybe try without dropout
Stochastic, no dropout 0.429 2.96 Stochastic pooling does not prevent overfitting without dropout :(. Good start, bad finish
AvgPool 0.435 2.56
Max+AvgPool 0.483 2.29 Element-wise sum
NoPool 0.472 2.35 Strided conv2,conv3,conv4
General - - Depends on arch, click for details
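The Max+AvgPool row combines the two pooling results by element-wise sum. A minimal pure-Python sketch of that idea on a single 2D feature map is shown below. It is a simplification and an assumption on my part: it only visits windows that fit entirely inside the map, whereas Caffe rounds the output size up, and the function names are hypothetical.

```python
def pool2d(fmap, kernel=3, stride=2, mode="max"):
    """Max or average pooling over a 2D list, valid windows only."""
    height, width = len(fmap), len(fmap[0])
    out = []
    for i in range(0, height - kernel + 1, stride):
        row = []
        for j in range(0, width - kernel + 1, stride):
            window = [fmap[i + di][j + dj]
                      for di in range(kernel) for dj in range(kernel)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out

def max_plus_avg_pool(fmap, kernel=3, stride=2):
    """Element-wise sum of the max-pooled and average-pooled maps."""
    pooled_max = pool2d(fmap, kernel, stride, "max")
    pooled_avg = pool2d(fmap, kernel, stride, "avg")
    return [[m + a for m, a in zip(row_m, row_a)]
            for row_m, row_a in zip(pooled_max, pooled_avg)]

fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(max_plus_avg_pool(fmap))  # max = 9, mean = 5.0
```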

Pooling window/stride

Name Accuracy LogLoss Comments
MaxPool 3x3/2 0.471 2.36 default alexnet
MaxPool 2x2/2 0.484 2.29 Leads to larger feature map, Pool5=4x4 instead of 3x3
MaxPool 3x3/2 pad1 0.488 2.25 Leads to even larger feature map, Pool5=5x5 instead of 3x3

Prototxt, logs

CLF architecture

Name Accuracy LogLoss Comments
Default ReLU 0.470 2.36 fc6 = conv 3x3x2048 -> fc7 2048 -> 1000 fc8
Conv5-fc6=2048C3_2048C1_clf_avg 0.494 2.34 no pool5 -> fc6 = conv 3x3x2048 -> fc7=conv 1x1x2048 -> fc8 as 1x1 conv -> ave_pool.
Pool5-fc6=2048C3_2048C1_avg_clf 0.489 2.28 no pool5 -> fc6 = conv 3x3x2048 -> fc7=conv 1x1x2048 -> ave_pool -> fc8
SPP2-FC-FC 0.471 2.36 pool5 = SPP with 2 levels (2x2 and 1x1) -> FC6 -> FC7
SPP3-FC-FC 0.483 2.30 pool5 = SPP with 3 levels (3x3 and 2x2 and 1x1) -> FC6 -> FC7
fc6=512C3_1024C3_1536C1 0.482 2.52 pool5 zero pad -> fc6 = conv 3x3x512 -> fc7=conv 3x3x1024 -> 1x1x1536 -> fc8 as 1x1 conv -> ave_pool.
fc6=512C3_1024C3_1536C1_drop 0.491 2.29 pool5 zero pad -> fc6 = conv 3x3x512 -> fc7=conv 3x3x1024 -> drop 0.3 -> 1x1x1536 -> drop 0.5-> fc8 as 1x1 conv -> ave_pool.
Default ReLU, 4096 0.497 2.24 fc6 = conv 3x3x4096 -> fc7 4096 -> 1000 fc8 == original caffenet

pool5pad: the following nets were mistakenly trained with the ELU non-linearity instead of the default ReLU.

Name Accuracy LogLoss Comments
Default ELU 0.488 2.28 fc6 = conv 3x3x2048 -> fc7 2048 -> 1000 fc8
pool5pad_fc6ave 0.481 2.32 pool5 zero pad -> fc6 = conv 3x3x2048 -> AvePool -> as usual
pool5pad_fc6ave_fc7as1x1fc8ave 0.511 2.21 pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> fc8 as 1x1 conv -> ave_pool.
pool5pad_fc6ave_fc7as1x1avefc8 0.508 2.22 pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> ave_pool -> fc8
pool5pad_fc6ave_fc7as1x1_avemax_fc8 0.509 2.19 pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> fc8 as 1x1 conv -> ave_pool + max_pool.

Prototxt, logs

Conv1 parameters

Name Accuracy LogLoss Comments
Default, 128_K11_S4 0.471 2.36 Input size =128x128px, conv1 = 11x11x96, stride = 4
224_K11_S8 0.453 2.45 Input size =256x256px, conv1 = 11x11x96, stride = 8. Not finished yet
160_K11_S5 0.470 2.35 Input size =160x160px, conv1 = 11x11x96, stride = 5
96_K7_S3 0.459 2.43 Input size =96x96px, conv1 = 7x7x96, stride = 3
64_K5_S2 0.445 2.50 Input size =64x64px, conv1 = 5x5x96, stride = 2
32_K3_S1 0.390 2.84 Input size =32x32px, conv1 = 3x3x96, stride = 1
4x slower, 227_K11_S4 0.565 1.87 Input size = 227x227px, conv1 = 11x11x96, stride = 4, Not finished yet

prototxt, logs

Squeezing representation

For example, for using activations in image retrieval.

Name Accuracy LogLoss Comments
pool5pad_fc6ave_fc7as1x1fc8ave 0.508 2.22 Baseline. pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> ave_pool -> fc8 as 1x1 conv.
pool5pad_fc6ave_fc7as1x1=512_fc8ave 0.489 2.30 fc7 as 1x1 conv = 512
pool5pad_fc6ave_fc7as1x1_bottleneck=512_fc8ave 0.490 2.28 fc7 as 1x1 conv = 2048 then fc7a = 512

Prototxt, logs

Solvers

Name Accuracy LogLoss Comments
SGD with momentum 0.471 2.36
Nesterov 0.473 2.34
RMSProp 0.327 3.20 rms_decay=0.9, delta=1.0
RMSProp 0.453 2.45 rms_decay=0.9, delta=1.0, base_lr: 0.045, stepsize=10K. gamma=0.94 (from here)
RMSProp 0.451 2.43 rms_decay=0.9, delta=1.0, base_lr: 0.1, stepsize=10K. gamma=0.94
RMSProp 0.472 2.36 rms_decay=0.9, delta=1.0, base_lr: 0.1, stepsize=5K. gamma=0.94
RMSProp 0.486 2.28 rms_decay=0.9, delta=1.0, lr=0.1, linear lr_policy
SGD with momentum, linear 0.493 2.24 linear lr_policy

Did not converge at all:

ADAM: lr=0.001 m=0.9 m2=0.999 delta=1e-8; lr=0.001 m=0.95 m2=0.999 delta=1e-8; lr=0.001 m=0.95 m2=0.999 delta=1e-7; lr=0.01 m=0.9 m2=0.999 delta=1e-8; lr=0.01 m=0.9 m2=0.999 delta=1e-7; lr=0.01 m=0.9 m2=0.999 delta=1e-9; lr=0.01 m=0.9 m2=0.99 delta=1e-8; lr=0.01 m=0.9 m2=0.999 delta=1e-8; lr=0.01 m=0.95 m2=0.999 delta=1e-9

AdaDelta: delta=1e-5

RMSProp: lr=0.01, rms_decay=0.5; lr=0.01, rms_decay=0.9; lr=0.01, rms_decay=0.95; lr=0.01, rms_decay=0.98; lr=0.001, rms_decay=0.9; lr=0.001, rms_decay=0.98

Converged, but much worse than SGD:

Adagrad: lr=0.01; lr=0.02

AdaDelta: delta=1e-6; delta=1e-7; delta=1e-8

RMSProp: lr=0.01, rms_decay=0.99

Prototxt, logs

LR-policy

Name Accuracy LogLoss Comments
Step 100K 0.471 2.36 Default caffenet solver, max_iter=320K
Poly lr, p=0.5, sqrt 0.483 2.29 bvlc_quick_googlenet_solver. Worse than "step" throughout training, but ahead at the finish
Poly lr, p=2.0, sqr 0.483 2.299
Poly lr, p=1.0, linear 0.493 2.24
Poly lr, p=1.0, linear 0.466 2.39 max_iter=160K
Exp, 0.035 0.441 2.53 max_iter=160K, stepsize=2K, gamma=0.915, same as in base_dereyly

LR-policy-BatchNorm-Dropout = 0.2

Name Accuracy LogLoss Comments
Step 100K 0.527 2.09 Default caffenet solver, max_iter=320K
Poly lr, p=1.0, linear 0.496 2.24 max_iter=105K,
Poly lr, p=1.0, start_lr=0.02 0.505 2.21 max_iter=105K
Exp, 0.035 0.506 2.19 max_iter=160K, stepsize=2K, gamma=0.915, same as in base_dereyly

Prototxt, logs

Regularization

Name Accuracy LogLoss Comments
default 0.471 2.36 weight_decay=0.0005, L2, fc-dropout=0.5
wd=0.0001 0.450 2.48 weight_decay=0.0001, L2, fc-dropout=0.5
wd=0.00001 0.450 2.48 weight_decay=0.00001, L2, fc-dropout=0.5
wd=0.00001_L1 0.453 2.45 weight_decay=0.00001, L1, fc-dropout=0.5
drop=0.3 0.497 2.25 weight_decay=0.0005, L2, fc-dropout=0.3
drop=0.2 0.494 2.28 weight_decay=0.0005, L2, fc-dropout=0.2
drop=0.1 0.473 2.45 weight_decay=0.0005, L2, fc-dropout=0.1. Same acc, as in 0.5, but bigger logloss

Prototxt, logs

Dropout and width

The hypothesis that "same effective neurons = same performance" does not appear to hold

Name Accuracy LogLoss Comments
fc6,fc7=2048, dropout=0.5 0.471 2.36 default
fc6,fc7=2048, dropout=0.3 0.497 2.25 best for fc6,fc7=2048. (1-0.3)*2048=1433 neurons work each time
fc6,fc7=4096, dropout=0.65 0.465 2.38 (1-0.65)*4096=1433 neurons work each time
fc6,fc7=6144, dropout=0.77 0.447 2.48 (1-0.77)*6144=1433 neurons work each time
fc6,fc7=4096, dropout=0.5 0.497 2.24
fc6,fc7=1433, dropout=0 0.456 2.52
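The "neurons work each time" figures in the comments are the expected number of units kept by dropout, (1 - dropout) * width. A small worked check (the function name is hypothetical):

```python
def active_units(width, dropout):
    """Expected number of fc units that stay active in one forward pass."""
    return (1.0 - dropout) * width

configs = [(2048, 0.3), (4096, 0.65), (6144, 0.77), (1433, 0.0)]
for width, dropout in configs:
    print(f"width={width} dropout={dropout} -> {active_units(width, dropout):.1f}")
```

With the rounded ratios from the table, the 2048 and 4096 configurations come out at about 1433.6 units, while 6144 with dropout=0.77 gives about 1413; a ratio of roughly 0.767 would be needed to reach 1433 exactly.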

Prototxt, logs

Architectures

CaffeNet only

Name Accuracy LogLoss Comments
CaffeNet256 0.565 1.87 Reference BVLC model, LSUV init
CaffeNet128 0.470 2.36 Pool5 = 3x3
CaffeNet128_4096 0.497 2.24 Pool5 = 3x3, fc6-fc7=4096
CaffeNet128All 0.530 2.05 All improvements without caffenet arch change: ELU + SPP + color_trans3-10-3 + Nesterov+ (AVE+MAX) Pool + linear lr_policy
+ 0.06 Gain over vanilla caffenet128. "Sum of gains" = 0.018 + 0.013 + 0.015 + 0.003 + 0.013 + 0.023 = 0.085
SqueezeNet128 0.530 2.08 Reference solver, but linear lr_policy and batch_size=256 (320K iters). WITHOUT tricks like ELU, SPP, AVE+MAX, etc.
SqueezeNet128 0.547 2.08 New SqueezeNet solver. WITHOUT tricks like ELU, SPP, AVE+MAX, etc.
SqueezeNet224 0.592 1.80 New SqueezeNet solver. WITHOUT tricks like ELU, SPP, AVE+MAX, etc., 2 GPU
CaffeNet256All 0.613 1.64 All improvements without caffenet arch change: ELU + SPP + color_trans3-10-3 + Nesterov+ (AVE+MAX) Pool + linear lr_policy
CaffeNet128, no pad 0.411 2.70 No padding, but conv1 stride=2 instead of 4 to keep size of pool5 the same
CaffeNet128, dropout in conv 0.426 2.60 Dropout before pool2=0.1, after conv3 = 0.1, after conv4 = 0.2
CaffeNet128SPP 0.483 2.30 SPP= 3x3 + 2x2 + 1x1
DarkNet128BN 0.502 2.25 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF.BN
+ PreLU + base_lr=0.035, exp lr_policy, 160K iters
NiN128 0.519 2.15 Step lr_policy. Be careful not to use in-place dropout on maxpool

Others

Name Accuracy LogLoss Comments
DarkNetBN 0.502 2.25 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF.BN
HeNet2x2 0.561 1.88 No SPP, Pool5 = 3x3, VLReLU, J' from paper
HeNet3x1 0.560 1.88 No SPP, Pool5 = 3x3, VLReLU, J' from paper, 2x2->3x1
GoogLeNet128 0.619 1.61 linear lr_policy, batch_size=256. obviously slower than caffenet
GoogLeNet128_BN_lim0606 (https://github.com/lim0606/caffe-googlenet-bn) 0.645 1.54 BN before ReLU + scale bias, linear LR, batch_size = 128, base_lr = 0.005, 640K iter, LSUV init. 5x5 replaced by two 3x3, no in-place
GoogLeNet128Res 0.634 1.56 linear lr_policy, batch_size=256. Residual connections between inception blocks. No BN
GoogLeNet128Res_color 0.638 1.52 linear lr_policy, batch_size=256. Residual connections between inception blocks. No BN. + color_trans3-10-3
googlenet_loss2_clf 0.571 1.80 from net above, aux classifier after inception_4d
googlenet_loss1_clf 0.520 2.06 from net above, aux classifier after inception_4a
fitnet1_elu 0.333 3.21
VGGNet16_128 0.651 1.46 Surprisingly much better than GoogLeNet128, even with step-based solver.
VGGNet16_128_All 0.682 1.47 ELU (a=0.5. a=1 leads to divergence :( ), avg+max pool, color conversion, linear lr_policy

ResNet attempts are moved to ResNets.md

ResNets, good attempts

Name Accuracy LogLoss Comments
ResNet-50ELU-2xThinner 0.616 1.63 Without BN, ELU, dropout=0.2 before classifier. 2x thinner, than in paper. Quite fast. No large overfitting (unlike upper table)
GoogLeNet-128 0.619 1.61 For reference. linear lr_policy, batch_size=256.
GoogLeNet128Res 0.634 1.56 linear lr_policy, batch_size=256. Residual connections between inception blocks. No BN
VggLikeResNet-50-ELU-RoR-var 0.626 1.59 Step LR policy, max_iter = 200K, no BN, 4x thinner than VGG, Residual on residual .
VggLikeResNet-50-ELU 0.632 1.57 Step LR policy, max_iter = 200K, no BN, 4x thinner than VGG. More RoR .
VggLikeResNet-50-ELU-RoR 1x5 0.628 1.58 Step LR policy, max_iter = 200K, no BN, 4x thinner than VGG. 1x5 layers
VggLikeResNet-50-ELU-RoR 1x3 0.631 1.58 Step LR policy, max_iter = 200K, no BN, 4x thinner than VGG .

Train augmentation

Name Accuracy LogLoss Comments
Default 0.471 2.36 Random flip, random crop 128x128 from 144xN, N > 144
Drop 0.1 0.306 3.56 + Input dropout 10%. not finished, 186K iters result
Multiscale 0.462 2.40 Random flip, random crop 128x128 from ( 144xN, - 50%, 188xN - 20%, 256xN - 20%, 130xN - 10%)
5 deg rot 0.448 2.47 Random rotation to [0..5] degrees.
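The Multiscale row draws the short side of the resized image from four values with the listed probabilities and then takes a random 128x128 crop with a random flip. The sketch below only samples the augmentation parameters and does not touch pixel data. It is an assumption about how such a sampler could look, not the data layer used for the experiments, and all names are hypothetical.

```python
import random

# Short-side sizes and their probabilities, as listed in the Multiscale row.
SCALES = [144, 188, 256, 130]
WEIGHTS = [0.5, 0.2, 0.2, 0.1]

def sample_short_side(rng):
    """Pick the target short side according to WEIGHTS."""
    return rng.choices(SCALES, weights=WEIGHTS, k=1)[0]

def random_crop_box(width, height, crop, rng):
    """Random crop window (x0, y0, x1, y1) inside a width x height image."""
    x0 = rng.randint(0, width - crop)
    y0 = rng.randint(0, height - crop)
    return x0, y0, x0 + crop, y0 + crop

def multiscale_crop_params(orig_w, orig_h, crop=128, rng=None):
    """Return the resized size, the crop box and the flip flag for one sample."""
    rng = rng or random.Random()
    short = sample_short_side(rng)
    scale = short / min(orig_w, orig_h)
    new_w = max(crop, round(orig_w * scale))
    new_h = max(crop, round(orig_h * scale))
    box = random_crop_box(new_w, new_h, crop, rng)
    flip = rng.random() < 0.5
    return (new_w, new_h), box, flip

print(multiscale_crop_params(500, 375, rng=random.Random(0)))
```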

Prototxt, logs

Colorspace

Name Accuracy LogLoss Comments
RGB 0.471 2.36 default, no changes. Input = 0.04 * (Img - [104, 117,124])
RGB_by_BN 0.469 2.38 Input = BatchNorm(Img)
CLAHE 0.467 2.38 RGB -> LAB -> CLAHE(L)->RGB->BatchNorm(RGB)
HISTEQ 0.448 2.48 RGB -> HistEq
YCrCb 0.458 2.42 RGB->YCrCb->BatchNorm(YCrCb)
HSV 0.451 2.46 RGB->HSV->BatchNorm(HSV)
Lab - - Doesn't leave 6.90 loss after 1.5K iters
RGB->10->3 TanH 0.463 2.40 RGB -> conv1x1x10 tanh -> conv1x1x3 tanh
RGB->10->3 VlReLU 0.485 2.28 RGB -> conv1x1x10 vlrelu -> conv1x1x3 vlrelu
RGB->10->3 Maxout 0.488 2.26 RGB -> conv1x1x10 maxout(2) -> conv1x1x3 maxout(2)
RGB->16->3 VlReLU 0.483 2.30 RGB -> conv1x1x16 vlrelu -> conv1x1x3 vlrelu
RGB->3->3 VlReLU 0.480 2.32 RGB -> conv1x1x3 vlrelu -> conv1x1x3 vlrelu
RGB->10->3 VlReLU->sum(RGB) 0.482 2.30 RGB -> conv1x1x10 vlrelu -> conv1x1x3 -> sum(RGB) ->vlrelu
RGB and log(RGB)->10->3 VlReLU 0.482 2.29 RGB and log (RGB) -> conv1x1x10 vlrelu -> conv1x1x3 vlrelu
RGB and log(RGB) and log (256-RGB)->10->3 VlReLU 0.484 2.29 RGB and log (RGB) and log (256 - RGB) -> conv1x1x10 vlrelu -> conv1x1x3 vlrelu
NN-Scale 0.467 2.38 Nearest neighbor instead of linear interpolation for rescale. Faster, but worse :(
concat_rgb_each_pool 0.441 2.51 Concat avepoolRGB with each pool
OpenCV RGB2Gray 0.413 2.70 RGB->Grayscale Gray = 0.299 R + 0.587 G + 0.114 B
Learned RGB2Gray 0.419 2.66 RGB->conv1x1x1. Gray = -1.779 *R + 6.511 * G + 1.493 *B + 3.279
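The default input scaling and the two grayscale conversions in the table are simple per-pixel formulas. They are written out below for a single pixel as a sketch; the constants are copied from the table, the function names are hypothetical, and the channel order of the pixel is assumed to match the order of the mean vector.

```python
MEAN = (104.0, 117.0, 124.0)  # per-channel mean from the RGB row
SCALE = 0.04

def normalize_pixel(pixel, mean=MEAN, scale=SCALE):
    """Default input: 0.04 * (Img - mean), applied channel by channel."""
    return tuple(scale * (value - m) for value, m in zip(pixel, mean))

def opencv_gray(r, g, b):
    """Fixed OpenCV-style luminance weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def learned_gray(r, g, b):
    """The 1x1 convolution learned in the "Learned RGB2Gray" experiment."""
    return -1.779 * r + 6.511 * g + 1.493 * b + 3.279

print(normalize_pixel((154.0, 117.0, 99.0)))
print(opencv_gray(200, 100, 50), learned_gray(200, 100, 50))
```

Note that the learned conversion puts a negative weight on R and a large positive weight on G, so unlike the fixed formula it is not a convex combination of the channels.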

Prototxt, logs

Batch normalization

BN-paper, caffe-PR. Note that the results are obtained without the additional y=kx+b layer mentioned in the paper.

BN -- before or after ReLU?

Name Accuracy LogLoss Comments
Before 0.474 2.35 As in paper
Before + scale&bias layer 0.478 2.33 As in paper
After 0.499 2.21
After + scale&bias layer 0.493 2.24

So in all subsequent experiments, BN is placed after the non-linearity.
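The two orderings compared in this table differ only in where the normalization sits relative to ReLU. A toy sketch on a list of pre-activations for one channel makes the difference concrete. It omits the running statistics and the optional scale and bias layer, and it is an illustration rather than the Caffe layers used in the experiments.

```python
import math

def batch_norm(values, eps=1e-5):
    """Normalize a batch of scalars to zero mean and (nearly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def relu(values):
    return [max(v, 0.0) for v in values]

def bn_before_relu(pre_activations):
    """conv -> BN -> ReLU ("Before", as in the BN paper)."""
    return relu(batch_norm(pre_activations))

def bn_after_relu(pre_activations):
    """conv -> ReLU -> BN ("After")."""
    return batch_norm(relu(pre_activations))

x = [-2.0, -1.0, 0.5, 3.0]
print(bn_before_relu(x))  # non-negative; the mean is no longer zero
print(bn_after_relu(x))   # zero mean; contains negative values
```

With BN placed after the non-linearity, the next layer receives a zero-mean input; with BN placed before it, ReLU removes the negative half of the normalized values.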

BN and activations

Name Accuracy LogLoss Comments
ReLU 0.499 2.21
RReLU 0.500 2.20
PReLU 0.503 2.19
ELU 0.498 2.23
Maxout 0.487 2.28
Sigmoid 0.475 2.35
TanH 0.448 2.50
No 0.384 2.96

BN and dropout

ReLU non-linearity, fc6 and fc7 layer only

Name Accuracy LogLoss Comments
Dropout = 0.5 0.499 2.21
Dropout = 0.2 0.527 2.09
Dropout = 0 0.513 2.19

Prototxt, logs

BN-arch-init

Name Accuracy LogLoss Comments
Caffenet 0.471 2.36
Caffenet BN Before + scale&bias layer LSUV 0.478 2.33
Caffenet BN Before + scale&bias layer Ortho 0.482 2.31
Caffenet BN After LSUV 0.499 2.21
Caffenet BN After Ortho 0.500 2.20
Name Accuracy LogLoss Comments
GoogLeNet128 0.619 1.61
GoogLeNet BN Before + scale&bias layer LSUV 0.603 1.68
GoogLeNet BN Before + scale&bias layer Ortho 0.607 1.67
GoogLeNet BN After LSUV 0.596 1.70
GoogLeNet BN After Ortho 0.584 1.77
GoogLeNet128_BN_lim0606 (https://github.com/lim0606/caffe-googlenet-bn) 0.645 1.54 BN before ReLU + scale bias, linear LR, batch_size = 128, base_lr = 0.005, 640K iter, LSUV init, 5x5 replaced with 3x3 + 3x3. 3x3 replaced with 3x1+1x3

Prototxt, logs

Batch size, ReLU

Tanh results are moved here: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/BatchSize.md

Name Accuracy LogLoss Comments
BS=1024, 4xlr 0.465 2.38 lr=0.04, 80K iters
BS=1024 0.419 2.65 lr=0.01, 80K iters
BS=512, 2xlr 0.469 2.37 lr=0.02, 160K iters
BS=512 0.455 2.46 lr=0.01, 160K iters
BS=256, default 0.471 2.36 lr=0.01, 320K iters
BS=128 0.472 2.35 lr=0.01, 640K iters
BS=128, 1/2 lr 0.470 2.36 lr=0.005, 640K iters
BS=64 0.471 2.34 lr=0.01, 1280K iters
BS=64, 1/4 lr 0.475 2.34 lr=0.0025, 1280K iters
BS=32 0.463 2.40 lr=0.01, 2560K iter
BS=32, 1/8 lr 0.470 2.37 lr=0.00125, 2560K iter
BS=1, 1/256 lr 0.474 2.35 lr=3.9063e-05, 81920K iter. Online training

Prototxt, logs

General recommendation: overly large batch sizes lead to slightly inferior results, but in general batch_size should be selected based on computation speed. If the learning rate is adjusted accordingly, there is no practical difference between batch sizes.
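The adjusted rows in the table follow a linear scaling rule relative to the default (batch size 256, lr=0.01, 320K iterations): the learning rate scales with the batch size and the iteration count scales inversely, so every run sees the same number of images. A short sketch of that rule, with a hypothetical function name:

```python
def scaled_schedule(batch_size, ref_batch_size=256, ref_lr=0.01,
                    ref_iters=320_000):
    """Return (learning rate, iterations) for a given batch size."""
    factor = batch_size / ref_batch_size
    return ref_lr * factor, round(ref_iters / factor)

for bs in (1024, 512, 256, 128, 64, 32, 1):
    lr, iters = scaled_schedule(bs)
    print(f"BS={bs}: lr={lr:.6g}, iters={iters // 1000}K")
```

For example, a batch size of 1 maps to lr = 0.01 / 256 = 3.90625e-05 and 81920K iterations, as in the last row of the table.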

From contributors

The base net is caffenet+BN+ReLU+drop=0.2. The differences are in the filters (mainly 5x5 -> 3x3 + 3x3 or 1x5+5x1) and in the solver.

Name Accuracy LogLoss Comments
Base 0.527 2.09
Base_dereyly_lr, noBN, ReLU 0.441 2.53 max_iter=160K, stepsize=2K, gamma=0.915, but default caffenet
Base_dereyly 5x1, noBN, ReLU 0.474 2.31 5x5->1x5+5x1
Base_dereyly_PReLU 0.550 1.93 BN, PreLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->3x3+3x3
Base_dereyly 3x1 0.553 1.92 PreLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->1x3+1x3+3x1+1x3
Base_dereyly 3x1 scale aug 0.530 2.04 Same as previous, img: 128 crop from (128...300)px image, test resize to 144, crop 128
Base_dereyly 3x1 scale aug 0.512 2.17 Same as previous, img: 128 crop from (128...300)px image, test resize to (128+300)/2, crop 128
Base_dereyly 3x1->5x1 0.546 1.97* PreLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->1x5+1x5+5x1+1x5
Base_dereyly 3x1,halfBN 0.544 1.95 PreLU + base_lr=0.035, exp lr_policy, 160K iters,5x5->1x3+1x3+3x1+1x3, BN only for pool and fc6
Base_dereyly 5x1 0.540 2.00 PreLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->1x5+5x1
DarkNetBN 0.502 2.25 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF.BN
+ PreLU + base_lr=0.035, exp lr_policy, 160K iters

Prototxt, logs

Residual experiments

Name Accuracy LogLoss Comments
VGG-Like 0.521 2.14 1st layer = 7x7 stride 2, unlike VGG. All other layer = 1/2 VGG width
VGG-LikeRes 0.576 1.83 with residual connections, no BN
VGG-LikeResDrop 0.568 1.91 with residual connections, no BN , dropout in conv

Prototxt, logs

Network width

Name Accuracy LogLoss Comments
4sqrt(2)x wider 0.565 1.96 Starts overfitting
4x wider 0.563 1.92 Still no overfitting %)
2sqrt(2)x wider 0.552 1.94
2x wider 0.533 2.04
sqrt(2)x wider 0.506 2.17
Default 0.471 2.36
sqrt(2)x narrower 0.460 2.41
2x narrower 0.416 2.68
2sqrt(2)x narrower 0.340 3.11 no group conv
2sqrt(2)x narrower 0.318 3.25
4x narrower 0.256 3.33

logs

Dataset size

Name Accuracy LogLoss Comments
Default, 1.2M images 0.471 2.36
800K images 0.438 2.54
600K images 0.425 2.63
400K images 0.393 2.92
200K images 0.305 4.04

Dataset size, no RGB scaling

Or why input var=1 for LSUV is so important

Name Accuracy LogLoss Comments
800K images 0.438 2.54
600K images 0.425 2.63
600K images, no scale 0.379 2.92
400K images 0.393 2.92
400K images, no scale 0.357 3.10
200K images 0.305 4.04
200K images, no scale 0.277 4.06

logs

Input image size

Name Accuracy LogLoss Comments
64x64 0.309 3.34
96x96 0.414 2.69
128x128 0.471 2.36
180x180 0.521 2.10
224x224 0.565 1.87
300x300 0.559 2.03 In progress, results for 115K

logs

Dataset quality

Name Accuracy LogLoss Comments
Default, clean labels 0.471 2.36
5% incorrect labels 0.458 2.45
10% incorrect labels 0.447 2.58
15% incorrect labels 0.437 2.69
50% incorrect labels 0.347 3.44
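The README does not say how the incorrect labels were generated. One plausible way to build such a training set, shown purely as an assumption, is to pick a fixed fraction of the samples and replace each of their labels with a different class chosen uniformly at random. The function name and the seed handling are hypothetical.

```python
import random

def corrupt_labels(labels, fraction, num_classes=1000, seed=0):
    """Return a copy of labels in which `fraction` of entries are wrong."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_bad = round(fraction * len(labels))
    for idx in rng.sample(range(len(labels)), n_bad):
        # Draw from the other num_classes - 1 classes, skipping the true one.
        wrong = rng.randrange(num_classes - 1)
        if wrong >= labels[idx]:
            wrong += 1
        noisy[idx] = wrong
    return noisy

labels = [i % 1000 for i in range(10_000)]
noisy = corrupt_labels(labels, 0.05)
print(sum(a != b for a, b in zip(labels, noisy)), "of", len(labels), "changed")
```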

logs

Conv1 depth

Name Accuracy LogLoss Comments
Default, no 1x1 or 3x3 0.471 2.36 conv1 -> pool1
+ 1x1x96 NiN 0.490 2.24 conv1 -> 96C1 -> pool1
+ 3x (1x1x96 NiN) 0.509 2.10 conv1 -> 3x(96C1) -> pool1
+ 5x (1x1x96 NiN) 0.514 2.11 conv1 -> 5x(96C1) -> pool1
+ 7x (1x1x96 NiN) 0.514 2.11 conv1 -> 7x(96C1) -> pool1
+ 9x (1x1x96 NiN) 0.516 2.10 conv1 -> 9x(96C1) -> pool1
+ 9x (1x1x96 NiN)R 0.509 2.13 conv1 -> Residual9x(96C1) -> pool1. 276k iters
+ 1x (3x3x96 NiN) 0.500 2.19 conv1 -> 1x(96C3) -> pool1
+ 3x (3x3x96 NiN) 0.538 1.99 conv1 -> 3x(96C3) -> pool1
+ 5x (3x3x96 NiN) 0.551 1.91 conv1 -> 5x(96C3) -> pool1

logs

Other

ReLU non-linearity, fc6 and fc7 layer only

Name Accuracy LogLoss Comments
Default 0.471 2.36 bias lr_rate = 2x weights lr_rate
1x 0.470 2.37 bias lr_rate = 1x weights lr_rate
5x 0.472 2.35 bias lr_rate = 5x weights lr_rate
NoBias 0.445 2.50 Biases initialized with zeros, lr_rate = 0

Prototxt, logs

PRs with tests are welcome.

P.S. The logs are merged from many "save-resume" sessions, because the nets were trained at night, so a plot of "anything vs. seconds" will give weird results.