  • Stars: 915
  • Rank 49,917 (Top 1.0 %)
  • Language: MATLAB
  • License: MIT License
  • Created over 7 years ago
  • Updated almost 6 years ago


Repository Details

Memory consumption and FLOP count estimates for convnets

convnet-burden

Estimates of memory consumption and FLOP counts for various convolutional neural networks.

Image Classification Architectures

The numbers below are given for single element batches.

model | input size | param memory | feature memory | flops | src | performance (top-1 / top-5 error)
alexnet | 227 x 227 | 233 MB | 3 MB | 727 MFLOPs | MCN | 41.80 / 19.20
caffenet | 224 x 224 | 233 MB | 3 MB | 724 MFLOPs | MCN | 42.60 / 19.70
squeezenet1-0 | 224 x 224 | 5 MB | 30 MB | 837 MFLOPs | PT | 41.90 / 19.58
squeezenet1-1 | 224 x 224 | 5 MB | 17 MB | 360 MFLOPs | PT | 41.81 / 19.38
vgg-f | 224 x 224 | 232 MB | 4 MB | 727 MFLOPs | MCN | 41.40 / 19.10
vgg-m | 224 x 224 | 393 MB | 12 MB | 2 GFLOPs | MCN | 36.90 / 15.50
vgg-s | 224 x 224 | 393 MB | 12 MB | 3 GFLOPs | MCN | 37.00 / 15.80
vgg-m-2048 | 224 x 224 | 353 MB | 12 MB | 2 GFLOPs | MCN | 37.10 / 15.80
vgg-m-1024 | 224 x 224 | 333 MB | 12 MB | 2 GFLOPs | MCN | 37.80 / 16.10
vgg-m-128 | 224 x 224 | 315 MB | 12 MB | 2 GFLOPs | MCN | 40.80 / 18.40
vgg-vd-16-atrous | 224 x 224 | 82 MB | 58 MB | 16 GFLOPs | N/A | - / -
vgg-vd-16 | 224 x 224 | 528 MB | 58 MB | 16 GFLOPs | MCN | 28.50 / 9.90
vgg-vd-19 | 224 x 224 | 548 MB | 63 MB | 20 GFLOPs | MCN | 28.70 / 9.90
googlenet | 224 x 224 | 51 MB | 26 MB | 2 GFLOPs | MCN | 34.20 / 12.90
resnet18 | 224 x 224 | 45 MB | 23 MB | 2 GFLOPs | PT | 30.24 / 10.92
resnet34 | 224 x 224 | 83 MB | 35 MB | 4 GFLOPs | PT | 26.70 / 8.58
resnet-50 | 224 x 224 | 98 MB | 103 MB | 4 GFLOPs | MCN | 24.60 / 7.70
resnet-101 | 224 x 224 | 170 MB | 155 MB | 8 GFLOPs | MCN | 23.40 / 7.00
resnet-152 | 224 x 224 | 230 MB | 219 MB | 11 GFLOPs | MCN | 23.00 / 6.70
resnext-50-32x4d | 224 x 224 | 96 MB | 132 MB | 4 GFLOPs | L1 | 22.60 / 6.49
resnext-101-32x4d | 224 x 224 | 169 MB | 197 MB | 8 GFLOPs | L1 | 21.55 / 5.93
resnext-101-64x4d | 224 x 224 | 319 MB | 273 MB | 16 GFLOPs | PT | 20.81 / 5.66
inception-v3 | 299 x 299 | 91 MB | 89 MB | 6 GFLOPs | PT | 22.55 / 6.44
SE-ResNet-50 | 224 x 224 | 107 MB | 103 MB | 4 GFLOPs | SE | 22.37 / 6.36
SE-ResNet-101 | 224 x 224 | 189 MB | 155 MB | 8 GFLOPs | SE | 21.75 / 5.72
SE-ResNet-152 | 224 x 224 | 255 MB | 220 MB | 11 GFLOPs | SE | 21.34 / 5.54
SE-ResNeXt-50-32x4d | 224 x 224 | 105 MB | 132 MB | 4 GFLOPs | SE | 20.97 / 5.54
SE-ResNeXt-101-32x4d | 224 x 224 | 187 MB | 197 MB | 8 GFLOPs | SE | 19.81 / 4.96
SENet | 224 x 224 | 440 MB | 347 MB | 21 GFLOPs | SE | 18.68 / 4.47
SE-BN-Inception | 224 x 224 | 46 MB | 43 MB | 2 GFLOPs | SE | 23.62 / 7.04
densenet121 | 224 x 224 | 31 MB | 126 MB | 3 GFLOPs | PT | 25.35 / 7.83
densenet161 | 224 x 224 | 110 MB | 235 MB | 8 GFLOPs | PT | 22.35 / 6.20
densenet169 | 224 x 224 | 55 MB | 152 MB | 3 GFLOPs | PT | 24.00 / 7.00
densenet201 | 224 x 224 | 77 MB | 196 MB | 4 GFLOPs | PT | 22.80 / 6.43
mcn-mobilenet | 224 x 224 | 16 MB | 38 MB | 579 MFLOPs | AU | 29.40 / -

Click on the model name for a more detailed breakdown of feature extraction costs at different input image/batch sizes. Performance is reported as top-1 error / top-5 error on the ILSVRC 2012 validation data. The src column gives the source of the benchmark scores, using the following abbreviations:

  • MCN - scores obtained from the matconvnet website.
  • PT - scores obtained from the PyTorch torchvision module.
  • L1 - evaluated locally (follow link to view benchmark code).
  • AU - numbers reported by the paper authors.

These numbers provide an estimate of performance, but note that there may be small differences between the evaluation scripts from different sources.
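
As a reminder of what the performance column measures, the following is a minimal sketch of a top-k error computation. It is not the evaluation script used by any of the sources above; the function name `topk_error` and the toy scores are hypothetical and chosen only for illustration.

```python
def topk_error(scores, labels, k):
    """Percentage of samples whose true label is NOT among the k highest scores.

    scores: list of per-sample lists of class scores (hypothetical toy data below)
    labels: list of ground-truth class indices, one per sample
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score for this sample.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return 100.0 * misses / len(labels)


# Toy example: 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 2]

print(topk_error(scores, labels, k=1))  # 75.0
print(topk_error(scores, labels, k=2))  # 50.0
```

Differences in preprocessing (cropping, resizing, normalisation) between evaluation scripts are one likely reason the same model can score slightly differently across sources.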

References:

  • alexnet - Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
  • squeezenet - Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size." arXiv preprint arXiv:1602.07360 (2016).
  • vgg-m - Chatfield, Ken, et al. "Return of the devil in the details: Delving deep into convolutional nets." arXiv preprint arXiv:1405.3531 (2014).
  • vgg-vd-16/vgg-vd-19 - Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
  • vgg-vd-16-reduced - Liu, Wei, Andrew Rabinovich, and Alexander C. Berg. "Parsenet: Looking wider to see better." arXiv preprint arXiv:1506.04579 (2015).
  • googlenet - Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
  • inception - Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  • resnet - He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • resnext - Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431 (2016).
  • SENets - Jie Hu, Li Shen and Gang Sun. "Squeeze-and-Excitation Networks." arXiv preprint arXiv:1709.01507 (2017).
  • Densenet - Huang, Gao, et al. "Densely connected convolutional networks." CVPR (2017).

Object Detection Architectures

model | input size | param memory | feature memory | flops
rfcn-res50-pascal | 600 x 850 | 122 MB | 1 GB | 79 GFLOPs
rfcn-res101-pascal | 600 x 850 | 194 MB | 2 GB | 117 GFLOPs
ssd-pascal-vggvd-300 | 300 x 300 | 100 MB | 116 MB | 31 GFLOPs
ssd-pascal-vggvd-512 | 512 x 512 | 104 MB | 337 MB | 91 GFLOPs
ssd-pascal-mobilenet-ft | 300 x 300 | 22 MB | 37 MB | 1 GFLOPs
faster-rcnn-vggvd-pascal | 600 x 850 | 523 MB | 600 MB | 172 GFLOPs

The input sizes used are "typical" for each of the architectures listed, but can be varied. Anchor/priorbox generation and roi/psroi-pooling are not included in flop estimates. The ssd-pascal-mobilenet-ft detector uses the MobileNet feature extractor (the model used here was imported from the architecture made available by chuanqi305).

References:

  • faster-rcnn - Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
  • r-fcn - Li, Yi, Kaiming He, and Jian Sun. "R-fcn: Object detection via region-based fully convolutional networks." Advances in Neural Information Processing Systems. 2016.
  • ssd - Liu, Wei, et al. "Ssd: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
  • mobilenets - Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).

Semantic Segmentation Architectures

model | input size | param memory | feature memory | flops
pascal-fcn32s | 384 x 384 | 519 MB | 423 MB | 125 GFLOPs
pascal-fcn16s | 384 x 384 | 514 MB | 424 MB | 125 GFLOPs
pascal-fcn8s | 384 x 384 | 513 MB | 426 MB | 125 GFLOPs
deeplab-vggvd-v2 | 513 x 513 | 144 MB | 755 MB | 202 GFLOPs
deeplab-res101-v2 | 513 x 513 | 505 MB | 4 GB | 346 GFLOPs

In this case, the input sizes are those which are typically taken as input crops during training. The deeplab-res101-v2 model uses multi-scale input, with scales x1, x0.75, x0.5 (computed relative to the given input size).
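
To get a feel for why multi-scale input raises the cost, the sketch below computes how many input pixels are processed in total, relative to a single full-size pass. It assumes that convolutional cost grows roughly in proportion to the number of input pixels; the exact resizing and rounding used by the model is not stated here, so only the ratio is shown. The function name `relative_pixel_count` is hypothetical.

```python
def relative_pixel_count(scales):
    """Total pixels processed across all scales, relative to one full-size input.

    Each scale s shrinks both height and width by s, so the pixel count
    shrinks by s * s. Rounding of the resized dimensions is ignored.
    """
    return sum(s * s for s in scales)


# Scales quoted for deeplab-res101-v2.
print(relative_pixel_count([1.0, 0.75, 0.5]))  # 1.8125
```

Under this assumption, the three-scale forward pass processes a little under twice as many pixels as a single-scale pass at the given input size.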

References:

  • pascal-fcn - Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  • deeplab - Chen, Liang-Chieh, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs." Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

Keypoint Detection Architectures

model | input size | param memory | feature memory | flops
multipose-mpi | 368 x 368 | 196 MB | 245 MB | 134 GFLOPs
multipose-coco | 368 x 368 | 200 MB | 246 MB | 136 GFLOPs

References:

  • multipose - Cao, Zhe, et al. "Realtime multi-person 2d pose estimation using part affinity fields." arXiv preprint arXiv:1611.08050 (2016).

Notes and Assumptions

The numbers for each architecture should be reasonably framework agnostic. It is assumed that all weights and activations are stored as floats (4 bytes per datum) and that all ReLUs are performed in-place. Feature memory is therefore an estimate of the total memory consumed by the features computed in a forward pass of the network for a given input, assuming that memory is not re-used (the one exception being that, as noted above, in-place ReLUs do not add to the feature memory total). In practice, many frameworks free features from memory once they are no longer required by the execution path, and will therefore need less memory than is noted here. The feature memory statistic is simply a rough guide to "how big" the activations of the network look.
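
The accounting described above can be sketched in a few lines of Python. This is an illustrative approximation, not the repo's MATLAB tool: the layer-list format, the function names, and the toy network are all hypothetical. Whether the tool counts the input image itself is not stated, so the sketch counts only the listed layer outputs, and it assumes activations scale linearly with batch size.

```python
BYTES_PER_FLOAT = 4  # weights and activations assumed stored as 4-byte floats


def param_memory_bytes(num_params):
    """Memory needed to store the given number of parameters."""
    return num_params * BYTES_PER_FLOAT


def feature_memory_bytes(layers, batch_size=1):
    """Sum the output sizes of all layers, with no memory re-use.

    layers: list of (kind, output_shape) pairs (hypothetical format).
    ReLU outputs are skipped because they are assumed to be computed in-place.
    """
    total = 0
    for kind, shape in layers:
        if kind == "relu":
            continue  # in-place: overwrites its input, adds no new memory
        count = 1
        for dim in shape:
            count *= dim
        total += count * batch_size * BYTES_PER_FLOAT
    return total


# Toy network (hypothetical): shapes are (channels, height, width).
toy_layers = [
    ("conv", (64, 112, 112)),
    ("relu", (64, 112, 112)),
    ("pool", (64, 56, 56)),
]

print(feature_memory_bytes(toy_layers))                # 4014080
print(feature_memory_bytes(toy_layers, batch_size=8))  # 32112640

# A 3x3 convolution from 64 to 128 channels, with one bias per output channel.
print(param_memory_bytes(3 * 3 * 64 * 128 + 128))      # 295424
```

Parameter memory is independent of batch size, whereas the feature memory in this sketch grows linearly with it, which is why the tables quote single-element batches.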

Fused multiply-adds are counted as single operations. The numbers should be considered to be rough approximations - modern hardware makes it very difficult to accurately count operations (and even if you could, pipelining etc. means that it is not necessarily a good estimate of inference time).
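
For a single convolutional layer, counting each multiply-add as one operation gives the estimate sketched below. This is a simplified illustration rather than the repo's implementation: the function names are hypothetical, square kernels and symmetric padding are assumed, and bias additions are ignored.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv_multiply_adds(in_h, in_w, in_c, out_c, kernel, stride=1, pad=0, groups=1):
    """Multiply-add count for one convolution, one input image.

    Each output element needs kernel * kernel * (in_c / groups) multiply-adds,
    and each multiply-add is counted as a single operation.
    """
    out_h = conv_output_size(in_h, kernel, stride, pad)
    out_w = conv_output_size(in_w, kernel, stride, pad)
    per_output = kernel * kernel * (in_c // groups)
    return out_h * out_w * out_c * per_output


# Example: a 7x7, stride-2, pad-3 convolution from 3 to 64 channels on a
# 224 x 224 input (the shape of a typical first ResNet layer).
print(conv_output_size(224, kernel=7, stride=2, pad=3))        # 112
print(conv_multiply_adds(224, 224, 3, 64, kernel=7, stride=2, pad=3))  # 118013952
```

Summing this quantity over all layers gives a network-level figure of the kind reported in the tables; a convention that counts the multiply and the add separately would report roughly twice as much.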

The tool for computing the estimates is implemented as a module for the autonn wrapper of matconvnet and is included in this repo, so feel free to take a look for extra details. This module can be installed with the vl_contrib package manager (it has two dependencies which can be installed in a similar manner: autonn and mcnExtraLayers). Matconvnet versions of all of the models can be obtained from either here or here.

For further reading on the topic, the 2017 ICLR submission An analysis of deep neural network models for practical applications is interesting. If you find any issues, or would like to add additional models, please open an issue or PR.
