  • Stars: 810
  • Rank: 55,872 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created over 1 year ago
  • Updated 11 months ago

Repository Details

VanillaNet: the Power of Minimalism in Deep Learning

Official PyTorch implementation of VanillaNet, from the following paper:
VanillaNet: the Power of Minimalism in Deep Learning
Hanting Chen, Yunhe Wang, Jianyuan Guo and Dacheng Tao

VanillaNet is an innovative neural network architecture that focuses on simplicity and efficiency. Moving away from complex features such as shortcuts and attention mechanisms, VanillaNet uses a reduced number of layers while still maintaining excellent performance. This project showcases that it's possible to achieve effective results with a lean architecture, thereby setting a new path in the field of computer vision and challenging the status quo of foundation models.

News

2023.06.02 In addition to the speed reported in the paper, we have also measured the speed with NVIDIA TensorRT on A100 and the speed on HUAWEI Ascend 910. The inference speed of VanillaNet is superior to that of its counterparts. 🍺

Comparison of Depth and Speed

VanillaNet achieves comparable performance to prevalent computer vision foundation models, yet boasts a reduced depth and enhanced inference speed:

  • The 11-layer VanillaNet achieves about 81% Top-1 accuracy with a latency of 3.59 ms, a speed increase of over 100% compared to ResNet-50 (7.64 ms).
  • The 13-layer VanillaNet (1.5x*) achieves about 83% Top-1 accuracy with a latency of 9.72 ms, a speed increase of over 100% compared to Swin-S (20.25 ms).
  • With TensorRT FP32 on A100, the 11-layer VanillaNet achieves about 81% Top-1 accuracy with a latency of 0.69 ms, a speed increase of over 100% compared to Swin-T (1.41 ms) and ResNet-101 (1.58 ms).
All latency columns are in milliseconds (ms).

name           Params(M)  FLOPs(B)  PyTorch A100  MindSpore Ascend 910  TRT FP32 A100  TRT FP16 A100  Acc(%)
Swin-T         28.3       4.5       10.51         2.24                  1.41           0.98           81.18
ResNet-18      11.7       1.8       3.12          0.60                  0.41           0.28           70.6
ResNet-34      21.8       3.7       5.57          0.97                  0.77           0.49           75.5
ResNet-50      25.6       4.1       7.64          1.23                  0.80           0.54           79.8
ResNet-101     45.0       8.0       -             2.34                  1.58           1.04           81.3
ResNet-152     60.2       11.5      -             3.40                  2.30           1.49           81.8
VanillaNet-5   15.5       5.2       1.61          0.47                  0.33           0.27           72.49
VanillaNet-6   32.5       6.0       2.01          0.61                  0.40           0.33           76.36
VanillaNet-7   32.8       6.9       2.27          0.88                  0.47           0.39           77.98
VanillaNet-8   37.1       7.7       2.56          0.96                  0.52           0.45           79.13
VanillaNet-9   41.4       8.6       2.91          1.02                  0.58           0.49           79.87
VanillaNet-10  45.7       9.4       3.24          1.11                  0.63           0.53           80.57
VanillaNet-11  50.0       10.3      3.59          1.17                  0.69           0.58           81.08
VanillaNet-12  54.3       11.1      3.82          1.26                  0.75           0.62           81.55
VanillaNet-13  58.6       11.9      4.26          1.33                  0.82           0.67           82.05
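
The "over 100% speed increase" claims can be checked directly from the latency figures. The following is a minimal sketch; the helper name `speed_increase_pct` is ours and is not part of the repository. It treats the speed increase as the baseline latency divided by the model latency, minus one.

```python
def speed_increase_pct(baseline_ms: float, model_ms: float) -> float:
    """Relative speed increase (%) of a model over a baseline, from latencies in ms."""
    return (baseline_ms / model_ms - 1.0) * 100.0


# (baseline latency, VanillaNet latency) pairs copied from the text above.
comparisons = {
    "11-layer VanillaNet vs ResNet-50 (PyTorch, A100)": (7.64, 3.59),
    "13-layer VanillaNet (1.5x*) vs Swin-S (PyTorch, A100)": (20.25, 9.72),
    "11-layer VanillaNet vs Swin-T (TRT FP32, A100)": (1.41, 0.69),
    "11-layer VanillaNet vs ResNet-101 (TRT FP32, A100)": (1.58, 0.69),
}

for name, (baseline_ms, model_ms) in comparisons.items():
    pct = speed_increase_pct(baseline_ms, model_ms)
    print(f"{name}: {pct:.0f}% speed increase")
```

Every pair has a latency ratio above 2x, which is what a speed increase of over 100% means here.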

Downstream Tasks

Please refer to this page.

VanillaNet achieves a higher Frames Per Second (FPS) in detection and segmentation tasks.

Catalog

  • ImageNet-1K Testing Code
  • ImageNet-1K Training Code of VanillaNet-5 to VanillaNet-10
  • ImageNet-1K Pretrained Weights of VanillaNet-5 to VanillaNet-10
  • ImageNet-1K Training Code of VanillaNet-11 to VanillaNet-13
  • ImageNet-1K Pretrained Weights of VanillaNet-11 to VanillaNet-13
  • Downstream Transfer (Detection, Segmentation) Code

Results and Pre-trained Models

ImageNet-1K trained models

name #params(M) FLOPs(B) Latency(ms) Acc(%) model
VanillaNet-5 15.5 5.2 1.61 72.49 model
VanillaNet-6 32.5 6.0 2.01 76.36 model
VanillaNet-7 32.8 6.9 2.27 77.98 model
VanillaNet-8 37.1 7.7 2.56 79.13 model
VanillaNet-9 41.4 8.6 2.91 79.87 model
VanillaNet-10 45.7 9.4 3.24 80.57 model
VanillaNet-11 50.0 10.3 3.59 81.08 model
VanillaNet-12 54.3 11.1 3.82 81.55 model
VanillaNet-13 58.6 11.9 4.26 82.05 model
VanillaNet-13-1.5x 127.8 26.5 7.83 82.53 model
VanillaNet-13-1.5x† 127.8 48.9 9.72 83.11 model

Installation

The results are produced with torch==1.10.2+cu113 torchvision==0.11.3+cu113 timm==0.6.12. Other versions might also work.

Install PyTorch and torchvision following the official instructions.

Install required packages:

pip install timm==0.6.12
pip install cupy-cuda113
pip install torchprofile
pip install einops
pip install tensorboardX
pip install terminaltables
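
To confirm that an environment matches the versions the results were produced with, a small check along these lines may help. This is a sketch, not part of the repository: the names `REFERENCE_VERSIONS` and `find_version_mismatches` are hypothetical, and a mismatch is only informational, since other versions might also work.

```python
from importlib import metadata

# Versions quoted in the Installation section; other versions might also work.
REFERENCE_VERSIONS = {
    "torch": "1.10.2+cu113",
    "torchvision": "0.11.3+cu113",
    "timm": "0.6.12",
}


def _installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_version_mismatches(reference=REFERENCE_VERSIONS, lookup=_installed_version):
    """Map package -> (expected, installed) for every package that differs."""
    mismatches = {}
    for package, expected in reference.items():
        installed = lookup(package)
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in find_version_mismatches().items():
        print(f"{package}: expected {expected}, found {installed}")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages being installed.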

Dataset Preparation

Download the ImageNet-1K classification dataset and structure the data as follows:

/path/to/imagenet-1k/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
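
Before launching a run, it can be worth verifying that the dataset follows the layout above. The sketch below uses only the standard library; the function names and the set of accepted image extensions are our assumptions, not something the repository defines.

```python
from pathlib import Path

# Assumption: these extensions cover the image files in the dataset.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Return {class_name: image_count} for one split directory (train/ or val/)."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def check_imagenet_layout(root):
    """Summarize train/ and val/ and verify that both contain the same classes."""
    root = Path(root)
    summary = {split: summarize_split(root / split) for split in ("train", "val")}
    if set(summary["train"]) != set(summary["val"]):
        raise ValueError("train/ and val/ contain different class folders")
    return summary
```

Calling `check_imagenet_layout("/path/to/imagenet-1k/")` returns per-class image counts for both splits and raises if a split directory is missing or the class folders differ.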

Testing

We give an example evaluation command for VanillaNet-5:

without deploy:

python -m torch.distributed.launch --nproc_per_node=1 main.py --model vanillanet_5 --data_path /path/to/imagenet-1k/ --real_labels /path/to/imagenet_real_labels.json --finetune /path/to/vanillanet_5.pth --eval True --model_key model_ema --crop_pct 0.875

with deploy:

python -m torch.distributed.launch --nproc_per_node=1 main.py --model vanillanet_5 --data_path /path/to/imagenet-1k/ --real_labels /path/to/imagenet_real_labels.json --finetune /path/to/vanillanet_5.pth --eval True --model_key model_ema --crop_pct 0.875 --switch_to_deploy /path/to/vanillanet_5_deploy.pth

Training

You can use the following command to train VanillaNet-5 on a single machine with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vanillanet_5 \
--data_path /path/to/imagenet-1k \
--batch_size 128 --update_freq 1 --epochs 300 --decay_epochs 100 \
--lr 3.5e-3 --weight_decay 0.35  --drop 0.05 \
--opt lamb --aa rand-m7-mstd0.5-inc1 --mixup 0.1 --bce_loss \
--output_dir /path/to/save_results \
--model_ema true --model_ema_eval true --model_ema_decay 0.99996 \
--use_amp true 
  • Here, the effective batch size = --nproc_per_node * --batch_size * --update_freq. In the example above, the effective batch size is 8*128*1 = 1024.
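
The batch-size formula above is easy to apply when the number of GPUs changes. The following sketch assumes that `--update_freq` is the number of gradient-accumulation steps, as in the ConvNeXt training script this repository builds on; the helper names are ours.

```python
def effective_batch_size(nproc_per_node: int, batch_size: int, update_freq: int) -> int:
    """Effective batch size = --nproc_per_node * --batch_size * --update_freq."""
    return nproc_per_node * batch_size * update_freq


def update_freq_for(target: int, nproc_per_node: int, batch_size: int) -> int:
    """Value of --update_freq needed to reach `target` with the given GPU count."""
    per_step = nproc_per_node * batch_size
    if target % per_step != 0:
        raise ValueError(f"{target} is not a multiple of {per_step}")
    return target // per_step


print(effective_batch_size(8, 128, 1))  # 1024, as in the example above
print(update_freq_for(1024, 4, 128))    # 2: with 4 GPUs, use --update_freq 2
```

Keeping the effective batch size at 1024 is one way to stay close to the listed learning rates on smaller machines, although matching the reported accuracy is not guaranteed.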

To train other VanillaNet variants, --model needs to be changed, along with the variant-specific hyperparameters. Examples are given below.

VanillaNet-6:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vanillanet_6 \
--data_path /path/to/imagenet-1k \
--batch_size 128 --update_freq 1 --epochs 300 --decay_epochs 100 \
--lr 4.8e-3 --weight_decay 0.32  --drop 0.05 \
--layer_decay 0.8 --layer_decay_num_layers 4 \
--opt lamb --aa rand-m7-mstd0.5-inc1 --mixup 0.15 --bce_loss \
--output_dir /path/to/save_results \
--model_ema true --model_ema_eval true --model_ema_decay 0.99996 \
--use_amp true 

VanillaNet-7:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vanillanet_7 \
--data_path /path/to/imagenet-1k \
--batch_size 128 --update_freq 1 --epochs 300 --decay_epochs 100 \
--lr 4.7e-3 --weight_decay 0.35  --drop 0.05 \
--layer_decay 0.8 --layer_decay_num_layers 5 \
--opt lamb --aa rand-m7-mstd0.5-inc1 --mixup 0.4 --bce_loss \
--output_dir /path/to/save_results \
--model_ema true --model_ema_eval true --model_ema_decay 0.99996 \
--use_amp true 

VanillaNet-8:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vanillanet_8 \
--data_path /path/to/imagenet-1k \
--batch_size 128 --update_freq 1 --epochs 300 --decay_epochs 100 \
--lr 3.5e-3 --weight_decay 0.3  --drop 0.05 \
--opt lamb --aa rand-m7-mstd0.5-inc1 --mixup 0.4 --bce_loss \
--output_dir /path/to/save_results \
--model_ema true --model_ema_eval true --model_ema_decay 0.99996 \
--use_amp true 

VanillaNet-9:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vanillanet_9 \
--data_path /path/to/imagenet-1k \
--batch_size 128 --update_freq 1 --epochs 300 --decay_epochs 100 \
--lr 3.5e-3 --weight_decay 0.3  --drop 0.05 \
--opt lamb --aa rand-m7-mstd0.5-inc1 --mixup 0.4 --bce_loss \
--output_dir /path/to/save_results \
--model_ema true --model_ema_eval true --model_ema_decay 0.99996 \
--use_amp true 

VanillaNet-10:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vanillanet_10 \
--data_path /path/to/imagenet-1k \
--batch_size 128 --update_freq 1 --epochs 300 --decay_epochs 100 \
--lr 3.5e-3 --weight_decay 0.25  --drop 0.05 \
--opt lamb --aa rand-m7-mstd0.5-inc1 --mixup 0.4 --bce_loss \
--output_dir /path/to/save_results \
--model_ema true --model_ema_eval true --model_ema_decay 0.99996 \
--use_amp true 
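
The six training commands above differ only in the model name, learning rate, weight decay, mixup value, and (for VanillaNet-6 and VanillaNet-7) the layer-decay settings. As a sketch, the shared and per-variant flags can be kept in one place and assembled on demand. The names `COMMON_FLAGS`, `VARIANT_FLAGS` and `build_train_command` are hypothetical helpers, not part of the repository; the flag values are copied from the commands above.

```python
# Flags shared by every training command above.
COMMON_FLAGS = [
    "--batch_size", "128", "--update_freq", "1",
    "--epochs", "300", "--decay_epochs", "100", "--drop", "0.05",
    "--opt", "lamb", "--aa", "rand-m7-mstd0.5-inc1", "--bce_loss",
    "--model_ema", "true", "--model_ema_eval", "true",
    "--model_ema_decay", "0.99996", "--use_amp", "true",
]

# Per-variant values copied from the commands above.
VARIANT_FLAGS = {
    "vanillanet_5": {"lr": "3.5e-3", "weight_decay": "0.35", "mixup": "0.1"},
    "vanillanet_6": {"lr": "4.8e-3", "weight_decay": "0.32", "mixup": "0.15",
                     "layer_decay": "0.8", "layer_decay_num_layers": "4"},
    "vanillanet_7": {"lr": "4.7e-3", "weight_decay": "0.35", "mixup": "0.4",
                     "layer_decay": "0.8", "layer_decay_num_layers": "5"},
    "vanillanet_8": {"lr": "3.5e-3", "weight_decay": "0.3", "mixup": "0.4"},
    "vanillanet_9": {"lr": "3.5e-3", "weight_decay": "0.3", "mixup": "0.4"},
    "vanillanet_10": {"lr": "3.5e-3", "weight_decay": "0.25", "mixup": "0.4"},
}


def build_train_command(model, data_path, output_dir, nproc_per_node=8):
    """Assemble the argument list for one of the training commands above."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}", "main.py",
        "--model", model, "--data_path", data_path, "--output_dir", output_dir,
    ]
    for flag, value in VARIANT_FLAGS[model].items():
        cmd += [f"--{flag}", value]
    return cmd + COMMON_FLAGS


print(" ".join(build_train_command(
    "vanillanet_6", "/path/to/imagenet-1k", "/path/to/save_results")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The flags appear in a different order than in the commands above, which should not matter for a script that parses named arguments.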

Acknowledgement

This repository is built using the timm library and the DeiT, BEiT, RepVGG, and ConvNeXt repositories.

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Instruction pdf

An instruction PDF (Chinese version) can be found here.

Citation

If our work is useful for your research, please consider citing:

@article{chen2023vanillanet,
  title={VanillaNet: the Power of Minimalism in Deep Learning},
  author={Chen, Hanting and Wang, Yunhe and Guo, Jianyuan and Tao, Dacheng},
  journal={arXiv preprint arXiv:2305.12972},
  year={2023}
}

More Repositories

  1. Efficient-AI-Backbones (Python, 3,965 stars): Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
  2. HEBO (Jupyter Notebook, 3,195 stars): Bayesian optimisation & Reinforcement Learning library developed by Huawei Noah's Ark Lab.
  3. Pretrained-Language-Model (Python, 2,961 stars): Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
  4. Efficient-Computing (Jupyter Notebook, 1,116 stars): Efficient computing methods developed by Huawei Noah's Ark Lab.
  5. AdderNet (Python, 952 stars): Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
  6. trustworthyAI (Python, 949 stars): Trustworthy AI related projects.
  7. SMARTS (Python, 922 stars): Scalable Multi-Agent RL Training School for Autonomous Driving.
  8. bolt (C++, 896 stars): Bolt is a deep learning library with high performance and heterogeneous flexibility.
  9. noah-research (Python, 855 stars): Noah Research.
  10. vega (Python, 840 stars): AutoML tools chain.
  11. Speech-Backbones (Jupyter Notebook, 547 stars): This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
  12. streamDM (Scala, 490 stars): Stream Data Mining Library for Spark Streaming.
  13. Pretrained-IPT (Python, 406 stars)
  14. xingtian (Python, 305 stars): xingtian is a componentized library for the development and verification of reinforcement learning algorithms.
  15. benchmark (HTML, 274 stars)
  16. Disout (Python, 219 stars): Code for AAAI 2020 paper, Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks (Disout).
  17. BGCN (Python, 152 stars): A Tensorflow implementation of "Bayesian Graph Convolutional Neural Networks" (AAAI 2019).
  18. BHT-ARIMA (Python, 97 stars): Code for paper: Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting (AAAI-20).
  19. multi_hyp_cc (Python, 82 stars): [CVPR2020] A Multi-Hypothesis Approach to Color Constancy.
  20. Efficient-NLP (Python, 79 stars)
  21. streamDM-Cpp (C++, 68 stars): stream Machine Learning in C++.
  22. Federated-Learning (Python, 15 stars)