  • Stars: 1,866
  • Rank: 24,836 (top 0.5%)
  • Language: Python
  • License: MIT License
  • Created: almost 5 years ago
  • Updated: 11 months ago

Repository Details

[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

Once-for-All: Train One Network and Specialize it for Efficient Deployment [arXiv] [Slides] [Video]

@inproceedings{
  cai2020once,
  title={Once for All: Train One Network and Specialize it for Efficient Deployment},
  author={Han Cai and Chuang Gan and Tianzhe Wang and Zhekai Zhang and Song Han},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://arxiv.org/pdf/1908.09791.pdf}
}

[News] Once-for-All is available at PyTorch Hub now!

[News] Once-for-All (OFA) Network is adopted by SONY Neural Architecture Search Library.

[News] Once-for-All (OFA) Network is adopted by ADI MAX78000/MAX78002 Model Training and Synthesis Tool.

[News] Once-for-All (OFA) Network is adopted by Alibaba and ranked 1st in the open division of the MLPerf Inference Benchmark (Datacenter and Edge).

[News] First place in the CVPR 2020 Low-Power Computer Vision Challenge, CPU detection and FPGA track.

[News] OFA-ResNet50 is released.

[News] The hands-on tutorial of OFA is released!

[News] OFA is available via pip! Run pip install ofa to install the whole OFA codebase.

[News] First place in the 4th Low-Power Computer Vision Challenge, both classification and detection track.

[News] First place in the 3rd Low-Power Computer Vision Challenge, DSP track at ICCV’19 using the Once-for-all Network.

Train once, specialize for many deployment scenarios

80% top-1 ImageNet accuracy under the mobile setting

Consistently outperforms MobileNetV3 on diverse hardware platforms

OFA-ResNet50 [How to use]

How to use / evaluate OFA Networks

Use

""" OFA Networks.
    Example: ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)
""" 
from ofa.model_zoo import ofa_net
ofa_network = ofa_net(net_id, pretrained=True)
    
# Randomly sample sub-networks from OFA network
ofa_network.sample_active_subnet()
random_subnet = ofa_network.get_active_subnet(preserve_weight=True)
    
# Manually set the sub-network
ofa_network.set_active_subnet(ks=7, e=6, d=4)
manual_subnet = ofa_network.get_active_subnet(preserve_weight=True)

If the above script fails to download the pretrained weights automatically, you can download them manually from Google Drive and put them under $HOME/.torch/ofa_nets/.
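For a quick end-to-end check, here is a minimal sketch (not part of the original instructions; it assumes PyTorch and the ofa package are installed) that extracts one sub-network and runs a forward pass on a random input. Any resolution inside the design-space range should work; 224 is used here only as an example.

import torch
from ofa.model_zoo import ofa_net

# Load the OFA MobileNetV3 super-network (width multiplier 1.0)
ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)

# Pick one sub-network: kernel size 5, expand ratio 4, depth 3 for every unit
ofa_network.set_active_subnet(ks=5, e=4, d=3)
subnet = ofa_network.get_active_subnet(preserve_weight=True)
subnet.eval()

# Forward pass on a dummy batch; 224x224 is one of the supported resolutions
dummy_input = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = subnet(dummy_input)
print(logits.shape)  # torch.Size([1, 1000]) for the 1000 ImageNet classes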

Evaluate

python eval_ofa_net.py --path 'Your path to imagenet' --net ofa_mbv3_d234_e346_k357_w1.0

OFA Network | Design Space | Resolution | Width Multiplier | Depth | Expand Ratio | Kernel Size
ofa_resnet50 | ResNet50D | 128 - 224 | 0.65, 0.8, 1.0 | 0, 1, 2 | 0.2, 0.25, 0.35 | 3
ofa_mbv3_d234_e346_k357_w1.0 | MobileNetV3 | 128 - 224 | 1.0 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7
ofa_mbv3_d234_e346_k357_w1.2 | MobileNetV3 | 160 - 224 | 1.2 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7
ofa_proxyless_d234_e346_k357_w1.3 | ProxylessNAS | 128 - 224 | 1.3 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7
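The design space above can also be explored programmatically. The following sketch (an illustration, not from the original README; it assumes PyTorch and the ofa package are installed) sweeps the scalar kernel-size / expand-ratio / depth choices of the w1.0 MobileNetV3 space and reports the parameter count of each extracted sub-network, using only the calls already shown above.

from ofa.model_zoo import ofa_net

ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)

# Scalar settings apply the same choice to every unit of the network
for ks in (3, 5, 7):         # kernel size choices in this design space
    for e in (3, 4, 6):      # expand ratio choices
        for d in (2, 3, 4):  # depth choices
            ofa_network.set_active_subnet(ks=ks, e=e, d=d)
            subnet = ofa_network.get_active_subnet(preserve_weight=True)
            n_params = sum(p.numel() for p in subnet.parameters())
            print(f'ks={ks} e={e} d={d}: {n_params / 1e6:.1f}M params')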

How to use / evaluate OFA Specialized Networks

Use

""" OFA Specialized Networks.
Example: net, image_size = ofa_specialized('flops@[email protected]_finetune@75', pretrained=True)
""" 
from ofa.model_zoo import ofa_specialized
net, image_size = ofa_specialized(net_id, pretrained=True)

If the above script fails to download the pretrained weights automatically, you can download them manually from Google Drive and put them under $HOME/.torch/ofa_specialized/.
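Here is a minimal inference sketch for a specialized network (not part of the original README; it assumes PyTorch, torchvision, and Pillow are installed, 'path/to/image.jpg' is a placeholder, and the net id follows the Details column of the table below). It uses the image_size returned by ofa_specialized to build the standard ImageNet preprocessing pipeline.

import torch
from PIL import Image
from torchvision import transforms
from ofa.model_zoo import ofa_specialized

net, image_size = ofa_specialized('flops@595M_top1@80.0_finetune@75', pretrained=True)
net.eval()

# Standard ImageNet preprocessing at the resolution this specialized net expects
preprocess = transforms.Compose([
    transforms.Resize(int(image_size / 0.875)),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('path/to/image.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(net(img), dim=1)
print('predicted ImageNet class index:', probs.argmax(dim=1).item())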

Evaluate

python eval_specialized_net.py --path 'Your path to imagenet' --net flops@595M_top1@80.0_finetune@75

Model Name | Details | Top-1 (%) | Top-5 (%) | #Params | #MACs
ResNet50 Design Space
ofa-resnet50D-41 | [email protected][email protected] | 79.8 | 94.7 | 30.9M | 4.1B
ofa-resnet50D-37 | [email protected][email protected] | 79.7 | 94.7 | 26.5M | 3.7B
ofa-resnet50D-30 | [email protected][email protected] | 79.3 | 94.5 | 28.7M | 3.0B
ofa-resnet50D-24 | [email protected][email protected] | 79.0 | 94.2 | 29.0M | 2.4B
ofa-resnet50D-18 | [email protected][email protected] | 78.3 | 94.0 | 20.7M | 1.8B
ofa-resnet50D-12 | [email protected][email protected]_finetune@25 | 77.1 | 93.3 | 19.3M | 1.2B
ofa-resnet50D-09 | [email protected][email protected]_finetune@25 | 76.3 | 92.9 | 14.5M | 0.9B
ofa-resnet50D-06 | [email protected][email protected]_finetune@25 | 75.0 | 92.1 | 9.6M | 0.6B
FLOPs
ofa-595M | flops@595M_top1@80.0_finetune@75 | 80.0 | 94.9 | 9.1M | 595M
ofa-482M | flops@482M_top1@79.6_finetune@75 | 79.6 | 94.8 | 9.1M | 482M
ofa-389M | flops@389M_top1@79.1_finetune@75 | 79.1 | 94.5 | 8.4M | 389M
LG G8
ofa-lg-24 | LG-G8_lat@24ms_top1@76.4_finetune@25 | 76.4 | 93.0 | 5.8M | 230M
ofa-lg-16 | LG-G8_lat@16ms_top1@74.7_finetune@25 | 74.7 | 92.0 | 5.8M | 151M
ofa-lg-11 | LG-G8_lat@11ms_top1@73.0_finetune@25 | 73.0 | 91.1 | 5.0M | 103M
ofa-lg-8 | LG-G8_lat@8ms_top1@71.1_finetune@25 | 71.1 | 89.7 | 4.1M | 74M
Samsung S7 Edge
ofa-s7edge-88 | s7edge_lat@88ms_top1@76.3_finetune@25 | 76.3 | 92.9 | 6.4M | 219M
ofa-s7edge-58 | s7edge_lat@58ms_top1@74.7_finetune@25 | 74.7 | 92.0 | 4.6M | 145M
ofa-s7edge-41 | s7edge_lat@41ms_top1@73.1_finetune@25 | 73.1 | 91.0 | 4.7M | 96M
ofa-s7edge-29 | s7edge_lat@29ms_top1@70.5_finetune@25 | 70.5 | 89.5 | 3.8M | 66M
Samsung Note8
ofa-note8-65 | note8_lat@65ms_top1@76.1_finetune@25 | 76.1 | 92.7 | 5.3M | 220M
ofa-note8-49 | note8_lat@49ms_top1@74.9_finetune@25 | 74.9 | 92.1 | 6.0M | 164M
ofa-note8-31 | note8_lat@31ms_top1@72.8_finetune@25 | 72.8 | 90.8 | 4.6M | 101M
ofa-note8-22 | note8_lat@22ms_top1@70.4_finetune@25 | 70.4 | 89.3 | 4.3M | 67M
Samsung Note10
ofa-note10-64 | note10_lat@64ms_top1@80.2_finetune@75 | 80.2 | 95.1 | 9.1M | 743M
ofa-note10-50 | note10_lat@50ms_top1@79.7_finetune@75 | 79.7 | 94.9 | 9.1M | 554M
ofa-note10-41 | note10_lat@41ms_top1@79.3_finetune@75 | 79.3 | 94.5 | 9.0M | 457M
ofa-note10-30 | note10_lat@30ms_top1@78.4_finetune@75 | 78.4 | 94.2 | 7.5M | 339M
ofa-note10-22 | note10_lat@22ms_top1@76.6_finetune@25 | 76.6 | 93.1 | 5.9M | 237M
ofa-note10-16 | note10_lat@16ms_top1@75.5_finetune@25 | 75.5 | 92.3 | 4.9M | 163M
ofa-note10-11 | note10_lat@11ms_top1@73.6_finetune@25 | 73.6 | 91.2 | 4.3M | 110M
ofa-note10-08 | note10_lat@8ms_top1@71.4_finetune@25 | 71.4 | 89.8 | 3.8M | 79M
Google Pixel1
ofa-pixel1-143 | pixel1_lat@143ms_top1@80.1_finetune@75 | 80.1 | 95.0 | 9.2M | 642M
ofa-pixel1-132 | pixel1_lat@132ms_top1@79.8_finetune@75 | 79.8 | 94.9 | 9.2M | 593M
ofa-pixel1-79 | pixel1_lat@79ms_top1@78.7_finetune@75 | 78.7 | 94.2 | 8.2M | 356M
ofa-pixel1-58 | pixel1_lat@58ms_top1@76.9_finetune@75 | 76.9 | 93.3 | 5.8M | 230M
ofa-pixel1-40 | pixel1_lat@40ms_top1@74.9_finetune@25 | 74.9 | 92.1 | 6.0M | 162M
ofa-pixel1-28 | pixel1_lat@28ms_top1@73.3_finetune@25 | 73.3 | 91.0 | 5.2M | 109M
ofa-pixel1-20 | pixel1_lat@20ms_top1@71.4_finetune@25 | 71.4 | 89.8 | 4.3M | 77M
Google Pixel2
ofa-pixel2-62 | pixel2_lat@62ms_top1@75.8_finetune@25 | 75.8 | 92.7 | 5.8M | 208M
ofa-pixel2-50 | pixel2_lat@50ms_top1@74.7_finetune@25 | 74.7 | 91.9 | 4.7M | 166M
ofa-pixel2-35 | pixel2_lat@35ms_top1@73.4_finetune@25 | 73.4 | 91.1 | 5.1M | 113M
ofa-pixel2-25 | pixel2_lat@25ms_top1@71.5_finetune@25 | 71.5 | 90.1 | 4.1M | 79M
1080ti GPU (Batch Size 64)
ofa-1080ti-27 | 1080ti_gpu64@27ms_top1@76.4_finetune@25 | 76.4 | 93.0 | 6.5M | 397M
ofa-1080ti-22 | 1080ti_gpu64@22ms_top1@75.3_finetune@25 | 75.3 | 92.4 | 5.2M | 313M
ofa-1080ti-15 | 1080ti_gpu64@15ms_top1@73.8_finetune@25 | 73.8 | 91.3 | 6.0M | 226M
ofa-1080ti-12 | 1080ti_gpu64@12ms_top1@72.6_finetune@25 | 72.6 | 90.9 | 5.9M | 165M
V100 GPU (Batch Size 64)
ofa-v100-11 | v100_gpu64@11ms_top1@76.1_finetune@25 | 76.1 | 92.7 | 6.2M | 352M
ofa-v100-09 | v100_gpu64@9ms_top1@75.3_finetune@25 | 75.3 | 92.4 | 5.2M | 313M
ofa-v100-06 | v100_gpu64@6ms_top1@73.0_finetune@25 | 73.0 | 91.1 | 4.9M | 179M
ofa-v100-05 | v100_gpu64@5ms_top1@71.6_finetune@25 | 71.6 | 90.3 | 5.2M | 141M
Jetson TX2 GPU (Batch Size 16)
ofa-tx2-96 | tx2_gpu16@96ms_top1@75.8_finetune@25 | 75.8 | 92.7 | 6.2M | 349M
ofa-tx2-80 | tx2_gpu16@80ms_top1@75.4_finetune@25 | 75.4 | 92.4 | 5.2M | 313M
ofa-tx2-47 | tx2_gpu16@47ms_top1@72.9_finetune@25 | 72.9 | 91.1 | 4.9M | 179M
ofa-tx2-35 | tx2_gpu16@35ms_top1@70.3_finetune@25 | 70.3 | 89.4 | 4.3M | 121M
Intel Xeon CPU with MKL-DNN (Batch Size 1)
ofa-cpu-17 | cpu_lat@17ms_top1@75.7_finetune@25 | 75.7 | 92.6 | 4.9M | 365M
ofa-cpu-15 | cpu_lat@15ms_top1@74.6_finetune@25 | 74.6 | 92.0 | 4.9M | 301M
ofa-cpu-11 | cpu_lat@11ms_top1@72.0_finetune@25 | 72.0 | 90.4 | 4.4M | 160M
ofa-cpu-10 | cpu_lat@10ms_top1@71.1_finetune@25 | 71.1 | 89.9 | 4.2M | 143M

How to train OFA Networks

mpirun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    python train_ofa_net.py

or

horovodrun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    python train_ofa_net.py

Introduction Video

Watch the video

Hands-on Tutorial Video

Watch the video

Requirements

  • Python 3.6+
  • PyTorch 1.4.0+
  • ImageNet Dataset
  • Horovod

Related work on automated and efficient deep learning:

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)

HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)

More Repositories

1. streaming-llm: [ICLR 2024] Efficient Streaming Language Models with Attention Sinks (Python, 6,530 stars)
2. bevfusion: [ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation (Python, 2,286 stars)
3. temporal-shift-module: [ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding (Python, 2,060 stars)
4. llm-awq: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Python, 1,687 stars)
5. proxylessnas: [ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (C++, 1,420 stars)
6. torchquantum: A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers (Jupyter Notebook, 1,304 stars)
7. data-efficient-gans: [NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training (Python, 1,277 stars)
8. efficientvit: EfficientViT is a new family of vision models for efficient high-resolution vision (Python, 1,218 stars)
9. torchsparse: [MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs (Cuda, 1,181 stars)
10. smoothquant: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (Python, 1,175 stars)
11. gan-compression: [CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs (Python, 1,104 stars)
12. anycost-gan: [CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing (Python, 778 stars)
13. tinyml (Python, 755 stars)
14. TinyChatEngine: On-Device LLM Inference Library (C++, 730 stars)
15. tinyengine: [NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory (C, 717 stars)
16. fastcomposer: [IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention (Python, 644 stars)
17. pvcnn: [NeurIPS 2019, Spotlight] Point-Voxel CNN for Efficient 3D Deep Learning (Python, 639 stars)
18. lite-transformer: [ICLR 2020] Lite Transformer with Long-Short Range Attention (Python, 589 stars)
19. spvnas: [ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution (Python, 577 stars)
20. distrifuser: [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models (Python, 538 stars)
21. mcunet: [NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning (Python, 460 stars)
22. tiny-training: On-Device Training Under 256KB Memory [NeurIPS'22] (Python, 432 stars)
23. amc: [ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices (Python, 428 stars)
24. dlg: [NeurIPS 2019] Deep Leakage From Gradients (Python, 400 stars)
25. haq: [CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision (Python, 368 stars)
26. offsite-tuning: Offsite-Tuning: Transfer Learning without Full Model (Python, 365 stars)
27. hardware-aware-transformers: [ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing (Python, 321 stars)
28. litepose: [CVPR'22] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation (Python, 304 stars)
29. inter-operator-scheduler: [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration (C++, 191 stars)
30. amc-models: [ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices (Python, 166 stars)
31. apq: [CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy (Python, 156 stars)
32. parallel-computing-tutorial (C++, 134 stars)
33. flatformer: [CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer (Python, 119 stars)
34. patch_conv: Patch convolution to avoid large GPU memory usage of Conv2D (Python, 74 stars)
35. 6s965-fall2022 (Jupyter Notebook, 64 stars)
36. sparsevit: [CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer (Python, 48 stars)
37. bnn-icestick: Binary Neural Network on IceStick FPGA (Jupyter Notebook, 47 stars)
38. e3d: Efficient 3D Deep Learning (46 stars)
39. neurips-micronet: [JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion (Jupyter Notebook, 40 stars)
40. spatten-llm: [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning (Scala, 32 stars)
41. tinychat-tutorial (C++, 28 stars)
42. pruning-sparsity-publications (14 stars)
43. iccad-tinyml-open: [ICCAD'22 TinyML Contest] Efficient Heart Stroke Detection on Low-cost Microcontrollers (C, 14 stars)
44. calo-cluster (Jupyter Notebook, 5 stars)
45. ml-blood-pressure (Python, 5 stars)
46. gan-compression-dynamic (Python, 3 stars)
47. data-efficient-gans-dynamic (Python, 3 stars)