EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction (paper, poster)

EfficientViT is a new family of vision models for efficient high-resolution vision.

News

If you are interested in getting updates, please join our mailing list here.

  • [2023/09/18] EfficientViT for the Segment Anything Model (SAM) is released. EfficientViT SAM runs at 1009 images/s on an A100 GPU, compared with SAM ViT-H (12 images/s), MobileSAM (297 images/s), and NanoSAM (744 images/s, but with much lower mIoU).
  • [2023/09/12] EfficientViT is featured on the MIT homepage and MIT News.
  • [2023/07/18] EfficientViT is accepted by ICCV 2023.

Demo

EfficientViT-L0 for Segment Anything (1009 images/s on an A100 GPU): demo

EfficientViT-L1 for Semantic Segmentation (45.9 ms on NVIDIA Jetson AGX Orin, 82.716 mIoU on Cityscapes): demo

About EfficientViT Models

EfficientViT is a new family of ViT models for efficient high-resolution dense prediction vision tasks. The core building block of EfficientViT is a lightweight, multi-scale linear attention module that achieves a global receptive field and multi-scale learning using only hardware-efficient operations, making EfficientViT TensorRT-friendly and suitable for GPU deployment.
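
For intuition, the sketch below shows the ReLU linear attention idea at the heart of this module: because keys and values are contracted first, the cost grows linearly with the number of tokens instead of quadratically. This is a minimal illustration, not the repository's implementation; it omits the multi-scale token aggregation, and the (batch, heads, tokens, dim) layout is an assumption.

# Minimal sketch of ReLU linear attention (illustration only, not the repo's code).
import torch
import torch.nn.functional as F

def relu_linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, heads, tokens, dim). Linear cost in tokens:
    the (dim x dim) matrix k^T v is formed once, instead of the
    (tokens x tokens) attention map of softmax attention."""
    q = F.relu(q)
    k = F.relu(k)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)    # (B, H, dim, dim)
    out = torch.einsum("bhnd,bhde->bhne", q, kv)  # (B, H, N, dim)
    # Normalize by q . sum(k) so the rows behave like attention weights.
    denom = torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)).unsqueeze(-1)
    return out / (denom + eps)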

Third-Party Implementation/Integration

Getting Started

Installation

conda create -n efficientvit python=3.10
conda activate efficientvit
conda install -c conda-forge mpi4py openmpi
pip install -r requirements.txt
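
If the environment was created successfully, a quick optional sanity check (assuming PyTorch is pulled in via requirements.txt) is:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"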

Dataset

ImageNet: https://www.image-net.org/
Our code expects the ImageNet dataset directory to be organized as follows:

imagenet
├── train
├── val
Cityscapes: https://www.cityscapes-dataset.com/
Our code expects the Cityscapes dataset directory to be organized as follows:

cityscapes
├── gtFine
|   ├── train
|   ├── val
├── leftImg8bit
|   ├── train
|   ├── val
ADE20K: https://groups.csail.mit.edu/vision/datasets/ADE20K/
Our code expects the ADE20K dataset directory to be organized as follows:

ade20k
├── annotations
|   ├── training
|   ├── validation
├── images
|   ├── training
|   ├── validation
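
The snippet below is a quick, optional check (not part of the repo) that the directories above exist; the dataset root paths are placeholders for wherever you store the data.

# Optional layout check for the three dataset structures above (paths are placeholders).
import os

expected = {
    "imagenet": ["train", "val"],
    "cityscapes": ["gtFine/train", "gtFine/val",
                   "leftImg8bit/train", "leftImg8bit/val"],
    "ade20k": ["annotations/training", "annotations/validation",
               "images/training", "images/validation"],
}
for root, subdirs in expected.items():
    for sub in subdirs:
        path = os.path.join(root, sub)
        print(f"{path}: {'ok' if os.path.isdir(path) else 'MISSING'}")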

Pretrained Models

Latency/Throughput is measured on NVIDIA Jetson Nano, NVIDIA Jetson AGX Orin, and NVIDIA A100 GPU with TensorRT, fp16. Data transfer time is included.
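
For reference, numbers like these are typically reproduced by building a TensorRT engine from an exported ONNX model, e.g. with the stock trtexec tool (illustrative command, not the authors' exact benchmark setup; the ONNX filename is a placeholder):

trtexec --onnx=efficientvit_l2.onnx --fp16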

Segment Anything

In this version, the EfficientViT segment anything models are trained to reproduce the image embeddings extracted by SAM ViT-H (embedding-level distillation). The prompt encoder and mask decoder are the same as those of SAM ViT-H.
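
A minimal sketch of this embedding-level distillation is shown below; the MSE objective and training-loop details are assumptions for illustration, not the repository's training code.

# Hedged sketch: student encoder regresses the frozen SAM ViT-H image embeddings.
import torch
import torch.nn.functional as F

def distillation_step(student_encoder, teacher_encoder, images, optimizer):
    with torch.no_grad():
        target = teacher_encoder(images)  # SAM ViT-H embeddings, frozen teacher
    pred = student_encoder(images)        # EfficientViT image embeddings
    loss = F.mse_loss(pred, target)       # assumed objective: match teacher features
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()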

| Image Encoder | COCO-val2017 mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) | Params | MACs | A100 Throughput | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NanoSAM | 70.6 | 79.6 | 73.8 | 62.4 | - | - | 744 images/s | - |
| MobileSAM | 72.8 | 80.4 | 75.9 | 65.8 | - | - | 297 images/s | - |
| EfficientViT-L0 | 74.454 | 81.410 | 77.201 | 68.159 | 31M | 35G | 1009 images/s | link |
| EfficientViT-L1 | 75.183 | 81.786 | 78.110 | 68.944 | 44M | 49G | 815 images/s | link |
| EfficientViT-L2 | 75.656 | 81.706 | 78.644 | 69.689 | 57M | 69G | 634 images/s | link |

ImageNet

All EfficientViT classification models are trained on ImageNet-1K with random initialization (300 epochs + 20 warmup epochs) using supervised learning.

| Model | Resolution | ImageNet Top-1 Acc | ImageNet Top-5 Acc | Params | MACs | A100 Throughput | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientNetV2-S | 384x384 | 83.9 | - | 22M | 8.4G | 2869 images/s | - |
| EfficientNetV2-M | 480x480 | 85.2 | - | 54M | 25G | 1160 images/s | - |
| EfficientViT-L1 | 224x224 | 84.484 | 96.862 | 53M | 5.3G | 6207 images/s | link |
| EfficientViT-L2 | 224x224 | 85.050 | 97.090 | 64M | 6.9G | 4998 images/s | link |
| EfficientViT-L2 | 256x256 | 85.366 | 97.216 | 64M | 9.1G | 3969 images/s | link |
| EfficientViT-L2 | 288x288 | 85.630 | 97.364 | 64M | 11G | 3102 images/s | link |
| EfficientViT-L2 | 320x320 | 85.734 | 97.438 | 64M | 14G | 2525 images/s | link |
| EfficientViT-L2 | 384x384 | 85.978 | 97.518 | 64M | 20G | 1784 images/s | link |
| EfficientViT-L3 | 224x224 | 85.814 | 97.198 | 246M | 28G | 2081 images/s | link |
| EfficientViT-L3 | 256x256 | 85.938 | 97.318 | 246M | 36G | 1641 images/s | link |
| EfficientViT-L3 | 288x288 | 86.070 | 97.440 | 246M | 46G | 1276 images/s | link |
| EfficientViT-L3 | 320x320 | 86.230 | 97.474 | 246M | 56G | 1049 images/s | link |
| EfficientViT-L3 | 384x384 | 86.408 | 97.632 | 246M | 81G | 724 images/s | link |
EfficientViT B series
Model Resolution ImageNet Top1 Acc ImageNet Top5 Acc Params MACs Jetson Nano (bs1) Jetson Orin (bs1) Checkpoint
EfficientViT-B1 224x224 79.390 94.346 9.1M 0.52G 24.8ms 1.48ms link
EfficientViT-B1 256x256 79.918 94.704 9.1M 0.68G 28.5ms 1.57ms link
EfficientViT-B1 288x288 80.410 94.984 9.1M 0.86G 34.5ms 1.82ms link
EfficientViT-B2 224x224 82.100 95.782 24M 1.6G 50.6ms 2.63ms link
EfficientViT-B2 256x256 82.698 96.096 24M 2.1G 58.5ms 2.84ms link
EfficientViT-B2 288x288 83.086 96.302 24M 2.6G 69.9ms 3.30ms link
EfficientViT-B3 224x224 83.468 96.356 49M 4.0G 101ms 4.36ms link
EfficientViT-B3 256x256 83.806 96.514 49M 5.2G 120ms 4.74ms link
EfficientViT-B3 288x288 84.150 96.732 49M 6.5G 141ms 5.63ms link

Cityscapes

| Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs1) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-L1 | 1024x2048 | 82.716 | 40M | 282G | 45.9ms | 122 images/s | link |
| EfficientViT-L2 | 1024x2048 | 83.228 | 53M | 396G | 60.0ms | 102 images/s | link |
EfficientViT B series

| Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-B0 | 1024x2048 | 75.653 | 0.7M | 4.4G | 275ms | 9.9ms | link |
| EfficientViT-B1 | 1024x2048 | 80.547 | 4.8M | 25G | 819ms | 24.3ms | link |
| EfficientViT-B2 | 1024x2048 | 82.073 | 15M | 74G | 1676ms | 46.5ms | link |
| EfficientViT-B3 | 1024x2048 | 83.016 | 40M | 179G | 3192ms | 81.8ms | link |

ADE20K

| Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-L1 | 512x512 | 49.191 | 40M | 36G | 7.2ms | 947 images/s | link |
| EfficientViT-L2 | 512x512 | 50.702 | 51M | 45G | 9.0ms | 758 images/s | link |
EfficientViT B series

| Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-B1 | 512x512 | 42.840 | 4.8M | 3.1G | 110ms | 4.0ms | link |
| EfficientViT-B2 | 512x512 | 45.941 | 15M | 9.1G | 212ms | 7.3ms | link |
| EfficientViT-B3 | 512x512 | 49.013 | 39M | 22G | 411ms | 12.5ms | link |

Usage

# segment anything
from efficientvit.sam_model_zoo import create_sam_model
from efficientvit.models.efficientvit.sam import (
    EfficientViTSamAutomaticMaskGenerator,
    EfficientViTSamPredictor,
)

# build the model and load pretrained weights
efficientvit_sam = create_sam_model(
    name="l2", weight_url="assets/checkpoints/sam/l2.pt",
)
efficientvit_sam = efficientvit_sam.cuda().eval()

# prompt-based prediction (points/boxes)
efficientvit_sam_predictor = EfficientViTSamPredictor(efficientvit_sam)

# automatic mask generation ("segment everything")
efficientvit_mask_generator = EfficientViTSamAutomaticMaskGenerator(efficientvit_sam)
# classification
from efficientvit.cls_model_zoo import create_cls_model

model = create_cls_model(
    name="l3", weight_url="assets/checkpoints/cls/l3-r384.pt"
)

# semantic segmentation
from efficientvit.seg_model_zoo import create_seg_model

model = create_seg_model(
    name="l2", dataset="cityscapes", weight_url="assets/checkpoints/seg/cityscapes/l2.pt"
)

model = create_seg_model(
    name="l2", dataset="ade20k", weight_url="assets/checkpoints/seg/ade20k/l2.pt"
)
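
Assuming EfficientViTSamPredictor mirrors the original SAM predictor interface (set_image followed by predict), a point-prompt call might look like the sketch below; the image path and click coordinates are placeholders, and OpenCV is assumed for image loading.

# Hedged usage sketch for the SAM predictor built above.
import cv2
import numpy as np

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path
efficientvit_sam_predictor.set_image(image)
masks, scores, _ = efficientvit_sam_predictor.predict(
    point_coords=np.array([[320, 240]]),  # one foreground click, (x, y)
    point_labels=np.array([1]),           # 1 = positive point
    multimask_output=True,
)
print(masks.shape, scores)                # candidate masks and their scores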

Evaluation

Please run eval_sam_coco.py, eval_cls_model.py, or eval_seg_model.py to evaluate our models.

Examples: segment anything, classification, segmentation

Visualization

Please run demo_sam_model.py to visualize our segment anything models.

Example:

# segment everything
python demo_sam_model.py --model l1 --mode all

# prompt with points
python demo_sam_model.py --model l1 --mode point

# prompt with box
python demo_sam_model.py --model l1 --mode box --box "[150,70,630,400]"

Please run eval_seg_model.py to visualize the outputs of our semantic segmentation models.

Example:

python eval_seg_model.py --dataset cityscapes --crop_size 1024 --model b3 --save_path demo/cityscapes/b3/

Export TFLite

To generate TFLite files, please refer to tflite_export.py. It requires the TinyNN package.

pip install git+https://github.com/alibaba/TinyNeuralNetwork.git

Example:

python tflite_export.py --export_path model.tflite --task seg --dataset ade20k --model b3 --resolution 512 512
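
For context, a TinyNN-based export is typically driven roughly as follows. This is a sketch under the assumption that tflite_export.py wraps TinyNN's TFLiteConverter; the checkpoint path follows the pattern above but is not verified.

# Hedged sketch of a TinyNN TFLite export (not the exact contents of tflite_export.py).
import torch
from efficientvit.seg_model_zoo import create_seg_model
from tinynn.converter import TFLiteConverter  # from the TinyNN package above

model = create_seg_model(
    name="b3", dataset="ade20k", weight_url="assets/checkpoints/seg/ade20k/b3.pt"  # assumed path
)
model.eval()
dummy_input = torch.randn(1, 3, 512, 512)     # matches --resolution 512 512
converter = TFLiteConverter(model, dummy_input, tflite_path="model.tflite")
converter.convert()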

Export ONNX

To generate ONNX files, please refer to onnx_export.py.

To export ONNX files for EfficientViT SAM models, please refer to the scripts shared by CVHub.
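
For the classification models, a plain torch.onnx.export call is usually sufficient; the sketch below is illustrative, not the exact contents of onnx_export.py, and the output filename is a placeholder.

# Hedged sketch of an ONNX export using the standard PyTorch exporter.
import torch
from efficientvit.cls_model_zoo import create_cls_model

model = create_cls_model(name="l3", weight_url="assets/checkpoints/cls/l3-r384.pt")
model.eval()
dummy_input = torch.randn(1, 3, 384, 384)  # l3-r384 checkpoint -> 384x384 input
torch.onnx.export(
    model,
    dummy_input,
    "efficientvit_cls_l3.onnx",            # placeholder output path
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)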

Training

Please see TRAINING.md for detailed training instructions.

Contact

Han Cai: [email protected]

TODO

  • ImageNet Pretrained models
  • Segmentation Pretrained models
  • ImageNet training code
  • EfficientViT L series, designed for cloud
  • EfficientViT for segment anything
  • EfficientViT for super-resolution
  • Segmentation training code

Citation

If EfficientViT is useful or relevant to your research, please cite our paper:

@article{cai2022efficientvit,
  title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}
