  • Language
    Python
  • License
    MIT License
  • Created almost 2 years ago
  • Updated 10 months ago


[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

SparK: the first successful BERT/MAE-style pretraining on any convolutional network

This is the official implementation of the ICLR paper "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling", which pretrains any CNN (e.g., ResNet) in a BERT-style, self-supervised manner. We've tried our best to make the codebase clean, short, easy to read, and state-of-the-art, while relying only on minimal dependencies.
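To make "BERT-style masking on a CNN" concrete, here is a minimal NumPy sketch of random patch masking. It assumes 32×32-pixel patches (matching a total downsampling stride of 32), a 224×224 input, and a 60% mask ratio; these numbers and the helper names are illustrative assumptions, not the repository's code. The same low-resolution mask is upsampled to every feature-map scale, which is one way to read the "hierarchical" part of the title.

```python
import numpy as np


def random_patch_mask(h_patches, w_patches, mask_ratio, rng):
    """Return a boolean (h_patches, w_patches) grid where True = visible patch.

    BERT/MAE-style random masking: exactly round(N * mask_ratio) patches are hidden.
    """
    n = h_patches * w_patches
    n_masked = int(round(n * mask_ratio))
    order = rng.permutation(n)            # random order of patch indices
    visible = np.ones(n, dtype=bool)
    visible[order[:n_masked]] = False     # hide the first n_masked patches
    return visible.reshape(h_patches, w_patches)


def upsample_mask(mask, factor):
    """Nearest-neighbour upsampling of a patch mask by an integer factor."""
    return mask.repeat(factor, axis=0).repeat(factor, axis=1)


rng = np.random.default_rng(0)
img_size, downsample = 224, 32   # assumption: mask unit = the CNN's total stride
grid = img_size // downsample    # 7 x 7 patch grid
mask = random_patch_mask(grid, grid, mask_ratio=0.6, rng=rng)  # assumed ratio

# Hierarchical use: the same mask resized to each assumed stage stride
# (strides 4, 8, 16, 32 give 56, 28, 14, 7 positions per side)
pyramid = {s: upsample_mask(mask, downsample // s) for s in (4, 8, 16, 32)}
pixel_mask = upsample_mask(mask, downsample)  # 224 x 224, applied to the image
```

Generating the mask once at the coarsest resolution and only upsampling it guarantees that every stage sees exactly the same masked regions.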



🔥 News

🕹️ Colab Visualization Demo

Check pretrain/viz_reconstruction.ipynb to visualize the images reconstructed by SparK-pretrained models.

We also provide pretrain/viz_spconv.ipynb, which shows the "mask pattern vanishing" issue of dense convolutional layers.
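The "mask pattern vanishing" issue can also be reproduced in a few lines. The NumPy sketch below (helper names and the averaging kernel are assumptions, not taken from the notebook) zeroes a 4×4 hole in an 8×8 input and applies three 3×3 convolutions. With plain dense convolution the hole is gradually filled in by its neighbours, so the mask pattern disappears. Re-applying the mask after each convolution, used here as a simple emulation of sparse convolution, keeps the masked positions at zero.

```python
import numpy as np


def conv3x3_same(x, k):
    """Dense 3x3 convolution (cross-correlation), zero padding, stride 1."""
    h, w = x.shape
    padded = np.pad(x, 1)  # constant zero padding by default
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * padded[i:i + h, j:j + w]
    return out


rng = np.random.default_rng(0)
x = rng.random((8, 8)) + 1.0            # strictly positive "image"
visible = np.ones((8, 8), dtype=bool)
visible[2:6, 2:6] = False               # a 4x4 masked hole
x = x * visible                         # masked pixels set to zero
kernel = np.full((3, 3), 1 / 9)         # simple averaging kernel (assumption)

dense, sparse = x.copy(), x.copy()
for _ in range(3):
    dense = conv3x3_same(dense, kernel)              # dense conv: hole gets filled in
    sparse = conv3x3_same(sparse, kernel) * visible  # emulated sparse conv: re-mask

print("nonzero masked positions, dense :", np.count_nonzero(dense[~visible]))
print("nonzero masked positions, sparse:", np.count_nonzero(sparse[~visible]))
```

Each 3×3 convolution grows the nonzero region by one pixel per side, so the 4-pixel-wide hole is already completely filled after two dense passes, while the re-masked path keeps all 16 hidden positions at zero.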

What's new here?

🔥 A pretrained CNN beats a pretrained Swin Transformer.

🔥 After SparK pretraining, smaller models can beat larger non-pretrained models.

🔥 All models benefit, showing a scaling behavior.

🔥 Generative self-supervised pretraining surpasses contrastive learning.

See our paper for more analysis, discussions, and evaluations.

Todo list


Pretrained weights (self-supervised; w/o decoder; can be directly finetuned)

Note: for network definitions, we directly use timm.models.ResNet and the official ConvNeXt implementation.

reso.: input image resolution; acc@1: top-1 accuracy (%) on ImageNet-1K after finetuning

arch.       reso.  acc@1  #params  FLOPs   weights (self-supervised, without SparK's decoder)
ResNet50    224    80.6   26M      4.1G    resnet50_1kpretrained_timm_style.pth
ResNet101   224    82.2   45M      7.9G    resnet101_1kpretrained_timm_style.pth
ResNet152   224    82.7   60M      11.6G   resnet152_1kpretrained_timm_style.pth
ResNet200   224    83.1   65M      15.1G   resnet200_1kpretrained_timm_style.pth
ConvNeXt-S  224    84.1   50M      8.7G    convnextS_1kpretrained_official_style.pth
ConvNeXt-B  224    84.8   89M      15.4G   convnextB_1kpretrained_official_style.pth
ConvNeXt-L  224    85.4   198M     34.4G   convnextL_1kpretrained_official_style.pth
ConvNeXt-L  384    86.0   198M     101.0G  convnextL_384_1kpretrained_official_style.pth

Pretrained weights (with SparK's UNet-style decoder; can be used to reconstruct images)

arch.       reso.  acc@1  #params  FLOPs   weights (self-supervised, with SparK's decoder)
ResNet50    224    80.6   26M      4.1G    res50_withdecoder_1kpretrained_spark_style.pth
ResNet101   224    82.2   45M      7.9G    res101_withdecoder_1kpretrained_spark_style.pth
ResNet152   224    82.7   60M      11.6G   res152_withdecoder_1kpretrained_spark_style.pth
ResNet200   224    83.1   65M      15.1G   res200_withdecoder_1kpretrained_spark_style.pth
ConvNeXt-S  224    84.1   50M      8.7G    cnxS224_withdecoder_1kpretrained_spark_style.pth
ConvNeXt-L  384    86.0   198M     101.0G  cnxL384_withdecoder_1kpretrained_spark_style.pth

Installation & Running

We highly recommend using torch==1.10.0, torchvision==0.11.1, and timm==0.5.4 for reproduction. See INSTALL.md for installing all pip dependencies.
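Since reproduction depends on these exact versions, a quick check of the current environment can save debugging time. The sketch below is a hypothetical convenience script, not part of the repository; it uses only the standard library's `importlib.metadata` and ignores local version labels such as `+cu113` when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions recommended above for reproduction
RECOMMENDED = {"torch": "1.10.0", "torchvision": "0.11.1", "timm": "0.5.4"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(get_version=installed_version, pins=RECOMMENDED):
    """List (package, installed, recommended) for every package that differs.

    Local version labels such as "+cu113" are stripped before comparing.
    """
    mismatches = []
    for name, wanted in pins.items():
        found = get_version(name)
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            mismatches.append((name, found, wanted))
    return mismatches


if __name__ == "__main__":
    for name, found, wanted in find_mismatches():
        print(f"{name}: installed {found}, recommended {wanted}")
```

The version lookup is injectable (`get_version`), so the comparison logic can be tried without touching the real environment.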

  • Loading pretrained model weights in 3 lines
# download our weights `resnet50_1kpretrained_timm_style.pth` first
import torch, timm
res50, state = timm.create_model('resnet50'), torch.load('resnet50_1kpretrained_timm_style.pth', 'cpu')  # 'cpu' is the map_location
# strict=False: the self-supervised checkpoint is expected to lack the classification head (fc.*);
# state.get('module', state) covers checkpoints whose weights are nested under state['module']
res50.load_state_dict(state.get('module', state), strict=False)
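The snippet above relies on `strict=False` and `state.get('module', state)`. The plain-Python sketch below uses hypothetical helpers (`unwrap_state_dict`, `compare_keys`; not part of SparK) to show what those two pieces do on ordinary dicts: unwrapping a nested checkpoint, optionally stripping a `module.` key prefix (an assumption: this is the prefix PyTorch's DistributedDataParallel adds, and it is not stated here that SparK's checkpoints carry it), and listing the missing and unexpected keys that `strict=False` tolerates.

```python
def unwrap_state_dict(checkpoint, prefix="module."):
    """Return a flat {name: tensor} dict from a loaded checkpoint.

    Mirrors `state.get('module', state)`, and additionally strips a leading
    `prefix` from key names (assumed DistributedDataParallel-style prefix).
    """
    state = checkpoint.get("module", checkpoint)
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state.items()
    }


def compare_keys(model_keys, state):
    """Return (missing, unexpected) key lists, like a non-strict load reports."""
    model_keys, ckpt_keys = set(model_keys), set(state)
    missing = sorted(model_keys - ckpt_keys)     # in the model, not in the checkpoint
    unexpected = sorted(ckpt_keys - model_keys)  # in the checkpoint, not in the model
    return missing, unexpected
```

With a real model, `compare_keys(res50.state_dict().keys(), unwrap_state_dict(state))` would list the keys that `load_state_dict(..., strict=False)` skips; for the self-supervised weights these are expected to be the classifier's `fc.*` entries.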

Acknowledgement

We referred to these useful codebases:

License

This project is under the MIT license. See LICENSE for more details.

Citation

If you find this project useful, please consider giving us a star ⭐ or citing our work 📖:

@Article{tian2023designing,
  author  = {Keyu Tian and Yi Jiang and Qishuai Diao and Chen Lin and Liwei Wang and Zehuan Yuan},
  title   = {Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling},
  journal = {arXiv:2301.03580},
  year    = {2023},
}
