CAE: Context AutoEncoder for Self-Supervised Representation Learning

This is a PyTorch implementation of CAE: Context AutoEncoder for Self-Supervised Representation Learning.

Highlights

  • State-of-the-art masked image modeling (MIM) performance. The results reported in the paper are successfully reproduced.

Installation

Clone the repo and install required packages.

pip install -r requirements.txt

# install apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Data Preparation

First, download ImageNet-1k from http://image-net.org/.

The directory structure follows the standard layout of torchvision's datasets.ImageFolder. The training and validation data are expected to be in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
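
Before launching a long run, it can help to confirm that the dataset folder matches this layout. The sketch below is not part of the repository: summarize_imagefolder is a hypothetical helper that uses only the standard library to count class folders and image files per split, and the set of accepted file extensions is an assumption.

```python
from pathlib import Path

# Extensions treated as images here; this set is an assumption, adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def summarize_imagefolder(root):
    """Return {split: {class_name: image_count}} for the train/ and val/ splits.

    Raises FileNotFoundError if either split folder is missing.
    """
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        counts = {}
        # Every sub-directory of a split is interpreted as one class.
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
        summary[split] = counts
    return summary
```

Calling summarize_imagefolder("/path/to/imagenet") returns one dictionary per split. For ImageNet-1k, each split should contain 1000 class folders, and the same class names should appear in both.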

Second, download the pretrained tokenizer.

TOKENIZER_PATH=/path/to/save/dall_e_tokenizer_weight
mkdir -p $TOKENIZER_PATH
wget -O $TOKENIZER_PATH/encoder.pkl https://cdn.openai.com/dall-e/encoder.pkl
wget -O $TOKENIZER_PATH/decoder.pkl https://cdn.openai.com/dall-e/decoder.pkl
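
Once the downloads finish, checking that both weight files exist and are not empty can catch a failed or misdirected download early. This is a minimal sketch, not repository code: check_tokenizer_files is a hypothetical helper, and the only assumption taken from the commands above is that the files are named encoder.pkl and decoder.pkl inside $TOKENIZER_PATH.

```python
from pathlib import Path

# File names used by the download commands above.
TOKENIZER_FILES = ("encoder.pkl", "decoder.pkl")


def check_tokenizer_files(tokenizer_path):
    """Return a list of problems found; an empty list means both files are present and non-empty."""
    problems = []
    for name in TOKENIZER_FILES:
        path = Path(tokenizer_path) / name
        if not path.is_file():
            problems.append(f"{name}: not found")
        elif path.stat().st_size == 0:
            problems.append(f"{name}: file is empty")
    return problems
```

An empty return value only shows that the files are present and non-empty. It does not verify that their contents are valid tokenizer weights.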

Pretraining

Here is an example that pretrains CAE-base on ImageNet-1K with 32 GPUs. See scripts/cae_base_800e.sh for the complete script.

OMP_NUM_THREADS=1 $PYTHON -m torch.distributed.launch \
  --nproc_per_node=8 \
  tools/run_pretraining.py \
  --data_path ${DATA_PATH} \
  --output_dir ${OUTPUT_DIR} \
  --model cae_base_patch16_224_8k_vocab --discrete_vae_weight_path ${TOKENIZER_PATH} \
  --batch_size 64 --lr 1.5e-3 --warmup_epochs 20 --epochs 800 \
  --clip_grad 3.0 --layer_scale_init_value 0.1 \
  --imagenet_default_mean_and_std \
  --color_jitter 0 \
  --drop_path 0.1 \
  --sincos_pos_emb \
  --mask_generator block \
  --num_mask_patches 98 \
  --decoder_layer_scale_init_value 0.1 \
  --no_auto_resume \
  --save_ckpt_freq 100 \
  --exp_name $my_name \
  --regressor_depth 4 \
  --decoder_depth 4 \
  --align_loss_weight 2
  • --num_mask_patches: number of input patches to be masked.
  • --batch_size: batch size per GPU.
  • Effective batch size = number of GPUs * --batch_size. So in the above example, the effective batch size is 64*32 = 2048.
  • --lr: learning rate.
  • --warmup_epochs: learning rate warmup epochs. Warm up [10, 20, 40] epochs for [300, 800, 1600] pretrain epochs respectively.
  • --epochs: total pretraining epochs.
  • --clip_grad: clip gradient norm.
  • --drop_path: stochastic depth rate.
  • --imagenet_default_mean_and_std: enable this for ImageNet-1k pretraining, i.e., (0.485, 0.456, 0.406) for mean and (0.229, 0.224, 0.225) for std. For other pretraining data, use (0.5, 0.5, 0.5) for mean and (0.5, 0.5, 0.5) for std by default.
  • --layer_scale_init_value: 0.1 for base, 1e-5 for large, set 0 to disable layerscale. We set --decoder_layer_scale_init_value the same as this.
  • --sincos_pos_emb: adopt sin-cos positional embedding during pretraining.
  • --regressor_depth: depth (number of layers) of the regressor.
  • --decoder_depth: depth (number of layers) of the decoder.
  • --align_loss_weight: weight for alignment loss. 2 by default.
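
The arithmetic behind a few of these options can be sketched as follows. The helper names are illustrative and not part of the repository. The patch count assumes a 224x224 input split into 16x16 patches, as the model name cae_base_patch16_224_8k_vocab suggests.

```python
def effective_batch_size(num_gpus, batch_size_per_gpu):
    """Effective batch size = number of GPUs * per-GPU batch size."""
    return num_gpus * batch_size_per_gpu


def mask_ratio(num_mask_patches, image_size=224, patch_size=16):
    """Fraction of patches masked, assuming a square image and square patches."""
    patches_per_side = image_size // patch_size
    return num_mask_patches / (patches_per_side ** 2)


# Warmup epochs listed above for each pretraining schedule (total epochs -> warmup epochs).
WARMUP_EPOCHS = {300: 10, 800: 20, 1600: 40}

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(32, 64))
    print("mask ratio:", mask_ratio(98))
    print("warmup epochs for 800e:", WARMUP_EPOCHS[800])
```

With the values from the example command, 32 GPUs at 64 images each give an effective batch size of 2048. Under the patch-size assumption above, masking 98 of the 14 * 14 = 196 patches corresponds to a 50% mask ratio.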

Warmup epochs for 300/800/1600 epochs pretraining are 10/20/40.
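
To make the two settings of --imagenet_default_mean_and_std concrete, the sketch below applies per-channel normalization, (x - mean) / std, to a single RGB pixel with values in [0, 1]. It is a plain-Python illustration with a hypothetical function name. How the repository applies these constants internally is not shown here.

```python
# Constants quoted in the option description above.
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
# Default values for other pretraining data.
GENERIC_MEAN = (0.5, 0.5, 0.5)
GENERIC_STD = (0.5, 0.5, 0.5)


def normalize_pixel(rgb, imagenet_default_mean_and_std=True):
    """Normalize one (r, g, b) pixel in [0, 1] channel by channel."""
    if imagenet_default_mean_and_std:
        mean, std = IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
    else:
        mean, std = GENERIC_MEAN, GENERIC_STD
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))
```

With the generic constants, inputs in [0, 1] map to [-1, 1]. With the ImageNet constants, a pixel equal to the dataset mean maps to zero in every channel.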

For CAE-large, please refer to scripts/cae_large_1600e.sh.

Results

Results for CAE-base and CAE-large are provided on the following evaluation tasks:

  • Linear probing
  • Attentive probing
  • Fine-tuning
  • Semantic segmentation
  • Object detection and instance segmentation

Pretrained weights and logs are available (Google Drive, Baidu Cloud [Code: 4kil]). *: from CAE paper.

| Model | Pretraining data | #Epoch | Linear | Attentive | Fine-tuning | ADE Seg | COCO Det | COCO InstSeg |
|---|---|---|---|---|---|---|---|---|
| MAE-base* | ImageNet-1K | 1600 | 67.8 | 74.2 | 83.6 | 48.1 | 48.4 | 42.6 |
| MAE-large* | ImageNet-1K | 1600 | 76.0 | 78.8 | 86.0 | 53.6 | 54.0 | 47.1 |
| CAE-base | ImageNet-1K | 300 | 64.5 | 74.0 | 83.6 | 48.1 | 48.3 | 42.7 |
| CAE-base | ImageNet-1K | 800 | 68.9 | 75.9 | 83.8 | 49.7 | 49.9 | 43.9 |
| CAE-base | ImageNet-1K | 1600 | 70.3 | 77.2 | 83.9 | 50.3 | 50.3 | 44.2 |
| CAE-large | ImageNet-1K | 1600 | 77.8 | 81.2 | 86.2 | 54.9 | 54.5 | 47.5 |
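
As a reading aid, the following sketch computes the per-task difference between CAE and the MAE baseline at 1600 epochs, using only the numbers in the table above. The dictionary layout and the function name are illustrative.

```python
TASKS = ("Linear", "Attentive", "Fine-tuning", "ADE Seg", "COCO Det", "COCO InstSeg")

# 1600-epoch rows copied from the results table.
RESULTS_1600E = {
    "MAE-base": (67.8, 74.2, 83.6, 48.1, 48.4, 42.6),
    "MAE-large": (76.0, 78.8, 86.0, 53.6, 54.0, 47.1),
    "CAE-base": (70.3, 77.2, 83.9, 50.3, 50.3, 44.2),
    "CAE-large": (77.8, 81.2, 86.2, 54.9, 54.5, 47.5),
}


def gains(model, baseline):
    """Per-task difference (model minus baseline), rounded to one decimal place."""
    return {
        task: round(m - b, 1)
        for task, m, b in zip(TASKS, RESULTS_1600E[model], RESULTS_1600E[baseline])
    }


if __name__ == "__main__":
    for size in ("base", "large"):
        print(size, gains(f"CAE-{size}", f"MAE-{size}"))
```

At 1600 epochs the differences are positive on every task for both model sizes. The largest gaps are on linear and attentive probing, and the smallest is on fine-tuning.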

Linear Probing

  • Please refer to scripts/cae_base_800e.sh (32 GPUs).
  • For CAE-large, just replace --model cae_base_patch16_224 with --model cae_large_patch16_224.

Attentive Probing

  • Please refer to scripts/cae_base_800e.sh (32 GPUs).
  • For CAE-large, just replace --model cae_base_patch16_224 with --model cae_large_patch16_224.

Fine-tuning

Segmentation & Detection

Acknowledgement

This repository is built on BEiT and MMSelfSup; thanks to their authors for the open-source code. Thanks also to the CAE authors for their excellent work!

Citation

@article{ContextAutoencoder2022,
  title={Context Autoencoder for Self-Supervised Representation Learning},
  author={Chen, Xiaokang and Ding, Mingyu and Wang, Xiaodi and Xin, Ying and Mo, Shentong and Wang, Yunhao and Han, Shumin and Luo, Ping and Zeng, Gang and Wang, Jingdong},
  journal={arXiv preprint arXiv:2202.03026},
  year={2022}
}
