• Stars: 141
• Rank: 259,971 (Top 6 %)
• Language: Python
• License: MIT License
• Created almost 3 years ago
• Updated about 2 years ago

Repository Details

PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution" [CVPR 2022].

Swin Transformer V2: Scaling Up Capacity and Resolution

License: MIT

This implementation has been merged into the PyTorch Image Models library (Timm) with the kind help of Ross Wightman. Timm also offers pre-trained ImageNet-1k weights (see release).

Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu et al. (Microsoft Research Asia).

This repository includes a pure PyTorch implementation of the Swin Transformer V2 and provides pre-trained weights (CIFAR10 & Places365).

The official Swin Transformer V1 implementation is available here. As of 13.04.2022, an official implementation of the Swin Transformer V2 was not publicly available.

Update: The official Swin Transformer V2 implementation has been released here!

Installation

You can simply install the Swin Transformer V2 implementation as a Python package by using pip.

pip install git+https://github.com/ChristophReich1996/Swin-Transformer-V2

Alternatively, you can clone the repository and use the implementation in swin_transformer_v2 directly in your project.

Usage

This implementation provides the configurations reported in the paper (SwinV2-T, SwinV2-S, etc.). You can build a model by calling the corresponding function. Note that the SwinTransformerV2 class returns the feature maps of every stage of the network (List[torch.Tensor]). To use this implementation for image classification, wrap the model and take the final feature map (a wrapper example can be found here).

from swin_transformer_v2 import SwinTransformerV2

from swin_transformer_v2 import swin_transformer_v2_t, swin_transformer_v2_s, swin_transformer_v2_b, \
    swin_transformer_v2_l, swin_transformer_v2_h, swin_transformer_v2_g

# SwinV2-T
swin_transformer: SwinTransformerV2 = swin_transformer_v2_t(in_channels=3,
                                                            window_size=8,
                                                            input_resolution=(256, 256),
                                                            sequential_self_attention=False,
                                                            use_checkpoint=False)
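Because the model returns one feature map per stage, it helps to know which shapes to expect before writing a wrapper. The following is a minimal sketch in plain Python, not part of the repository. It assumes the standard Swin layout: a patch size of 4, four stages, a 2x downsampling with channel doubling before every stage except the first, and 96 embedding channels for SwinV2-T. The function name `stage_feature_shapes` and its parameters are hypothetical.

```python
from typing import List, Tuple


def stage_feature_shapes(input_resolution: Tuple[int, int],
                         embedding_channels: int = 96,
                         patch_size: int = 4,
                         number_of_stages: int = 4) -> List[Tuple[int, int, int]]:
    """Return the assumed (channels, height, width) of each stage output."""
    # Assumption: the patch embedding divides the resolution by the patch size
    height = input_resolution[0] // patch_size
    width = input_resolution[1] // patch_size
    channels = embedding_channels
    shapes = []
    for stage in range(number_of_stages):
        if stage > 0:
            # Assumption: every stage after the first halves the resolution
            # and doubles the number of channels
            height, width, channels = height // 2, width // 2, channels * 2
        shapes.append((channels, height, width))
    return shapes


print(stage_feature_shapes((256, 256)))
# [(96, 64, 64), (192, 32, 32), (384, 16, 16), (768, 8, 8)]
```

Under these assumptions, a classification wrapper would pool the last entry (768 channels at 8 x 8 for a 256 x 256 input) and feed it to a linear layer.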

If you want to change the resolution and/or the window size for fine-tuning or inference, use the update_resolution method.

# Change resolution and window size of the model
swin_transformer.update_resolution(new_window_size=16, new_input_resolution=(512, 512))
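Before changing the resolution or window size, it can be worth checking that the new window size tiles every stage evenly. The check below is a hedged sketch in plain Python and not part of the repository. It assumes a patch size of 4, four stages with 2x downsampling between them, and that window partitioning needs each stage's feature map to be divisible by the window size. The repository may handle edge cases differently. The helper name `window_size_fits` is hypothetical.

```python
from typing import Tuple


def window_size_fits(input_resolution: Tuple[int, int],
                     window_size: int,
                     patch_size: int = 4,
                     number_of_stages: int = 4) -> bool:
    """Check whether the window size divides the feature map of every stage."""
    # Assumption: the patch embedding divides the resolution by the patch size
    height = input_resolution[0] // patch_size
    width = input_resolution[1] // patch_size
    for _ in range(number_of_stages):
        if height % window_size != 0 or width % window_size != 0:
            return False
        # Assumption: each following stage halves the spatial resolution
        height, width = height // 2, width // 2
    return True


print(window_size_fits((512, 512), window_size=16))  # True
print(window_size_fits((256, 256), window_size=16))  # False
```

With these assumptions, a 512 x 512 input yields stage resolutions of 128, 64, 32 and 16, all divisible by 16, whereas a 256 x 256 input ends at 8 x 8 in the last stage, which a window of 16 cannot tile.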

If you want to use a custom configuration, you can use the SwinTransformerV2 class directly. The constructor takes the following parameters.

| Parameter | Description | Type |
| --- | --- | --- |
| in_channels | Number of input channels | int |
| depth | Depth of the stage (number of layers) | int |
| downscale | If true input is downsampled (see Fig. 3 or V1 paper) | bool |
| input_resolution | Input resolution | Tuple[int, int] |
| number_of_heads | Number of attention heads to be utilized | int |
| window_size | Window size to be utilized | int |
| shift_size | Shifting size to be used | int |
| ff_feature_ratio | Ratio of the hidden dimension in the FFN to the input channels | int |
| dropout | Dropout in input mapping | float |
| dropout_attention | Dropout rate of attention map | float |
| dropout_path | Dropout in main path | float |
| use_checkpoint | If true checkpointing is utilized | bool |
| sequential_self_attention | If true sequential self-attention is performed | bool |
| use_deformable_block | If true deformable block is used | bool |

This file includes a full example of how to use this implementation.

This implementation also includes a deformable version of the Swin Transformer V2 inspired by the paper Vision Transformer with Deformable Attention. Deformable attention can be utilized by setting use_deformable_block=True.

This repository also provides an image classification training script for CIFAR10 and Places365.

Results

| Model | Dataset | Accuracy | Weights |
| --- | --- | --- | --- |
| Swin Transformer V2 T | CIFAR10 | 0.8974 | backbone weights |
| Swin Transformer V2 T deformable | CIFAR10 | 0.8962 | backbone weights |
| Swin Transformer V2 B | Places365 (256 × 256) | 0.4456 (after 13 epochs) | backbone weights |

For details on how to load the checkpoints, have a look at this issue.

Disclaimer

This is a very experimental implementation based on the Swin Transformer V2 paper and the official implementation of the Swin Transformer V1. In particular, the sequential self-attention implementation is currently not memory efficient. If you have an idea for a more efficient sequential implementation, please open a pull request. This implementation was written before an official Swin Transformer V2 implementation was published, so it is not possible to say to what extent it might differ from the original one. If you have any problems with this implementation, please raise an issue.
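To illustrate the idea behind sequential self-attention, here is a toy sketch in plain Python. It is not the repository's implementation, and it uses ordinary scaled dot-product attention rather than the scaled cosine attention of the paper. It assumes "sequential" means processing one query at a time, so that only a single row of attention weights exists at once instead of the full N x N matrix. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention_row(query: Vector, keys: List[Vector]) -> Vector:
    """Attention weights of a single query over all keys."""
    scale = 1.0 / math.sqrt(len(query))
    return softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])


def apply_weights(weights: Vector, values: List[Vector]) -> Vector:
    """Weighted sum of the value vectors."""
    return [sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))]


def batched_attention(queries, keys, values):
    # The full N x N weight matrix is built before it is applied
    weight_matrix = [attention_row(query, keys) for query in queries]
    return [apply_weights(row, values) for row in weight_matrix]


def sequential_attention(queries, keys, values):
    outputs = []
    for query in queries:
        # Only one row of weights (length N) is alive at any time
        row = attention_row(query, keys)
        outputs.append(apply_weights(row, values))
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(batched_attention(tokens, tokens, tokens) == sequential_attention(tokens, tokens, tokens))
```

Both variants compute the same output. The sequential one trades parallelism for a smaller peak memory footprint, which is the trade-off the paragraph above refers to.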

Reference

@article{Liu2021,
    title={{Swin Transformer V2: Scaling Up Capacity and Resolution}},
    author={Liu, Ze and Hu, Han and Lin, Yutong and Yao, Zhuliang and Xie, Zhenda and Wei, Yixuan and Ning, Jia and Cao, Yue and Zhang, Zheng and Dong, Li and others},
    journal={arXiv preprint arXiv:2111.09883},
    year={2021}
}
