keras-vision-transformer
This repository contains the tensorflow.keras
implementation of the Swin Transformer (Liu et al., 2021) and its applications to benchmark datasets.
-
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. and Guo, B., 2021. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030. https://arxiv.org/abs/2103.14030.
-
Hu, C., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q. and Wang, M., 2021. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv preprint arXiv:2105.05537.
Notebooks
Note: the Swin-UNET implementation is experimental
- MNIST image classification with Swin Transformers [link]
- Oxford IIIT Pet image Segmentation with Swin-UNET [link]
Dependencies
- TensorFlow 2.5.0, Keras 2.5.0, Numpy 1.19.5.
Overview
Swin Transformers are Transformer-based computer vision models that feature self-attention with shift-windows. Compared to other vision transformer variants, which compute embedded patches (tokens) globally, the Swin Transformer computes token subsets through non-overlapping windows that are alternatively shifted within Transformer blocks. This mechanism makes Swin Transformers more suitable for processing high-resolution images. Swin Transformers have shown effectiveness in image classification, object detection, and semantic segmentation problems.
Contact
Yingkai (Kyle) Sha <[email protected]> <[email protected]>
The work is benefited from: