  • Stars: 456
  • Rank: 95,985 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated almost 5 years ago

Repository Details

Implementing Stand-Alone Self-Attention in Vision Models using PyTorch (13 Jun 2019)

  • Stand-Alone Self-Attention in Vision Models paper
  • Authors:
    • Prajit Ramachandran (Google Research, Brain Team)
    • Niki Parmar (Google Research, Brain Team)
    • Ashish Vaswani (Google Research, Brain Team)
    • Irwan Bello (Google Research, Brain Team)
    • Anselm Levskaya (Google Research, Brain Team)
    • Jonathon Shlens (Google Research, Brain Team)
  • Awesome :)

Method

  • Attention Layer

    • Equation 1:

      $y_{ij} = \sum_{a,b \in \mathcal{N}_k(i,j)} \mathrm{softmax}_{ab}\left(q_{ij}^\top k_{ab}\right) v_{ab}$
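
As a rough illustration of Equation 1, the sketch below computes, for every pixel, a softmax over the dot products between its query and the keys in the surrounding k × k neighborhood, then uses those weights to average the neighborhood's values. It is a minimal single-head version without positional information, not this repository's code: the class name `LocalSelfAttention2d`, the 1 × 1 convolutions used as query, key and value projections, and the zero padding at the image borders are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalSelfAttention2d(nn.Module):
    """Hypothetical single-head local self-attention over a k x k neighborhood."""

    def __init__(self, in_channels, out_channels, kernel_size=7):
        super().__init__()
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.padding = kernel_size // 2  # keeps H and W unchanged for odd k
        # 1x1 convolutions stand in for the learned query, key and value maps.
        self.query = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.key = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.value = nn.Conv2d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        b, _, h, w = x.shape
        k2 = self.kernel_size ** 2
        q = self.query(x).view(b, self.out_channels, 1, h * w)
        # unfold gathers the k x k neighborhood of every pixel; positions
        # outside the image are zero-padded (a simplification).
        k = F.unfold(self.key(x), self.kernel_size, padding=self.padding)
        v = F.unfold(self.value(x), self.kernel_size, padding=self.padding)
        k = k.view(b, self.out_channels, k2, h * w)
        v = v.view(b, self.out_channels, k2, h * w)
        logits = (q * k).sum(dim=1, keepdim=True)  # query-key dot products
        weights = F.softmax(logits, dim=2)         # softmax over the neighborhood
        out = (weights * v).sum(dim=2)             # weighted sum of the values
        return out.view(b, self.out_channels, h, w)
```

For example, `LocalSelfAttention2d(16, 32)(torch.randn(1, 16, 32, 32))` returns a tensor of shape `(1, 32, 32, 32)`. With `kernel_size=1` the softmax runs over a single position, so the layer reduces to the value projection, which makes a convenient sanity check.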

  • Relative Position Embedding

    • The row and column offsets are associated with embeddings $r_{a-i}$ and $r_{b-j}$ respectively, each with dimension $\frac{1}{2} d_{out}$. The row and column offset embeddings are concatenated to form $r_{a-i,b-j}$. The spatial-relative attention is then defined by the equation below.

    • Equation 2:

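      $y_{ij} = \sum_{a,b \in \mathcal{N}_k(i,j)} \mathrm{softmax}_{ab}\left(q_{ij}^\top k_{ab} + q_{ij}^\top r_{a-i,b-j}\right) v_{ab}$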

    • I refer to the following paper when implementing this part.
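
The relative position term described above can be sketched as follows: one learned vector per row offset and one per column offset, each with half the output channels, concatenated into a single embedding per neighborhood position, whose dot product with the query is added to the attention logits before the softmax. This is an illustrative sketch only. The class name `RelativePositionEmbedding2d`, the parameter names and the random initialization are assumptions and are not taken from this repository.

```python
import torch
import torch.nn as nn


class RelativePositionEmbedding2d(nn.Module):
    """Hypothetical row/column relative position embedding for a k x k window."""

    def __init__(self, out_channels, kernel_size=7):
        super().__init__()
        assert out_channels % 2 == 0, "channels are split between rows and columns"
        half = out_channels // 2
        self.kernel_size = kernel_size
        # One learned vector per row offset and one per column offset.
        self.rel_rows = nn.Parameter(torch.randn(half, kernel_size, 1))
        self.rel_cols = nn.Parameter(torch.randn(half, 1, kernel_size))

    def embedding(self):
        k = self.kernel_size
        rows = self.rel_rows.expand(-1, k, k)  # constant along columns
        cols = self.rel_cols.expand(-1, k, k)  # constant along rows
        # Concatenate the two halves along the channel axis: (d_out, k, k).
        return torch.cat([rows, cols], dim=0)

    def forward(self, q):
        # q: (batch, d_out, H*W) queries, one per pixel.
        # Returns the positional logits, shape (batch, k*k, H*W), to be added
        # to the query-key logits before the softmax.
        r = self.embedding().reshape(q.shape[1], -1)  # (d_out, k*k)
        return torch.einsum("bcl,cn->bnl", q, r)
```

The returned logits are flattened in row-major order over the window, which matches the neighborhood ordering produced by `torch.nn.functional.unfold`, so they can be added directly to logits computed from unfolded keys.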

  1. Replacing Spatial Convolutions
    - A 2 × 2 average pooling operation with stride 2 follows the attention layer whenever spatial downsampling is required.
    - This work applies the transform to the ResNet family of architectures. The proposed transform swaps the 3 × 3 spatial convolution with a self-attention layer as defined in Equation 3 of the paper (Equation 2 above).
  2. Replacing the Convolutional Stem
    - The initial layers of a CNN, sometimes referred to as the stem, play a critical role in learning local features such as edges, which later layers use to identify global objects.
    - The stem performs self-attention within each 4 × 4 spatial block of the original image, followed by batch normalization and a 4 × 4 max pool operation.
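
To make the first point concrete, here is a minimal sketch of a ResNet-style bottleneck block in which the slot normally occupied by the 3 × 3 convolution accepts any spatial layer, with a 2 × 2 average pool of stride 2 applied after it when downsampling is requested. The class name `AttentionBottleneck`, its arguments and the placement of batch normalization are assumptions for illustration and are not this repository's code. The spatial layer is passed in, so a self-attention module can be dropped in where the stand-in convolution is used in the usage note.

```python
import torch
import torch.nn as nn


class AttentionBottleneck(nn.Module):
    """Hypothetical bottleneck block with a swappable spatial layer."""

    def __init__(self, in_channels, width, out_channels, spatial_layer, stride=1):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, width, 1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )
        # Slot formerly held by the 3x3 convolution. The layer is expected to
        # map `width` channels to `width` channels and preserve H and W.
        self.spatial = spatial_layer
        # Average pooling follows the spatial layer only when downsampling;
        # an empty nn.Sequential acts as an identity otherwise.
        self.pool = nn.AvgPool2d(2, 2) if stride == 2 else nn.Sequential()
        self.post = nn.Sequential(nn.BatchNorm2d(width), nn.ReLU(inplace=True))
        self.expand = nn.Sequential(
            nn.Conv2d(width, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut so the residual matches the output shape.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = self.reduce(x)
        out = self.pool(self.spatial(out))
        out = self.expand(self.post(out))
        return torch.relu(out + self.shortcut(x))
```

As a usage example, `AttentionBottleneck(16, 8, 32, nn.Conv2d(8, 8, 3, padding=1), stride=2)` maps a `(2, 16, 8, 8)` input to a `(2, 32, 4, 4)` output. Replacing the stand-in `nn.Conv2d` with a local self-attention module gives the attention variant of the block.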

Experiments

Setup

  • Spatial extent: 7
  • Attention heads: 8
  • Layers:
    • ResNet 26: [1, 2, 4, 1]
    • ResNet 38: [2, 3, 5, 2]
    • ResNet 50: [3, 4, 6, 3]
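
The per-stage block counts above can be related to the model names with a little arithmetic. The snippet assumes the usual ResNet bottleneck convention of three layers per block plus one stem layer and one final classifier layer. The names `STAGE_BLOCKS` and `nominal_depth` are made up for this example.

```python
# Blocks per stage, as listed in the setup above.
STAGE_BLOCKS = {
    "ResNet 26": [1, 2, 4, 1],
    "ResNet 38": [2, 3, 5, 2],
    "ResNet 50": [3, 4, 6, 3],
}


def nominal_depth(blocks, layers_per_block=3, extra_layers=2):
    """Depth = 3 layers per bottleneck block + stem + classifier (assumed)."""
    return layers_per_block * sum(blocks) + extra_layers


for name, blocks in STAGE_BLOCKS.items():
    print(name, "->", nominal_depth(blocks), "layers")
```

Under that assumption the three configurations come out to 26, 38 and 50 layers, matching their names.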

| Dataset | Model | Accuracy | Parameters (My Model, Paper Model) |
|---|---|---|---|
| CIFAR-10 | ResNet 26 | 90.94% | 8.30M, - |
| CIFAR-10 | Naive ResNet 26 | 94.29% | 8.74M |
| CIFAR-10 | ResNet 26 + stem | 90.22% | 8.30M, - |
| CIFAR-10 | ResNet 38 (WORK IN PROGRESS) | 89.46% | 12.1M, - |
| CIFAR-10 | Naive ResNet 38 | 94.93% | 15.0M |
| CIFAR-10 | ResNet 50 (WORK IN PROGRESS) | | 16.0M, - |
| IMAGENET | ResNet 26 (WORK IN PROGRESS) | | 10.3M, 10.3M |
| IMAGENET | ResNet 38 (WORK IN PROGRESS) | | 14.1M, 14.1M |
| IMAGENET | ResNet 50 (WORK IN PROGRESS) | | 18.0M, 18.0M |
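
Parameter counts like those in the table are commonly obtained by summing the element counts of a model's trainable tensors. The helper below is a generic sketch of that calculation. Whether the figures above were produced exactly this way is an assumption, and `count_parameters_millions` is a hypothetical name.

```python
import torch.nn as nn


def count_parameters_millions(model):
    """Return the number of trainable parameters of `model`, in millions."""
    total = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total / 1e6


# Example on a tiny layer: 10 * 5 weights + 5 biases = 55 parameters.
tiny = nn.Linear(10, 5)
print(f"{count_parameters_millions(tiny) * 1e6:.0f} parameters")
```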

Usage

Requirements

  • torch==1.0.1

Todo

  • Experiments
  • IMAGENET
  • Review relative position embedding, attention stem
  • Code Refactoring
