  • Stars: 639
  • Rank: 67,728 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created: about 5 years ago
  • Updated: over 2 years ago

Repository Details

Implementing Attention Augmented Convolutional Networks using Pytorch

  • The paper's official implementation is in TensorFlow, so I implemented it in PyTorch.

Update (2019.05.11)

  • Fixed an issue where key_rel_w and key_rel_h were not registered as learnable parameters when using relative=True mode.

  • In "relative = True" mode, you can see that "key_rel_w" and "key_rel_h" are learning parameters. In "relative = False" mode, you do not have to worry about the "shape" parameter.

  • Example: relative=True, stride=1, shape=32

import torch

from attention_augmented_conv import AugmentedConv

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

tmp = torch.randn((16, 3, 32, 32)).to(device)
augmented_conv1 = AugmentedConv(in_channels=3, out_channels=20, kernel_size=3, dk=40, dv=4, Nh=4, relative=True, stride=1, shape=32).to(device)
conv_out1 = augmented_conv1(tmp)
print(conv_out1.shape)  # (16, 20, 32, 32)

# Print the module's learnable parameters; key_rel_w and key_rel_h should appear.
for name, param in augmented_conv1.named_parameters():
    print('parameter name: ', name)
  • Among the printed parameter names, we can see key_rel_w and key_rel_h.

  • Example: relative=True, stride=2, shape=16

import torch

from attention_augmented_conv import AugmentedConv

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

tmp = torch.randn((16, 3, 32, 32)).to(device)
augmented_conv1 = AugmentedConv(in_channels=3, out_channels=20, kernel_size=3, dk=40, dv=4, Nh=4, relative=True, stride=2, shape=16).to(device)
conv_out1 = augmented_conv1(tmp)
print(conv_out1.shape)  # (16, 20, 16, 16)
  • Important: when using relative=True mode, stride * shape must equal the input spatial size. For example, if the input is (16, 3, 32, 32) and stride=2, the shape should be 16. A small sketch of this constraint follows below.
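As a quick illustration of the constraint above, here is a hypothetical helper (shape_for is my name, not part of this repository) that derives the shape argument from the input spatial size and stride:

# Hypothetical helper, not part of this repository: derive the shape
# argument required by relative=True mode from the input spatial size.
def shape_for(input_size, stride):
    assert input_size % stride == 0, "input size must be divisible by stride"
    return input_size // stride

print(shape_for(32, 1))  # 32, as in the first example above
print(shape_for(32, 2))  # 16, as in the second example above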

Update (2019.05.02)

  • I have added padding to the AugmentedConv module.

  • You can use it as you would use nn.Conv2d.

  • I will attach the example below as well.

  • Example: relative=False, stride=1

import torch

from attention_augmented_conv import AugmentedConv

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

temp_input = torch.randn((16, 3, 32, 32)).to(device)
augmented_conv = AugmentedConv(in_channels=3, out_channels=20, kernel_size=3, dk=40, dv=4, Nh=1, relative=False, stride=1).to(device)
conv_out = augmented_conv(temp_input)
print(conv_out.shape)  # (16, 20, 32, 32), (batch_size, out_channels, height, width)
  • Example: relative=False, stride=2
import torch

from attention_augmented_conv import AugmentedConv

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

temp_input = torch.randn((16, 3, 32, 32)).to(device)
augmented_conv = AugmentedConv(in_channels=3, out_channels=20, kernel_size=3, dk=40, dv=4, Nh=1, relative=False, stride=2).to(device)
conv_out = augmented_conv(temp_input)
print(conv_out.shape)  # (16, 20, 16, 16), (batch_size, out_channels, height, width)
  • I added asserts for the parameters (dk, dv, Nh).
assert self.Nh != 0, "integer division or modulo by zero, Nh >= 1"
assert self.dk % self.Nh == 0, "dk must be divisible by Nh (example: out_channels: 20, dk: 40, Nh: 4)"
assert self.dv % self.Nh == 0, "dv must be divisible by Nh (example: out_channels: 20, dv: 4, Nh: 4)"
assert stride in [1, 2], str(stride) + ": only stride 1 or 2 is allowed."

I posted two versions of the "Attention-Augmented Conv":

  • Paper version is here
  • AA-Wide-ResNet version is here

Reference

Paper: Attention Augmented Convolutional Networks (Bello et al., 2019)

Wide-ResNet: Wide Residual Networks (Zagoruyko & Komodakis, 2016)

Method

(figure: method overview from the paper, not reproduced here)
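Since the method figure does not reproduce here, below is a minimal sketch of the core idea, assuming only standard PyTorch APIs: a standard convolution producing out_channels - dv feature maps is concatenated channel-wise with a multi-head self-attention branch producing dv maps. The module and helper names (TinyAAConv, heads) are mine for illustration, not the repository's implementation, and the sketch omits relative position embeddings and striding:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAAConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, dk, dv, Nh):
        super().__init__()
        self.dk, self.dv, self.Nh = dk, dv, Nh
        # Convolutional branch produces out_channels - dv feature maps.
        self.conv = nn.Conv2d(in_channels, out_channels - dv,
                              kernel_size, padding=kernel_size // 2)
        # A single 1x1 conv computes queries, keys, and values at once.
        self.qkv = nn.Conv2d(in_channels, 2 * dk + dv, kernel_size=1)
        self.attn_out = nn.Conv2d(dv, dv, kernel_size=1)

    def forward(self, x):
        B, _, H, W = x.shape
        conv_out = self.conv(x)
        q, k, v = torch.split(self.qkv(x), [self.dk, self.dk, self.dv], dim=1)

        # Flatten spatial dims and split channels into Nh heads.
        def heads(t, d):
            return t.reshape(B, self.Nh, d // self.Nh, H * W)

        q = heads(q, self.dk) * (self.dk // self.Nh) ** -0.5  # scaled queries
        k, v = heads(k, self.dk), heads(v, self.dv)
        weights = F.softmax(torch.einsum('bhdq,bhdk->bhqk', q, k), dim=-1)
        attn = torch.einsum('bhqk,bhdk->bhdq', weights, v).reshape(B, self.dv, H, W)
        # Concatenate the conv branch and the attention branch channel-wise.
        return torch.cat([conv_out, self.attn_out(attn)], dim=1)

For example, TinyAAConv(3, 20, 3, dk=40, dv=4, Nh=4) applied to a (2, 3, 32, 32) tensor returns a (2, 20, 32, 32) tensor.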

Input Parameters

  • In the paper, dk and dv are obtained using the following equations:

    dk = κ · F_out,   dv = υ · F_out

    where F_out is the number of output channels and κ, υ are the ratios dk / F_out and dv / F_out (a small sketch follows this list).

  • Experiments with these parameters in the paper (screenshot of the paper's table, not reproduced here)
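As promised above, here is a hypothetical helper (attention_dims is my name, not in the repository) that derives dk and dv from the paper's ratios; its output matches the AugmentedConv examples earlier:

# Hypothetical helper: derive dk and dv from the paper's ratios
# kappa = dk / F_out and upsilon = dv / F_out.
def attention_dims(out_channels, kappa, upsilon, Nh):
    dk = int(kappa * out_channels)
    dv = int(upsilon * out_channels)
    # dk and dv must be divisible by the number of heads (see the asserts above).
    assert dk % Nh == 0 and dv % Nh == 0
    return dk, dv

# out_channels=20, kappa=2, upsilon=0.2, Nh=4 gives dk=40, dv=4,
# matching the AugmentedConv examples above.
print(attention_dims(20, 2, 0.2, 4))  # (40, 4)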

Experiments

  • In the paper, the authors state: "We augment the Wide-ResNet-28-10 by augmenting the first convolution of all residual blocks with relative attention using Nh=8 heads and κ=2, υ=0.2 and a minimum of 20 dimensions per head for the keys." A sketch of such a block follows the table below.
Datasets  | Model                                                            | Accuracy | Epochs | Training Time
CIFAR-10  | Wide-ResNet 28x10 (work in progress)                             | -        | -      | -
CIFAR-100 | Wide-ResNet 28x10 (work in progress)                             | -        | -      | -
CIFAR-100 | Just 3 Conv layers (channels: 64, 128, 192)                      | 61.6%    | 100    | 22m
CIFAR-100 | Just 3 Attention-Augmented Conv layers (channels: 64, 128, 192)  | 59.82%   | 35     | 2h 23m
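As promised above, a hedged sketch of that augmentation: the first convolution of a simplified pre-activation residual block is replaced with AugmentedConv. The block structure and the dk/dv/Nh values (reused from the examples earlier) are illustrative only; the paper uses Nh=8 with a per-head key minimum, and the repository's AA-Wide-ResNet version may differ:

import torch.nn as nn
from attention_augmented_conv import AugmentedConv

class AABasicBlock(nn.Module):
    def __init__(self, channels, shape):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        # The first conv of the block is attention-augmented, with relative attention.
        self.conv1 = AugmentedConv(in_channels=channels, out_channels=channels,
                                   kernel_size=3, dk=40, dv=4, Nh=4,
                                   relative=True, stride=1, shape=shape)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return out + x  # identity shortcut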
  • I don't have enough GPUs, so training is difficult.
  • I just want to check the feasibility of this method (the Attention-Augmented Conv layer); I'll try ResNet next.
  • The results above show a large difference in training time. I will think about this part a bit more.
    • I have seen an issue reporting that the torch.einsum function is slow. Link
    • When I executed the example code from that link, the result was:

      (screenshot: CPU timing results, not reproduced here)
    • Using CUDA:

      (screenshot: CUDA timing results, not reproduced here)
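To make the comparison reproducible without the screenshots, here is a minimal timing sketch (sizes and iteration counts are illustrative) contrasting torch.einsum with the equivalent torch.bmm:

import time
import torch

a = torch.randn(64, 128, 256)
b = torch.randn(64, 256, 128)

# Batched matrix multiply expressed two ways; the results are identical,
# but wall-clock time may differ.
start = time.time()
for _ in range(100):
    out_einsum = torch.einsum('bij,bjk->bik', a, b)
print('einsum:', time.time() - start, 's')

start = time.time()
for _ in range(100):
    out_bmm = torch.bmm(a, b)
print('bmm:   ', time.time() - start, 's')

print(torch.allclose(out_einsum, out_bmm, atol=1e-5))  # True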

Time complexity

  • I compared the time complexity of relative=True and relative=False.
  • I'll compare the performance of the two settings (relative=True vs. relative=False); a timing sketch follows below.
  • In addition, I will consider ways to reduce the time complexity of relative=True mode.

    (plot: time complexity comparison, not reproduced here)
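A minimal sketch of such a comparison, timing one forward pass of AugmentedConv in each mode (hardware-dependent, illustrative only; shape is passed in both modes but only matters when relative=True):

import time
import torch
from attention_augmented_conv import AugmentedConv

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(16, 3, 32, 32).to(device)

for relative in [False, True]:
    conv = AugmentedConv(in_channels=3, out_channels=20, kernel_size=3,
                         dk=40, dv=4, Nh=4, relative=relative,
                         stride=1, shape=32).to(device)
    start = time.time()
    conv(x)
    print('relative =', relative, ':', time.time() - start, 's')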

Requirements

  • tqdm==4.31.1
  • torch==1.0.1
  • torchvision==0.2.2

More Repositories

1. Stand-Alone-Self-Attention (Python, 454 stars): Implementing Stand-Alone Self-Attention in Vision Models using Pytorch
2. MobileNetV3-Pytorch (Python, 291 stars): Implementing Searching for MobileNetV3 paper using Pytorch
3. BottleneckTransformers (Python, 265 stars): Bottleneck Transformers for Visual Recognition
4. LambdaNetworks (Python, 138 stars): Implementing Lambda Networks using Pytorch
5. Billion-scale-semi-supervised-learning (Python, 89 stars): Implementing Billion-scale semi-supervised learning for image classification using Pytorch
6. RandWireNN (Jupyter Notebook, 88 stars): Implementing Randomly Wired Neural Networks for Image Recognition, using the CIFAR-10 and CIFAR-100 datasets
7. Synthesizer-Rethinking-Self-Attention-Transformer-Models (Python, 70 stars): Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
8. CLIP (Python, 69 stars): CLIP: Connecting Text and Image (Learning Transferable Visual Models From Natural Language Supervision)
9. Mixed-Depthwise-Convolutional-Kernels (Python, 60 stars): Implementing MixNet: Mixed Depthwise Convolutional Kernels using Pytorch
10. SimSiam (Python, 57 stars): Exploring Simple Siamese Representation Learning
11. Action-Localization (Python, 23 stars): Action-Localization, Atomic Visual Actions (AVA) Dataset
12. Bag-of-MLP (Python, 19 stars): Bag of MLP
13. PSPNet (Python, 14 stars): Implementing Pyramid Scene Parsing Network (PSPNet) paper using Pytorch
14. DiffusionModel (Python, 7 stars): Re-implementing Diffusion model using Pytorch
15. AssembleNet (Python, 7 stars): Implementing AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures using Pytorch
16. OmniNet (Python, 6 stars): OmniNet: Omnidirectional Representations from Transformers
17. Backpropagation-CNN-basic (Python, 6 stars)
18. Graph-Convolutional-Network (Python, 5 stars)
19. Phasic-Policy-Gradient (Python, 5 stars)
20. bag-of-rl (Python, 5 stars): Bag of Reinforcement Learning Algorithms
21. minimal-BERT (Python, 4 stars): Bidirectional Encoder Representations from Transformers
22. Vision-Language (Python, 3 stars): Vision-Language, solving the GQA (Visual Reasoning in the Real World) dataset
23. minimal-cyclegan (Python, 3 stars): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
24. Transformer (Python, 2 stars): Implementing the Attention Is All You Need paper (Transformer model)
25. minimal-stylegan (Python, 2 stars)
26. SlowFast (Python, 1 star): SlowFast Network
27. minimal-segmentation (Python, 1 star): minimal-segmentation