Introduction

This repository is the official implementation of Contextual Transformer Networks for Visual Recognition.

CoT (Contextual Transformer) is a unified self-attention building block that acts as an alternative to standard convolutions in ConvNets. Convolutions in an existing backbone can therefore be replaced with their CoT counterparts to strengthen the backbone with contextualized self-attention.

2021/3/25-2021/6/5: CVPR 2021 Open World Image Classification Challenge

Ranked 1st in the Open World Image Classification Challenge @ CVPR 2021 (team name: VARMS).

Usage

The code is mainly based on timm.

Requirements:

  • PyTorch 1.8.0+
  • Python 3.7
  • CUDA 10.1+
  • CuPy
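
Before cloning, it can help to confirm that the Python side of these requirements is in place. The following is a minimal sketch, not part of the repository: the helper name check_environment is hypothetical, and it only checks the interpreter version and whether the torch and cupy packages can be located. It does not verify package versions or the CUDA toolkit.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 7), modules=("torch", "cupy")):
    """Return a dict mapping each requirement to True if it looks satisfied.

    Hypothetical helper for illustration; it does not check package
    versions or the installed CUDA toolkit.
    """
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for name in modules:
        # find_spec only locates the package; it does not import it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Any entry reported as missing needs to be installed before running the training script.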

Clone the repository:

git clone https://github.com/JDAI-CV/CoTNet.git

Train

First, download the ImageNet dataset. To train CoTNet-50 on ImageNet on a single node with 8 GPUs for 350 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 train.py --folder ./experiments/cot_experiments/CoTNet-50-350epoch

The training scripts for CoTNet (e.g., CoTNet-50) can be found in the cot_experiments folder.
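
To launch other experiment folders or use a different number of GPUs, the command above can be assembled programmatically. This is a sketch under assumptions: build_train_command is a hypothetical helper, not something the repository provides, and it uses only the flags that appear in the command above (--nproc_per_node and --folder).

```python
import shlex


def build_train_command(experiment_folder, gpus_per_node=8, python="python"):
    """Assemble the launch command shown above as an argument list.

    Hypothetical helper for illustration only.
    """
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    return [
        python, "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus_per_node}",
        "train.py",
        "--folder", experiment_folder,
    ]


cmd = build_train_command("./experiments/cot_experiments/CoTNet-50-350epoch")
# Print a shell-ready version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.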

Inference Time vs. Accuracy

CoTNet models consistently obtain better top-1 accuracy with less inference time than other vision backbones across both default and advanced training setups. In short, CoTNet models achieve better inference time-accuracy trade-offs than existing vision backbones.

Results on ImageNet

name           | resolution | #params | FLOPs (G) | Top-1 Acc. (%) | Top-5 Acc. (%) | model
-------------- | ---------- | ------- | --------- | -------------- | -------------- | -------------------
CoTNet-50      | 224        | 22.2M   | 3.3       | 81.3           | 95.6           | GoogleDrive / Baidu
CoTNeXt-50     | 224        | 30.1M   | 4.3       | 82.1           | 95.9           | GoogleDrive / Baidu
SE-CoTNetD-50  | 224        | 23.1M   | 4.1       | 81.6           | 95.8           | GoogleDrive / Baidu
CoTNet-101     | 224        | 38.3M   | 6.1       | 82.8           | 96.2           | GoogleDrive / Baidu
CoTNeXt-101    | 224        | 53.4M   | 8.2       | 83.2           | 96.4           | GoogleDrive / Baidu
SE-CoTNetD-101 | 224        | 40.9M   | 8.5       | 83.2           | 96.5           | GoogleDrive / Baidu
SE-CoTNetD-152 | 224        | 55.8M   | 17.0      | 84.0           | 97.0           | GoogleDrive / Baidu
SE-CoTNetD-152 | 320        | 55.8M   | 26.5      | 84.6           | 97.1           | GoogleDrive / Baidu

The access code for the Baidu links is cotn.
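
One way to read the results table is to ask which models are not beaten on both compute and accuracy by another entry. The sketch below copies the FLOPs and top-1 columns from the table and keeps only the Pareto-optimal rows. It assumes the FLOPs column is in GFLOPs, and it uses FLOPs as a stand-in for cost because the table does not list inference time. The function name pareto_front is illustrative.

```python
# (name, resolution, FLOPs, top-1 accuracy in %) copied from the results table.
# The FLOPs values are assumed to be in GFLOPs.
RESULTS = [
    ("CoTNet-50", 224, 3.3, 81.3),
    ("CoTNeXt-50", 224, 4.3, 82.1),
    ("SE-CoTNetD-50", 224, 4.1, 81.6),
    ("CoTNet-101", 224, 6.1, 82.8),
    ("CoTNeXt-101", 224, 8.2, 83.2),
    ("SE-CoTNetD-101", 224, 8.5, 83.2),
    ("SE-CoTNetD-152", 224, 17.0, 84.0),
    ("SE-CoTNetD-152", 320, 26.5, 84.6),
]


def pareto_front(rows):
    """Keep rows that improve top-1 accuracy as FLOPs increase."""
    front = []
    best_acc = float("-inf")
    # Visit rows from cheapest to most expensive; on equal cost,
    # the more accurate row comes first.
    for row in sorted(rows, key=lambda r: (r[2], -r[3])):
        if row[3] > best_acc:
            front.append(row)
            best_acc = row[3]
    return front


for name, resolution, flops, top1 in pareto_front(RESULTS):
    print(f"{name} @ {resolution}: {flops} FLOPs, {top1}% top-1")
```

On the numbers in the table, only SE-CoTNetD-101 drops out, since CoTNeXt-101 reaches the same top-1 accuracy with fewer FLOPs. Top-5 accuracy and parameter count are not considered here.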

CoTNet on downstream tasks

For Object Detection and Instance Segmentation, please see CoTNet for Object Detection and Instance Segmentation.

Citing Contextual Transformer Networks

@article{cotnet,
  title={Contextual Transformer Networks for Visual Recognition},
  author={Li, Yehao and Yao, Ting and Pan, Yingwei and Mei, Tao},
  journal={arXiv preprint arXiv:2107.12292},
  year={2021}
}

Acknowledgements

Thanks to the contributions of timm and the awesome PyTorch team.
