[NeurIPS 2021 Spotlight] & [IJCV 2024] SOFT: Softmax-free Transformer with Linear Complexity

Softmax-free Linear Transformers


Softmax-free Linear Transformers,
Jiachen Lu, Li Zhang, Junge Zhang, Xiatian Zhu, Hang Xu, Jianfeng Feng

What's new

  1. We propose a normalized softmax-free self-attention with stronger generalizability.
  2. SOFT is now available for more vision tasks (object detection and semantic segmentation).
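
The idea can be sketched in a few lines. The snippet below is an illustrative NumPy reference, not the repository's code: it materializes the Gaussian-kernel attention matrix in full (quadratic cost), whereas SOFT reaches linear complexity with a low-rank Nyström-style approximation of this same matrix, and the row normalization here is a simplified stand-in for the paper's normalization.

```python
import numpy as np

def gaussian_kernel_attention(q, k, v):
    """Softmax-free attention sketch: replace softmax(q @ k.T) weights
    with a Gaussian kernel exp(-||q - k||^2 / 2)."""
    # pairwise squared distances between queries and keys: (n, n)
    sq_dist = ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)
    attn = np.exp(-0.5 * sq_dist)               # all entries in (0, 1], diagonal is 1
    attn = attn / attn.sum(-1, keepdims=True)   # simplified normalization
    return attn @ v

rng = np.random.default_rng(0)
q = k = rng.standard_normal((8, 16))  # SOFT ties queries and keys, making the kernel symmetric
v = rng.standard_normal((8, 16))
out = gaussian_kernel_attention(q, k, v)
```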

NEWS

  • [2022/07/07] Our journal extension, Softmax-free Linear Transformer, is available on arXiv.
  • [2022/07/05] SOFT is now available for downstream tasks! An efficient normalization is applied to SOFT. Please refer to SOFT-Norm.

Requirements

  • timm==0.3.2

  • torch>=1.7.0 and torchvision that matches the PyTorch installation

  • cuda>=10.2

Compilation may fail on CUDA < 10.2.
We have compiled it successfully on CUDA 10.2 and CUDA 11.2.

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout expected by torchvision's datasets.ImageFolder: the training and validation data are expected to be in the train/ and val/ folders respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
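
The layout above can be sanity-checked without torchvision. The snippet below (an illustrative sketch, not part of the repo) builds the tree in a temporary directory and reproduces ImageFolder's class-discovery rule: each subfolder of train/ (or val/) becomes one class, sorted by name.

```python
import os
import tempfile

# Build the expected ImageNet-style layout in a temp directory.
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("class1", "class2"):
        os.makedirs(os.path.join(root, split, cls))
        # placeholder file standing in for a real JPEG
        open(os.path.join(root, split, cls, "img.jpeg"), "w").close()

# ImageFolder maps each subdirectory to a class, in sorted order.
train_classes = sorted(
    d for d in os.listdir(os.path.join(root, "train"))
    if os.path.isdir(os.path.join(root, "train", d))
)
print(train_classes)  # ['class1', 'class2']
```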

Installation

git clone https://github.com/fudan-zvg/SOFT.git
python -m pip install -e SOFT

Main results

ImageNet-1K Image Classification

| Model | Resolution | Params | FLOPs | Top-1 % | Config | Pretrained Model |
|---|---|---|---|---|---|---|
| SOFT-Tiny | 224 | 13M | 1.9G | 79.3 | SOFT_Tiny.yaml, SOFT_Tiny_cuda.yaml | SOFT_Tiny, SOFT_Tiny_cuda |
| SOFT-Small | 224 | 24M | 3.3G | 82.2 | SOFT_Small.yaml, SOFT_Small_cuda.yaml | |
| SOFT-Medium | 224 | 45M | 7.2G | 82.9 | SOFT_Meidum.yaml, SOFT_Meidum_cuda.yaml | |
| SOFT-Large | 224 | 64M | 11.0G | 83.1 | SOFT_Large.yaml, SOFT_Large_cuda.yaml | |
| SOFT-Huge | 224 | 87M | 16.3G | 83.3 | SOFT_Huge.yaml, SOFT_Huge_cuda.yaml | |
| SOFT-Tiny-Norm | 224 | 13M | 1.9G | 79.4 | SOFT_Tiny_norm.yaml | SOFT_Tiny_norm |
| SOFT-Small-Norm | 224 | 24M | 3.3G | 82.4 | SOFT_Small_norm.yaml | SOFT_Small_norm |
| SOFT-Medium-Norm | 224 | 45M | 7.2G | 83.1 | SOFT_Meidum_norm.yaml | SOFT_Medium_norm |
| SOFT-Large-Norm | 224 | 64M | 11.0G | 83.3 | SOFT_Large_norm.yaml | SOFT_Large_norm |
| SOFT-Huge-Norm | 224 | 87M | 16.3G | 83.4 | SOFT_Huge_norm.yaml | |

COCO Object Detection (2017 val)

| Backbone | Method | lr schd | box mAP | mask mAP | Params |
|---|---|---|---|---|---|
| SOFT-Tiny-Norm | RetinaNet | 1x | 40.0 | - | 23M |
| SOFT-Tiny-Norm | Mask R-CNN | 1x | 41.2 | 38.2 | 33M |
| SOFT-Small-Norm | RetinaNet | 1x | 42.8 | - | 34M |
| SOFT-Small-Norm | Mask R-CNN | 1x | 43.8 | 40.1 | 44M |
| SOFT-Medium-Norm | RetinaNet | 1x | 44.3 | - | 55M |
| SOFT-Medium-Norm | Mask R-CNN | 1x | 46.6 | 42.0 | 65M |
| SOFT-Large-Norm | RetinaNet | 1x | 45.3 | - | 74M |
| SOFT-Large-Norm | Mask R-CNN | 1x | 47.0 | 42.2 | 84M |

ADE20K Semantic Segmentation (val)

| Backbone | Method | Crop size | lr schd | mIoU | Params |
|---|---|---|---|---|---|
| SOFT-Small-Norm | UperNet | 512x512 | 1x | 46.2 | 54M |
| SOFT-Medium-Norm | UperNet | 512x512 | 1x | 48.0 | 76M |

Get Started

Train

We provide two implementations of the Gaussian kernel: a PyTorch version and the exact Gaussian function implemented in CUDA. Config files whose names contain cuda use the CUDA implementation. Both implementations yield the same performance. Please install SOFT before running the CUDA version.
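
The two implementations can agree exactly because the Gaussian kernel factorizes: exp(-||q - k||²/2) = exp(q·k) · exp(-||q||²/2) · exp(-||k||²/2), so the exact form is also expressible with ordinary matmul and exp ops. A quick NumPy check of this identity (illustrative only, not the repository's code):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((6, 8))

# Direct form: exp(-||q_i - k_j||^2 / 2), element by element.
direct = np.exp(-0.5 * ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1))

# Expanded form: a single matmul plus per-row/column norm corrections.
expanded = (np.exp(q @ k.T)
            * np.exp(-0.5 * (q ** 2).sum(-1))[:, None]
            * np.exp(-0.5 * (k ** 2).sum(-1))[None, :])

assert np.allclose(direct, expanded)
```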

./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE}
# For example, train SOFT-Tiny on the ImageNet training dataset with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml

Test

./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE} --eval_checkpoint ${CHECKPOINT_FILE} --eval

# For example, test SOFT-Tiny on the ImageNet validation dataset with 8 GPUs

./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml --eval_checkpoint ${CHECKPOINT_FILE} --eval

Reference

@inproceedings{SOFT,
    title={SOFT: Softmax-free Transformer with Linear Complexity}, 
    author={Lu, Jiachen and Yao, Jinghan and Zhang, Junge and Zhu, Xiatian and Xu, Hang and Gao, Weiguo and Xu, Chunjing and Xiang, Tao and Zhang, Li},
    booktitle={NeurIPS},
    year={2021}
}

License

MIT

Acknowledgement

Thanks to the previous open-source repositories:
Detectron2
T2T-ViT
PVT
Nystromformer
pytorch-image-models
