The official repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining"

An Empirical Study of Remote Sensing Pretraining

Di Wang, Jing Zhang, Bo Du, Gui-Song Xia and Dacheng Tao

Updates | Introduction | Usage | Results & Models | Talk | Statement |

Current applications

Scene Recognition: Please see Remote Sensing Pretraining for Scene Recognition;

Semantic Segmentation: Please see Remote Sensing Pretraining for Semantic Segmentation;

Object Detection: Please see Remote Sensing Pretraining for Object Detection;

Change Detection: Please see Remote Sensing Pretraining for Change Detection;

ViTAE: Please see ViTAE-Transformer;

Matting: Please see ViTAE-Transformer for matting;

Updates

18/10/2023

RSP was recognized as a Highly Cited Paper!

27/05/2022

The early access version is available!

20/05/2022

The paper has been accepted by IEEE TGRS.

11/04/2022

Baidu Yun links for the pretrained models are provided.

07/04/2022

The paper is posted on arXiv!

06/04/2022

The pretrained models for ResNet-50, Swin-T, and ViTAEv2-S are released. The code for pretraining and downstream tasks is also provided for reference.

Introduction

This repository contains the code, models, and test results for the paper "An Empirical Study of Remote Sensing Pretraining".

Aerial images are usually captured from a bird's-eye view by cameras mounted on planes or satellites, covering a large extent of land use and land cover. Their scenes are often difficult to interpret because of interference from scene-irrelevant regions and the complicated spatial distribution of land objects. Deep learning has largely reshaped remote sensing research for aerial image understanding and achieved great success. However, most existing deep models are initialized with ImageNet pretrained weights, and natural images inevitably present a large domain gap relative to aerial images, probably limiting finetuning performance on downstream aerial scene tasks. This issue motivates us to conduct an empirical study of remote sensing pretraining (RSP). To this end, we train different networks from scratch on MillionAID, the largest remote sensing scene recognition dataset to date, to obtain remote sensing pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. We then investigate the impact of ImageNet pretraining (IMP) and RSP on a series of downstream tasks, including scene recognition, semantic segmentation, object detection, and change detection, using these CNN and vision transformer backbones.

Fig. - (a) and (b) are a natural image and an aerial image, both from the "park" category. (c) and (d) are two aerial images from the "school" category. Despite the distinct viewpoint difference between (a) and (b), (b) contains a playground, which is unusual in park scenes but common in school scenes such as (d). On the other hand, (c) and (d) show different colors as well as significantly different spatial distributions of land objects such as playgrounds and swimming pools.

Results and Models

MillionAID

Backbone            | Input size | Acc@1 | Acc@5 | Param (M) | Pretrained model
RSP-ResNet-50-E300  | 224 × 224  | 98.99 | 99.82 | 23.6      | google & baidu
RSP-Swin-T-E300     | 224 × 224  | 98.59 | 99.88 | 27.6      | google & baidu
RSP-ViTAEv2-S-E100  | 224 × 224  | 98.97 | 99.88 | 18.8      | google & baidu
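Acc@1 and Acc@5 in the table above are standard top-k accuracies: a prediction counts as correct if the true label appears among the k highest-scored classes. The sketch below illustrates the metric with a toy score matrix; the function name and data are illustrative and not taken from the RSP codebase.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scored classes."""
    correct = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores in this row
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            correct += 1
    return correct / len(labels)

# Toy example: 3 samples, 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],   # top class: 1
    [0.5, 0.1, 0.3, 0.1],   # top class: 0, runner-up: 2
    [0.1, 0.2, 0.3, 0.4],   # top class: 3
]
labels = [1, 2, 3]

acc1 = topk_accuracy(scores, labels, 1)  # 2 of 3 correct
acc2 = topk_accuracy(scores, labels, 2)  # all correct within top-2
```

In practice the same computation runs over model logits for the whole validation set; k = 5 gives the Acc@5 column.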

Usage

Please refer to Readme.md for installation, dataset preparation, training and inference.

Citation

If this repo is useful for your research, please consider citing it:

@ARTICLE{rsp,
  author={Wang, Di and Zhang, Jing and Du, Bo and Xia, Gui-Song and Tao, Dacheng},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={An Empirical Study of Remote Sensing Pretraining}, 
  year={2023},
  volume={61},
  number={},
  pages={1-20},
  doi={10.1109/TGRS.2022.3176603}
}

Talk

A video talk about this study is available (in Chinese).

Statement

This project is released under the MIT license. For any other questions, please contact di.wang at gmail.com or di_wang at whu.edu.cn.

Relevant Projects

[1] Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model, IEEE TGRS, 2022 | Paper | Github
     Di Wang∗, Qiming Zhang∗, Yufei Xu∗, Jing Zhang, Bo Du, Dacheng Tao and Liangpei Zhang
