• Stars
    star
    446
  • Rank 97,276 (Top 2 %)
  • Language
    TeX
  • Created over 2 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

The repo of the paper "An Empirical Study of Remote Sensing Pretraining" has been moved to RSP

Remote Sensing

This repo contains a comprehensive list of our research works related to Remote Sensing. For any related questions, please contact Di Wang at [email protected] or [email protected].

Overview

1. An Empirical Study of Remote Sensing Pretraining [TGRS-2022]

2. Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [TGRS-2022]

3. Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model [arXiv-2023]

Projects

📘 An Empirical Study of Remote Sensing Pretraining [TGRS-2022]

Di Wang, Jing Zhang, Bo Du, Gui-Song Xia and Dacheng Tao

Paper | Github Code | BibTex

We train different networks from scratch with the help of the largest remote sensing scene recognition dataset up to now-MillionAID, to obtain the remote sensing pretrained backbones, including both convolutional neural networks (CNN) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. Then, we investigate the impact of ImageNet pretraining (IMP) and remote sensing pretraining (RSP) on a series of downstream tasks including scene recognition, semantic segmentation, object detection, and change detection using the CNN and vision transformers backbones.


📘 Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [TGRS-2022]

Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

We resort to plain vision transformers with about 100M and make the first attempt to propose large vision models customized for RS tasks and propose a new rotated varied-size window attention (RVSA) to substitute the original full attention to handle the large image size and objects of various orientations in RS images. The RVSA could significantly reduce the computational cost and memory footprint while learn better object representation by extracting rich context from the generated diverse windows.


📘 Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model [arXiv-2023]

Di Wang, Jing Zhang, Bo Du, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS surpasses existing high-resolution RS segmentation datasets in size by several orders of magnitude, and provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. We hope it could facilitate research in RS segmentation, particularly in large model pre-training.

More Repositories

1

ViTPose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
Python
1,313
star
2

ViTDet

Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
Python
524
star
3

Remote-Sensing-RVSA

The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"
Python
403
star
4

SAMRS

The official repo for [NeurIPS'23] "SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model"
Python
263
star
5

ViTAE-Transformer

The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"
Python
249
star
6

ViTAE-Transformer-Matting

A comprehensive list [AIM@IJCAI'21, P3M@MM'21, GFM@IJCV'22, RIM@CVPR'23, P3MNet@IJCV'23] of our research works related to image matting, including papers, codes, datasets, demos, and citations. Note: The repo for [IJCV'23] "Rethinking Portrait Matting with Privacy Preserving" has been moved to: https://github.com/ViTAE-Transformer/P3M-Net
TeX
229
star
7

QFormer

The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"
Python
158
star
8

ViTAE-VSA

The official repo for [ECCV'22] "VSA: Learning Varied-Size Window Attention in Vision Transformers"
Python
152
star
9

MTP

The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"
Python
140
star
10

RSP

The official repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining"
Python
130
star
11

P3M-Net

The official repo for [IJCV'23] "Rethinking Portrait Matting with Privacy Preserving"
Python
90
star
12

DeepSolo

[CVPR 2023] DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Python
68
star
13

ViTAE-Transformer-Scene-Text-Detection

The official repo for [IJCV'22] I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection
Python
37
star
14

LeMeViT

The official repo for [IJCAI'24] "LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation"
Python
37
star
15

SimDistill

The official repo for [AAAI 2024] "SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection""
Python
22
star
16

VOS-LLB

The official repo for [AAAI'23] "Learning to Learn Better for Video Object Segmentation"
Python
10
star
17

APTv2

The official repo for the extension of [NeurIPS'22] "APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking": https://github.com/pandorgan/APT-36K
Python
9
star
18

I3CL

The official repo for [IJCV'22] "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection"
Python
2
star