  • Language
    Python
  • License
    MIT License
  • Created about 3 years ago
  • Updated over 1 year ago


PyTorch implementation of popular datasets and models in remote sensing

PyTorch Remote Sensing (torchrs)


UPDATE: torchrs is currently being merged into TorchGeo. Please go star & follow our progress.

PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Resolution, Land Cover Classification/Segmentation, Image Captioning, Audio-visual recognition etc.) for various Optical (Sentinel-2, Landsat, etc.) and Synthetic Aperture Radar (SAR) (Sentinel-1) sensors.

Installation

# pypi
pip install torch-rs

# pypi with training extras
pip install 'torch-rs[train]'

# latest
pip install git+https://github.com/isaaccorley/torchrs

# latest with extras
pip install 'git+https://github.com/isaaccorley/torchrs.git#egg=torch-rs[train]'

Datasets

PROBA-V Super Resolution

The PROBA-V Super Resolution Challenge dataset is a Multi-image Super Resolution (MISR) dataset of images taken by the ESA PROBA-Vegetation satellite. The dataset contains sets of unregistered 300m low resolution (LR) images which can be used to generate single 100m high resolution (HR) images for both the Near Infrared (NIR) and Red bands. In addition, Quality Masks (QM) for each LR image and Status Masks (SM) for each HR image are available. The PROBA-V satellite carries sensors that capture imagery at 100m and 300m spatial resolution with 5-day and 1-day revisit rates, respectively. Generating high resolution estimates from the more frequent low resolution acquisitions would effectively increase the frequency at which HR imagery is available for vegetation monitoring.

The dataset can be downloaded (0.83GB) using scripts/download_probav.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import PROBAV

transform = Compose([ToTensor()])

dataset = PROBAV(
    root="path/to/dataset/",
    split="train",  # or 'test'
    band="RED",     # or 'NIR'
    lr_transform=transform,
    hr_transform=transform
)

x = dataset[0]
"""
x: dict(
    lr: low res images  (t, 1, 128, 128)
    qm: quality masks   (t, 1, 128, 128)
    hr: high res image  (1, 384, 384)
    sm: status mask     (1, 384, 384)
)
t varies by set of images (minimum of 9)
"""

ETCI 2021 Flood Detection

The ETCI 2021 Dataset is a flood detection segmentation dataset of SAR images taken by the ESA Sentinel-1 satellite. The dataset contains pairs of VV and VH polarization images processed by the Hybrid Pluggable Processing Pipeline (hyp3) along with corresponding binary flood and water body ground truth masks.

The dataset can be downloaded (5.6GB) using scripts/download_etci2021.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import ETCI2021

transform = Compose([ToTensor()])

dataset = ETCI2021(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    vv:         (3, 256, 256)
    vh:         (3, 256, 256)
    flood_mask: (1, 256, 256)
    water_mask: (1, 256, 256)
)
"""

HKH Glacier Mapping

The Hindu Kush Himalayas (HKH) Glacier Mapping dataset is a semantic segmentation dataset of 7,095 512x512 multispectral images taken by the USGS Landsat 7 satellite. The dataset contains imagery from 2002-2008 of the HKH region (spanning 8 countries) along with separate masks of clean-ice and debris-covered glaciers. The imagery contains 15 bands, which include 10 Landsat 7 bands, 3 precomputed NDVI/NDSI/NDWI indices, and 2 digital elevation and slope maps from the SRTM 90m DEM Digital Elevation Database.

The dataset can be downloaded (18GB/109GB compressed/uncompressed) using scripts/download_hkh_glacier.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import HKHGlacierMapping

transform = Compose([ToTensor()])

dataset = HKHGlacierMapping(
    root="path/to/dataset/",
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:                   (15, 512, 512)
    clean_ice_mask:      (1, 512, 512)
    debris_covered_mask: (1, 512, 512)
)
"""

dataset.bands
"""
['LE7 B1 (blue)', 'LE7 B2 (green)', 'LE7 B3 (red)', 'LE7 B4 (near infrared)', 'LE7 B5 (shortwave infrared 1)',
'LE7 B6_VCID_1 (low-gain thermal infrared)', 'LE7 B6_VCID_2 (high-gain thermal infrared)',
'LE7 B7 (shortwave infrared 2)', 'LE7 B8 (panchromatic)', 'LE7 BQA (quality bitmask)', 'NDVI (vegetation index)',
'NDSI (snow index)', 'NDWI (water index)', 'SRTM 90 elevation', 'SRTM 90 slope']
"""

ZueriCrop

The ZueriCrop dataset, proposed in "Crop mapping from image time series: deep learning with multi-scale label hierarchies", Turkoglu et al., is a time-series instance segmentation dataset of 116k medium resolution (10m) 24x24 multispectral 9-band images of Zurich and Thurgau, Switzerland, taken by the ESA Sentinel-2 satellite. It contains pixel level semantic and instance annotations for 48 fine-grained, hierarchical categories of crop types. Note that there is only a single ground truth semantic & instance mask per time series.

The dataset can be downloaded (39GB) using scripts/download_zuericrop.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import ZueriCrop

transform = Compose([ToTensor()])

dataset = ZueriCrop(
    root="path/to/dataset/",
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:              (142, 9, 24, 24)    (t, c, h, w)
    mask:           (1, 24, 24)
    instance_mask:  (1, 24, 24)
)
"""

[cls.label for cls in dataset.classes]
"""
['Unknown', 'SummerBarley', 'WinterBarley', 'Oat', 'Wheat', 'Grain', ...]
"""

FAIR1M - Fine-grained Object Recognition

The FAIR1M dataset, proposed in "FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery", Sun et al., is a fine-grained object recognition/detection dataset of 15,000 high resolution (0.3-0.8m) RGB images taken by the Gaofen (GF) satellites and extracted from Google Earth. The dataset contains rotated bounding boxes for objects of 5 categories (ships, vehicles, airplanes, courts, and roads) and 37 sub-categories. This dataset is a part of the ISPRS Benchmark on Object Detection in High-Resolution Satellite Images. Note that so far only a portion of the training dataset has been released for the challenge (1,732/15,000 images).

The dataset can be downloaded (8.7GB) using scripts/download_fair1m.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import FAIR1M

transform = T.Compose([T.ToTensor()])

dataset = FAIR1M(
    root="path/to/dataset/",
    split="train",  # only 'train' for now
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (3, h, w)
    y: (N,)
    points: (N, 5, 2)
)
where N is the number of objects in the image
"""

ADVANCE - Audiovisual Aerial Scene Recognition

The AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE), proposed in "Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition", Hu et al., is a dataset composed of 5,075 pairs of geotagged audio recordings and 512x512 RGB images extracted from FreeSound and Google Earth, respectively. The images are labeled into 13 scene categories using OpenStreetMap.

The dataset can be downloaded (4.5GB) using scripts/download_advance.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import ADVANCE

image_transform = T.Compose([T.ToTensor()])
audio_transform = T.Compose([])

dataset = ADVANCE(
    root="path/to/dataset/",
    image_transform=image_transform,
    audio_transform=audio_transform,
)

x = dataset[0]
"""
x: dict(
    image: (3, 512, 512)
    audio: (1, 220500)
    cls: int
)
"""

dataset.classes
"""
['airport', 'beach', 'bridge', 'farmland', 'forest', 'grassland', 'harbour', 'lake',
'orchard', 'residential', 'sparse shrub land', 'sports land', 'train station']
"""

Onera Satellite Change Detection (OSCD)

The Onera Satellite Change Detection (OSCD) dataset, proposed in "Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks", Daudt et al. is a change detection dataset of multispectral (MS) images taken by the ESA Sentinel-2 satellite. The dataset contains 24 registered image pairs from multiple continents between 2015-2018 along with binary change masks.

The dataset can be downloaded (0.73GB) using scripts/download_oscd.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import OSCD

transform = Compose([ToTensor(permute_dims=False)])

dataset = OSCD(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 13, h, w)
    mask: (1, h, w)
)
"""

Satellite Side-Looking (S2Looking) Change Detection

The S2Looking dataset, proposed in "S2Looking: A Satellite Side-Looking Dataset for Building Change Detection", Shen et al., is a rural building change detection dataset of 5,000 1024x1024 0.5-0.8m registered RGB image pairs of varying off-nadir angles taken by the Gaofen (GF), SuperView (SV), and BeiJing-2 (BJ-2) satellites. The dataset contains separate new and demolished building masks from regions all over the Earth with a time span of 1-3 years. This dataset was proposed along with the LEVIR-CD+ dataset and is considered difficult due to the rural locations and off-nadir angles.

The dataset can be downloaded (11GB) using scripts/download_s2looking.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import S2Looking

transform = Compose([ToTensor()])

dataset = S2Looking(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 3, 1024, 1024)
    build_mask: (1, 1024, 1024)
    demolish_mask: (1, 1024, 1024)
)
"""

LEVIR Change Detection+ (LEVIR-CD+)

The LEVIR-CD+ dataset, proposed in "S2Looking: A Satellite Side-Looking Dataset for Building Change Detection", Shen et al. is an urban building change detection dataset of 985 1024x1024 0.5m RGB image pairs extracted from Google Earth. The dataset contains building/land use change masks from 20 different regions of Texas between 2002-2020 with a time span of 5 years. This dataset was proposed along with the S2Looking dataset and is considered the easier version due to the urban locations and near-nadir angles.

The dataset can be downloaded (3.6GB) using scripts/download_levircd_plus.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import LEVIRCDPlus

transform = Compose([ToTensor()])

dataset = LEVIRCDPlus(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 3, 1024, 1024)
    mask: (1, 1024, 1024)
)
"""

High Resolution Semantic Change Detection (HRSCD)

The High Resolution Semantic Change Detection (HRSCD) dataset, proposed in "Multitask Learning for Large-scale Semantic Change Detection", Daudt et al. is a change detection dataset of high resolution (0.5m) aerial RGB image pairs extracted from the French National Institute of Geographical and Forest Information (IGN) database. The dataset contains 291 coregistered image pairs from 2006 and 2012 along with binary change masks extracted from the Urban Atlas Change 2006-2012 maps and corresponding land cover masks for each image extracted from the Urban Atlas 2006 and Urban Atlas 2012.

The dataset can be downloaded (12GB) using scripts/download_hrscd.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import HRSCD

transform = Compose([ToTensor()])

dataset = HRSCD(
    root="path/to/dataset/",
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 3, 1000, 1000)
    lc: (2, 1000, 1000)
    mask: (1, 1000, 1000)
)
"""

Sentinel-2 Multitemporal Cities Pairs (S2MTCP)

The Sentinel-2 Multitemporal Cities Pairs (S2MTCP) dataset, proposed in "Self-supervised pre-training enhances change detection in Sentinel-2 imagery", Leenstra et al., is an urban change detection dataset of 1,520 medium resolution (10m) unregistered image pairs taken by the ESA Sentinel-2 satellite. The dataset does not contain change masks and was originally used for self-supervised pretraining for downstream change detection tasks (e.g. the OSCD dataset). The images are roughly 600x600 pixels and contain all Sentinel-2 bands of the Level 1C (L1C) product resampled to 10m.

The dataset can be downloaded (10GB/139GB compressed/uncompressed) using scripts/download_s2mtcp.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import S2MTCP

transform = Compose([ToTensor()])

dataset = S2MTCP(
    root="path/to/dataset/",
    transform=transform,
)

x = dataset[0]  # (2, 14, h, w)

Remote Sensing Visual Question Answering (RSVQA) Low Resolution (LR)

The RSVQA LR dataset, proposed in "RSVQA: Visual Question Answering for Remote Sensing Data", Lobry et al. is a visual question answering (VQA) dataset of 772 256x256 low resolution (10m) RGB images taken by the ESA Sentinel-2 satellite. Each image is annotated with a set of questions and their corresponding answers. Among other applications, this dataset can be used to train VQA models to perform detailed scene understanding of medium resolution remote sensing imagery.

The dataset can be downloaded (0.2GB) using scripts/download_rsvqa_lr.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSVQALR

transform = T.Compose([T.ToTensor()])

dataset = RSVQALR(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 256, 256)
    questions:  List[str]
    answers:    List[str]
    types:      List[str]
)
"""

Remote Sensing Visual Question Answering (RSVQA) High Resolution (HR)

The RSVQA HR dataset, proposed in "RSVQA: Visual Question Answering for Remote Sensing Data", Lobry et al., is a visual question answering (VQA) dataset of 10,659 512x512 high resolution (15cm) aerial RGB images extracted from the USGS High Resolution Orthoimagery (HRO) collection. Each image is annotated with a set of questions and their corresponding answers. Among other applications, this dataset can be used to train VQA models to perform detailed scene understanding of high resolution remote sensing imagery.

The dataset can be downloaded (15GB) using scripts/download_rsvqa_hr.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSVQAHR

transform = T.Compose([T.ToTensor()])

dataset = RSVQAHR(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 512, 512)
    questions:  List[str]
    answers:    List[str]
    types:      List[str]
)
"""

Remote Sensing Visual Question Answering BigEarthNet (RSVQAxBEN)

The RSVQAxBEN dataset, proposed in "RSVQA Meets BigEarthNet: A New, Large-Scale, Visual Question Answering Dataset for Remote Sensing", Lobry et al., is a version of the BigEarthNet dataset with visual question answering (VQA) annotations generated using the same method applied to the RSVQA LR dataset. The dataset consists of 120x120 RGB Sentinel-2 imagery annotated with a set of questions and their corresponding answers.

The dataset can be downloaded (35.4GB) using scripts/download_rsvqaxben.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSVQAxBEN

transform = T.Compose([T.ToTensor()])

dataset = RSVQAxBEN(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 120, 120)
    questions:  List[str]
    answers:    List[str]
    types:      List[str]
)
"""

Remote Sensing Image Captioning Dataset (RSICD)

The RSICD dataset, proposed in "Exploring Models and Data for Remote Sensing Image Caption Generation", Lu et al., is an image captioning dataset with 5 captions per image for 10,921 224x224 RGB images extracted using Google Earth, Baidu Map, MapABC and Tianditu. Although it is one of the larger remote sensing image captioning datasets, its captions use very repetitive language with little detail, and many captions are duplicated.

The dataset can be downloaded (0.57GB) using scripts/download_rsicd.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSICD

transform = T.Compose([T.ToTensor()])

dataset = RSICD(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:        (3, 224, 224)
    captions: List[str]
)
"""

Sydney Captions

The Sydney Captions dataset, proposed in "Deep semantic understanding of high resolution remote sensing image", Qu et al. is a version of the Sydney scene classification dataset proposed in "Saliency-Guided Unsupervised Feature Learning for Scene Classification", Zhang et al. The dataset contains 613 500x500 1ft resolution RGB images of Sydney, Australia extracted using Google Earth and is annotated with 5 captions per image.

The dataset can be downloaded (0.44GB) using scripts/download_sydney_captions.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import SydneyCaptions

transform = T.Compose([T.ToTensor()])

dataset = SydneyCaptions(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:        (3, 500, 500)
    captions: List[str]
)
"""

UC Merced (UCM) Captions

The UC Merced (UCM) Captions dataset, proposed in "Deep semantic understanding of high resolution remote sensing image", Qu et al. is a version of the UCM land use classification dataset proposed in "Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification", Yang et al. The dataset contains 2100 256x256 1ft resolution RGB images of urban locations around the U.S. extracted from the USGS National Map Urban Area Imagery collection and is annotated with 5 captions per image.

The dataset can be downloaded (0.4GB) using scripts/download_ucm_captions.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import UCMCaptions

transform = T.Compose([T.ToTensor()])

dataset = UCMCaptions(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:        (3, 256, 256)
    captions: List[str]
)
"""

Remote Sensing Image Scene Classification (RESISC45)

The RESISC45 dataset, proposed in "Remote Sensing Image Scene Classification: Benchmark and State of the Art", Cheng et al. is a scene classification dataset of 31,500 RGB images extracted using Google Earth. The dataset contains 45 scenes with 700 images per class from over 100 countries and was selected to optimize for high variability in image conditions (spatial resolution, occlusion, weather, illumination, etc.).

The dataset can be downloaded (0.47GB) using scripts/download_resisc45.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RESISC45

transform = T.Compose([T.ToTensor()])

dataset = RESISC45(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 256, 256)
y: int
"""

dataset.classes
"""
['airplane', 'airport', 'baseball_diamond', 'basketball_court', 'beach', 'bridge', 'chaparral',
'church', 'circular_farmland', 'cloud', 'commercial_area', 'dense_residential', 'desert', 'forest',
'freeway', 'golf_course', 'ground_track_field', 'harbor', 'industrial_area', 'intersection', 'island',
'lake', 'meadow', 'medium_residential', 'mobile_home_park', 'mountain', 'overpass', 'palace', 'parking_lot',
'railway', 'railway_station', 'rectangular_farmland', 'river', 'roundabout', 'runway', 'sea_ice', 'ship',
'snowberg', 'sparse_residential', 'stadium', 'storage_tank', 'tennis_court', 'terrace', 'thermal_power_station', 'wetland']
"""

EuroSAT

The EuroSAT dataset, proposed in "EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification", Helber et al., is a land cover classification dataset of 27,000 64x64 images taken by the ESA Sentinel-2 satellite. The dataset contains 10 land cover classes with 2-3k images per class from 34 European countries. The dataset is available either as RGB only or with all 13 Multispectral (MS) Sentinel-2 bands. This dataset is fairly easy, with ~98.6% accuracy achievable with a ResNet-50.

The dataset can be downloaded (0.13GB for RGB and 2.8GB for MS) using scripts/download_eurosat_rgb.sh or scripts/download_eurosat_ms.sh and instantiated below:

import torchvision.transforms as T
from torchrs.transforms import ToTensor
from torchrs.datasets import EuroSATRGB, EuroSATMS

transform = T.Compose([T.ToTensor()])  # torchvision ToTensor for the RGB images

dataset = EuroSATRGB(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 64, 64)
y: int
"""

transform = T.Compose([ToTensor()])  # torchrs ToTensor for the 13-band MS images

dataset = EuroSATMS(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (13, 64, 64)
y: int
"""

dataset.classes
"""
['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial',
'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
"""

SAT-4 & SAT-6

The SAT-4 & SAT-6 datasets, proposed in "DeepSat - A Learning framework for Satellite Imagery", Basu et al., are land cover classification datasets of 500k and 405k 28x28 RGBN images, respectively, sampled across the Continental United States (CONUS) and extracted from the National Agriculture Imagery Program (NAIP). The SAT-4 and SAT-6 datasets contain 4 and 6 land cover classes, respectively. These datasets are fairly easy, with ~80% accuracy achievable with a 5-layer CNN.

The dataset can be downloaded (2.7GB) using scripts/download_sat.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import SAT4, SAT6

transform = T.Compose([T.ToTensor()])

dataset = SAT4(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform
)

x, y = dataset[0]
"""
x: (4, 28, 28)
y: int
"""

dataset.classes
"""
['barren land', 'trees', 'grassland', 'other']
"""

dataset = SAT6(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform
)

x, y = dataset[0]
"""
x: (4, 28, 28)
y: int
"""

dataset.classes
"""
['barren land', 'trees', 'grassland', 'roads', 'buildings', 'water']
"""

Aerial Image Dataset (AID)

The AID dataset, proposed in "AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification", Xia et al. is a scene classification dataset of 10k 600x600 RGB images extracted using Google Earth. The dataset contains 30 scenes with several hundred images per class from regions and countries around the world. This dataset is fairly easy with ~90% accuracy achievable with a VGG-16.

The dataset can be downloaded (2.6GB) using scripts/download_aid.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import AID

transform = T.Compose([T.ToTensor()])

dataset = AID(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 600, 600)
y: int
"""

dataset.classes
"""
['Airport', 'BareLand', 'BaseballField', 'Beach', 'Bridge', 'Center', 'Church', 'Commercial',
'DenseResidential', 'Desert', 'Farmland', 'Forest', 'Industrial', 'Meadow', 'MediumResidential',
'Mountain', 'Park', 'Parking', 'Playground', 'Pond', 'Port', 'RailwayStation', 'Resort',
'River', 'School', 'SparseResidential', 'Square', 'Stadium', 'StorageTanks', 'Viaduct']
"""

Inria Aerial Image Labeling

The Inria Aerial Image Labeling Dataset, proposed in "Can semantic labeling methods generalize to any city? the inria aerial image labeling benchmark", Maggiori et al., is a building semantic segmentation dataset of 360 high resolution (0.3m) 5000x5000 RGB images extracted from various international GIS services (e.g. USGS National Map). The dataset contains imagery from 10 regions around the world (both urban and rural), with the train and test sets split into different cities to evaluate whether models can generalize across dramatically different locations. The dataset was originally used in the Inria Aerial Image Labeling Dataset Contest, and the test set ground truth masks have not been released publicly.

The dataset can be downloaded (26GB) using scripts/download_inria_ail.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import InriaAIL

transform = Compose([ToTensor()])

dataset = InriaAIL(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 5000, 5000)
    mask:      (1, 5000, 5000)
    region:     str
)
"""

dataset.regions
"""
['austin', 'chicago', 'kitsap', 'tyrol', 'vienna']
"""

Dubai Segmentation

The Dubai Segmentation dataset is a semantic segmentation dataset of 72 high resolution ~700x700 RGB images taken by the MBRSC satellites. The dataset contains imagery from 9 regions across Dubai along with masks covering 6 categories.

The dataset can be downloaded (0.03GB) using scripts/download_dubai_segmentation.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import DubaiSegmentation

transform = Compose([ToTensor()])

dataset = DubaiSegmentation(
    root="path/to/dataset/",
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, h, w)
    mask:      (1, h, w)
    region:    str
)
"""

list(dataset.classes.keys())
"""
['Unlabeled', 'Water', 'Land (unpaved area)', 'Road', 'Building', 'Vegetation']
"""

GID-15

The Gaofen Image Dataset (GID-15), proposed in "Land-Cover Classification with High-Resolution Remote Sensing Images Using Transferable Deep Models", Tong et al., is a semantic segmentation dataset of 150 high resolution (3m) 6800x7200 RGB images taken by the Gaofen-2 satellite, with pixel level annotations for 15 categories. The dataset was used in a challenge hosted by the IEEE ICCV 2021 1st Workshop on Learning to Understand Aerial Images, and the test set ground truth masks have not been released publicly.

The dataset can be downloaded (36GB) using scripts/download_gid15.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import GID15

transform = Compose([ToTensor()])

dataset = GID15(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 6800, 7200)
    mask:      (1, 6800, 7200)
)
"""

dataset.classes
"""
['background', 'industrial_land', 'urban_residential', 'rural_residential', 'traffic_land', 'paddy_field',
'irrigated_land', 'dry_cropland', 'garden_plot', 'arbor_woodland', 'shrub_land', 'natural_grassland',
'artificial_grassland', 'river', 'lake', 'pond']
"""

TiSeLaC

The TiSeLaC dataset from the Time Series Land Cover Classification Challenge is a time series land cover classification dataset consisting of 23 2866x2633 medium resolution (30m) multispectral 10 band (7 reflectance + NDVI/NDWI/Brightness Index) images taken by the USGS Landsat 8 satellite. The imagery was captured over Reunion Island in 2014 and contains 9 land cover classes derived from the Corine Land Cover (CLC) map. Note that the dataset is formatted for pixelwise time-series classification where each time series is of the form (t, b) where t=23 samples and b=10 bands. This dataset is very easy with the top score currently standing at 0.9929 F1 Score.

The dataset can be downloaded (0.08GB) using scripts/download_tiselac.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import Tiselac

transform = Compose([ToTensor()])

dataset = Tiselac(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform
)

x, y = dataset[0]
"""
x: (23, 10)
y: int
"""

dataset.classes
"""
['Urban Areas', 'Other built-up surfaces', 'Forests', 'Sparse Vegetation', 'Rocks and bare soil',
'Grassland', 'Sugarcane crops', 'Other crops', 'Water']
"""

UC Merced (UCM)

The UC Merced (UCM) dataset, proposed in "Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification", Yang et al., is a land use classification dataset of 2,100 256x256 1ft resolution RGB images of urban locations around the U.S. extracted from the USGS National Map Urban Area Imagery collection, with 21 land use classes (100 images per class).

The dataset can be downloaded (0.42GB) using scripts/download_ucm.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import UCM

transform = T.Compose([T.ToTensor()])

dataset = UCM(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 256, 256)
y: int
"""

dataset.classes
"""
['agricultural', 'airplane', 'baseballdiamond', 'beach', 'buildings', 'chaparral', 'denseresidential',
'forest', 'freeway', 'golfcourse', 'harbor', 'intersection', 'mediumresidential', 'mobilehomepark',
'overpass', 'parkinglot', 'river', 'runway', 'sparseresidential', 'storagetanks', 'tenniscourt']
"""

PatternNet

The PatternNet dataset, proposed in "PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval", Yang et al., is an image retrieval and scene classification dataset of 30,400 256x256 high resolution (0.06-5m) RGB images extracted using Google Earth and Google Maps with 38 scene classes (800 images per class). This dataset was originally proposed as a remote sensing image retrieval (RSIR) dataset, with classes selected for high intra-class diversity and inter-class similarity so that retrieval requires learning fine-grained differences between classes. Additionally, this dataset has some unique classes not found in other scene classification datasets, e.g. oil well, nursing home, and solar panel.

The dataset can be downloaded (1.4GB) using scripts/download_patternnet.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import PatternNet

transform = T.Compose([T.ToTensor()])

dataset = PatternNet(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 256, 256)
y: int
"""

dataset.classes
"""
['airplane', 'baseball_field', 'basketball_court', 'beach', 'bridge', 'cemetery', 'chaparral',
'christmas_tree_farm', 'closed_road', 'coastal_mansion', 'crosswalk', 'dense_residential',
'ferry_terminal', 'football_field', 'forest', 'freeway', 'golf_course', 'harbor', 'intersection',
'mobile_home_park', 'nursing_home', 'oil_gas_field', 'oil_well', 'overpass', 'parking_lot', 'parking_space',
'railway', 'river', 'runway', 'runway_marking', 'shipping_yard', 'solar_panel','sparse_residential',
'storage_tank', 'swimming_pool', 'tennis_court', 'transformer_station', 'wastewater_treatment_plant']
"""

WHU-RS19

The WHU-RS19 dataset, proposed in "Structural High-resolution Satellite Image Indexing", Xia et al. is a scene classification dataset of 1,005 600x600 high resolution (up to 0.5m) RGB images extracted using Google Earth with 19 scene classes (~50 images per class).

The dataset can be downloaded (0.11GB) using scripts/download_whu_rs19.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import WHURS19

transform = T.Compose([T.ToTensor()])

dataset = WHURS19(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 600, 600)
y: int
"""

dataset.classes
"""
['Airport', 'Beach', 'Bridge', 'Commercial', 'Desert', 'Farmland','Forest', 'Industrial',
'Meadow', 'Mountain', 'Park', 'Parking', 'Pond', 'Port', 'Residential', 'River', 'Viaduct',
'footballField', 'railwayStation']
"""

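With only about 50 images per class, a purely random train/test split can leave some classes under-represented in the test set. A minimal sketch of a per-class (stratified) split, assuming you already have the integer label of every sample; `stratified_split` is a hypothetical helper, not a torchrs function:

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same train/test ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels: 19 classes with 50 samples each (WHU-RS19 has ~50 per class)
labels = [c for c in range(19) for _ in range(50)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 760 190
```

The index lists can then be wrapped with `torch.utils.data.Subset(dataset, train_idx)`. Collecting real labels by iterating the dataset (e.g. `[y for _, y in dataset]`) loads every image once, which is acceptable at this dataset size.
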
RSSCN7

The RSSCN7 dataset, proposed in "Deep Learning Based Feature Selection for Remote Sensing Scene Classification", Zou et al., is a scene classification dataset of 2,800 400x400 high resolution RGB images extracted using Google Earth, with 7 scene classes (400 images per class).

The dataset can be downloaded (0.36GB) using scripts/download_rsscn7.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSSCN7

transform = T.Compose([T.ToTensor()])

dataset = RSSCN7(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 400, 400)
y: int
"""

dataset.classes
"""
['aGrass', 'bField', 'cIndustry', 'dRiverLake', 'eForest', 'fResident', 'gParking']
"""

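Overall accuracy can hide weak classes, so it is often useful to report accuracy per class. A minimal pure-Python sketch, assuming `y_true` and `y_pred` are integer class indices collected from an evaluation loop (the values below are toy data for illustration only). The class names listed above appear to carry a leading letter (`a` to `g`) that only fixes their sort order, so the sketch strips it for display; `per_class_accuracy` is a hypothetical helper:

```python
from collections import Counter

classes = ['aGrass', 'bField', 'cIndustry', 'dRiverLake', 'eForest', 'fResident', 'gParking']


def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy for each class index, or None for classes absent from y_true."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return [correct[c] / totals[c] if totals[c] else None for c in range(num_classes)]


# Toy predictions for illustration only
y_true = [0, 0, 1, 1, 2, 2, 3, 4, 5, 6]
y_pred = [0, 1, 1, 1, 2, 0, 3, 4, 5, 5]
accs = per_class_accuracy(y_true, y_pred, num_classes=len(classes))

for name, acc in zip(classes, accs):
    # Drop the leading ordering letter ('a'..'g') for display
    print(f"{name[1:]:<10} {acc:.2f}")
```
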
Brazilian Coffee Scenes

The Brazilian Coffee Scenes dataset, proposed in "Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains?", Penatti et al., is a scene classification dataset of 2,876 64x64 3-band (Green, Red, NIR) images taken by the SPOT satellites in 2005 over four counties in the State of Minas Gerais, Brazil. It was developed to distinguish coffee crops from non-coffee crops.

The dataset can be downloaded (0.01GB) using scripts/download_brazilian_coffee_.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import BrazilianCoffeeScenes

transform = T.Compose([T.ToTensor()])

dataset = BrazilianCoffeeScenes(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 64, 64)
y: int
"""

dataset.classes
"""
['non-coffee', 'coffee']
"""

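Because the bands are Green, Red and NIR, a vegetation index such as NDVI, (NIR - Red) / (NIR + Red), can be derived directly and appended as an extra input channel, which may help a model separate coffee from other vegetation. A minimal sketch, assuming the tensor channels follow the (Green, Red, NIR) order stated above; the `ndvi` helper is hypothetical and not part of torchrs:

```python
import torch


def ndvi(x, red_index=1, nir_index=2, eps=1e-6):
    """NDVI = (NIR - Red) / (NIR + Red) for a (C, H, W) or (B, C, H, W) tensor.

    The default indices assume the (Green, Red, NIR) band order; eps avoids
    division by zero where both bands are 0.
    """
    red = x[..., red_index, :, :].float()
    nir = x[..., nir_index, :, :].float()
    return (nir - red) / (nir + red + eps)


# Toy non-negative 3-band image standing in for a BrazilianCoffeeScenes sample
x = torch.rand(3, 64, 64)
index = ndvi(x)                                       # (64, 64), values in [-1, 1]
features = torch.cat([x, index.unsqueeze(0)], dim=0)  # (4, 64, 64): G, R, NIR, NDVI
```

Since NDVI is a ratio, it is unaffected by a common rescaling of the Red and NIR bands, so it can be computed before or after intensity normalization that scales both bands equally.
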
Models

Multi-Image Super Resolution - RAMS

Residual Attention Multi-image Super-resolution Network (RAMS) from "Multi-Image Super Resolution of Remotely Sensed Images Using Residual Attention Deep Neural Networks", Salvetti et al. (2021)

RAMS is currently one of the top performers on the PROBA-V Super Resolution Challenge. This Multi-image Super Resolution (MISR) architecture uses attention-based methods to extract spatial and spatiotemporal features from a set of low resolution images to form a single high resolution image. Note that the attention methods are effectively Squeeze-and-Excitation blocks from "Squeeze-and-Excitation Networks", Hu et al.

import torch
from torchrs.models import RAMS

# increase resolution by a factor of 3 (e.g. 128x128 -> 384x384)
model = RAMS(
    scale_factor=3,
    t=9,
    c=1,
    num_feature_attn_blocks=12
)

# Input should be of shape (bs, t, c, h, w), where t is the number
# of low resolution input images and c is the number of channels/bands
lr = torch.randn(1, 9, 1, 128, 128)
sr = model(lr) # (1, 1, 384, 384)
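
The Squeeze-and-Excitation idea mentioned above can be illustrated in isolation. The following is a minimal sketch of a generic 2D SE block, not the RAMS implementation in torchrs (whose attention blocks differ in detail); the class name `SqueezeExcitation` and the reduction ratio of 16 (the default used in the SE paper) are choices made for this example:

```python
import torch
import torch.nn as nn


class SqueezeExcitation(nn.Module):
    """Generic 2D Squeeze-and-Excitation block in the style of Hu et al."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Squeeze: global average pooling, (B, C, H, W) -> (B, C, 1, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck MLP (as 1x1 convs) producing per-channel weights between 0 and 1
        self.fc = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Rescale each channel of the input by its learned weight
        return x * self.fc(self.pool(x))


block = SqueezeExcitation(channels=32)
feats = torch.randn(2, 32, 16, 16)
out = block(feats)  # same shape as feats
```

Because the learned weights lie between 0 and 1, the block can only attenuate or pass through each channel; this lets the network emphasize informative feature maps at the cost of two small 1x1 convolutions.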

Change Detection - Fully Convolutional Early Fusion (FC-EF), Siamese Concatenation (FC-Siam-conc), and Siamese Difference (FC-Siam-diff)

Fully Convolutional Early Fusion (FC-EF), Siamese Concatenation (FC-Siam-conc) and Siamese Difference (FC-Siam-diff) are change detection segmentation architectures proposed in "Fully Convolutional Siamese Networks for Change Detection", Daudt et al. The architectures are essentially modified U-Nets from "U-Net: Convolutional Networks for Biomedical Image Segmentation", Ronneberger et al. FC-EF is a U-Net that takes the concatenated images as input. FC-Siam-conc and FC-Siam-diff are U-Nets with a shared encoder that concatenate or take the difference of the skip connections, respectively. All three models have been modified to work with any number of input images t and channels c.

import torch
from torchrs.models import FCEF, FCSiamConc, FCSiamDiff

model = FCEF(
    channels=3,
    t=2,
    num_classes=2
)

model = FCSiamConc(
    channels=3,
    t=2,
    num_classes=2
)

model = FCSiamDiff(
    channels=3,
    t=2,
    num_classes=2
)


x = torch.randn(1, 2, 3, 128, 128)  # (b, t, c, h, w)
model(x)                            # (b, num_classes, h, w)
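
The three variants differ only in where the images are fused. A minimal sketch of the tensor bookkeeping, assuming the (b, t, c, h, w) input layout used above; the single `nn.Conv2d` is a hypothetical stand-in for an encoder stage, and the difference branch is shown only for t=2 because how torchrs generalizes it to more images is not described here:

```python
import torch
import torch.nn as nn

b, t, c, h, w = 1, 2, 3, 128, 128
x = torch.randn(b, t, c, h, w)

# Early fusion (FC-EF): stack the t images along the channel axis -> (b, t*c, h, w)
early = x.flatten(1, 2)

# Siamese: one shared encoder applied to every image independently
encoder = nn.Conv2d(c, 16, kernel_size=3, padding=1)  # stand-in for a U-Net encoder stage
feats = encoder(x.flatten(0, 1))                       # (b*t, 16, h, w)
feats = feats.view(b, t, *feats.shape[1:])             # (b, t, 16, h, w)

# FC-Siam-conc: concatenate the per-image skip features -> (b, t*16, h, w)
skip_conc = feats.flatten(1, 2)

# FC-Siam-diff (t=2): absolute difference of the skip features -> (b, 16, h, w)
skip_diff = (feats[:, 0] - feats[:, 1]).abs()
```

Folding t into the batch dimension is what makes the encoder "Siamese": the same weights process every image, and fusion happens only afterwards.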

Change Detection - Early Fusion (EF) and Siamese (Siam)

Early Fusion (EF) and Siamese (Siam) are change detection architectures proposed along with the OSCD (Onera Satellite Change Detection) dataset in "Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks", Daudt et al. The architectures are effectively CNN classifiers trained to classify whether the central pixel of a set (typically a pair) of input patches contains change or no change. EF takes the concatenated images as input, while Siam extracts feature vectors using a shared CNN and then feeds the concatenated vectors to an MLP classifier. Both models expect patches of size Cx15x15 but have been modified to work with any number of input images t and channels c.

import torch
from torchrs.models import EarlyFusion, Siam

model = EarlyFusion(
    channels=3,
    t=2,
    num_classes=2
)

model = Siam(
    channels=3,
    t=2,
    num_classes=2
)


x = torch.randn(1, 2, 3, 15, 15)  # (b, t, c, h, w)
model(x)                          # (b, num_classes)
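
Because these models classify the central pixel of a patch, a dense change map can be built by sliding a 15x15 window over the image pair. A minimal sketch of extracting one such patch and its label, assuming a (t, c, H, W) image stack and an (H, W) change mask; `extract_patch` and the zero padding at the borders are assumptions for illustration, not torchrs functionality:

```python
import torch
import torch.nn.functional as F


def extract_patch(images, mask, row, col, size=15):
    """Return the (t, c, size, size) patch centred on (row, col) and its centre-pixel label.

    images: (t, c, H, W) co-registered image stack
    mask:   (H, W) change mask (e.g. 0 = no change, 1 = change)
    """
    r = size // 2
    # Zero-pad the spatial dims so border pixels also get a full-size patch
    padded = F.pad(images, (r, r, r, r))
    # Original pixel (row, col) sits at (row + r, col + r) in padded coordinates,
    # so the window [row, row + size) x [col, col + size) is centred on it
    patch = padded[:, :, row:row + size, col:col + size]
    return patch, mask[row, col]


images = torch.randn(2, 3, 64, 64)    # toy pair of 3-band images
mask = torch.randint(0, 2, (64, 64))  # toy binary change mask
patch, label = extract_patch(images, mask, row=0, col=10)
print(patch.shape)  # torch.Size([2, 3, 15, 15])
```

Stacking patches from several locations with `torch.stack` gives a (b, t, c, 15, 15) input; zero padding is one choice among several, and reflection padding or skipping border pixels are alternatives.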

Training

For training, each model and dataset has been adapted into a PyTorch Lightning LightningModule and LightningDataModule, respectively. The modules can be found in torchrs.train.modules and torchrs.train.datamodules. Among other things, PyTorch Lightning reduces boilerplate code, requires minimal rewriting for multi-GPU/cluster training, and supports mixed precision training, gradient accumulation, callbacks, metric logging, etc.

To use the training features, torch-rs must be installed with the train extras.

# pypi
pip install 'torch-rs[train]'

# latest
pip install 'git+https://github.com/isaaccorley/torchrs.git#egg=torch-rs[train]'

A simple training example:

import torch
import torch.nn as nn
import pytorch_lightning as pl

from torchrs.train.modules import FCEFModule
from torchrs.train.datamodules import LEVIRCDPlusDataModule
from torchrs.transforms import Compose, ToTensor


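def collate_fn(batch):
    # Stack image stacks into (b, t, c, h, w); concatenate masks along dim 0
    # (each mask assumed to be (1, h, w), giving a (b, h, w) target)
    x = torch.stack([sample["x"] for sample in batch])
    y = torch.cat([sample["mask"] for sample in batch])
    x = x.to(torch.float32)
    y = y.to(torch.long)
    return x, y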

transform = Compose([ToTensor()])
model = FCEFModule(channels=3, t=2, num_classes=2, lr=1E-3)
dm = LEVIRCDPlusDataModule(
    root="path/to/dataset",
    transform=transform,
    batch_size=4,
    num_workers=2,
    prefetch_factor=1,
    collate_fn=collate_fn,
    test_collate_fn=collate_fn,
    val_split=0.2
)
callbacks = [
    pl.callbacks.ModelCheckpoint(monitor="val_loss", mode="min", verbose=True, save_top_k=1),
    pl.callbacks.EarlyStopping(monitor="val_loss", mode="min", patience=10)
]
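# gpus and weights_summary come from older PyTorch Lightning releases and have since
# been removed; newer releases use accelerator="gpu", devices=1 and enable_model_summary
trainer = pl.Trainer(
    gpus=1,
    precision=16,
    accumulate_grad_batches=1,
    max_epochs=25,
    callbacks=callbacks,
    weights_summary="top"
)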
trainer.fit(model, datamodule=dm)
trainer.test(datamodule=dm)
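
The `collate_fn` above assumes each sample is a dict whose "x" entry is a (t, c, h, w) image stack and whose "mask" entry is a (1, h, w) change mask, so that concatenating masks yields a (b, h, w) target. A minimal sketch that checks this on synthetic samples before launching a full run; the sample sizes and dtypes below are assumptions, not the actual LEVIR-CD+ values:

```python
import torch


def collate_fn(batch):
    # Same logic as the collate_fn above
    x = torch.stack([sample["x"] for sample in batch]).to(torch.float32)  # (b, t, c, h, w)
    y = torch.cat([sample["mask"] for sample in batch]).to(torch.long)     # (b, h, w)
    return x, y


# Synthetic samples shaped like a change detection pair: t=2 RGB images + a 1-channel mask
batch = [
    {
        "x": torch.randint(0, 256, (2, 3, 32, 32), dtype=torch.uint8),
        "mask": torch.randint(0, 2, (1, 32, 32), dtype=torch.uint8),
    }
    for _ in range(4)
]
x, y = collate_fn(batch)
print(x.shape, x.dtype)  # torch.Size([4, 2, 3, 32, 32]) torch.float32
print(y.shape, y.dtype)  # torch.Size([4, 32, 32]) torch.int64
```

These shapes line up with the FC-EF input layout (b, t, c, h, w) described earlier and with a per-pixel target of integer class indices.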

Tests

$ pytest -ra
