PyTorch Remote Sensing (torchrs)

UPDATE: torchrs is currently being merged into TorchGeo. Please go star & follow our progress.

PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Resolution, Land Cover Classification/Segmentation, Image Captioning, Audio-visual recognition etc.) for various Optical (Sentinel-2, Landsat, etc.) and Synthetic Aperture Radar (SAR) (Sentinel-1) sensors.

Installation

# pypi
pip install torch-rs

# pypi with training extras
pip install 'torch-rs[train]'

# latest
pip install git+https://github.com/isaaccorley/torchrs

# latest with extras
pip install 'git+https://github.com/isaaccorley/torchrs.git#egg=torch-rs[train]'

Datasets

PROBA-V Super Resolution

The PROBA-V Super Resolution Challenge dataset is a Multi-image Super Resolution (MISR) dataset of images taken by the ESA PROBA-Vegetation satellite. The dataset contains sets of unregistered 300m low resolution (LR) images which can be used to generate single 100m high resolution (HR) images for both Near Infrared (NIR) and Red bands. In addition, Quality Masks (QM) for each LR image and Status Masks (SM) for each HR image are available. The PROBA-V contains sensors which take imagery at 100m and 300m spatial resolutions with 5 and 1 day revisit rates, respectively. Generating high resolution imagery estimates would effectively increase the frequency at which HR imagery is available for vegetation monitoring.

The dataset can be downloaded (0.83GB) using scripts/download_probav.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import PROBAV

transform = Compose([ToTensor()])

dataset = PROBAV(
    root="path/to/dataset/",
    split="train",  # or 'test'
    band="RED",     # or 'NIR'
    lr_transform=transform,
    hr_transform=transform
)

x = dataset[0]
"""
x: dict(
    lr: low res images  (t, 1, 128, 128)
    qm: quality masks   (t, 1, 128, 128)
    hr: high res image  (1, 384, 384)
    sm: status mask     (1, 384, 384)
)
t varies by set of images (minimum of 9)
"""

ETCI 2021 Flood Detection

The ETCI 2021 Dataset is a flood detection segmentation dataset of SAR images taken by the ESA Sentinel-1 satellite. The dataset contains pairs of VV and VH polarization images processed by the Hybrid Pluggable Processing Pipeline (hyp3) along with corresponding binary flood and water body ground truth masks.

The dataset can be downloaded (5.6GB) using scripts/download_etci2021.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import ETCI2021

transform = Compose([ToTensor()])

dataset = ETCI2021(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    vv:         (3, 256, 256)
    vh:         (3, 256, 256)
    flood_mask: (1, 256, 256)
    water_mask: (1, 256, 256)
)
"""

HKH Glacier Mapping

The Hindu Kush Himalayas (HKH) Glacier Mapping dataset is a semantic segmentation dataset of 7,095 512x512 multispectral images taken by the USGS LandSat 7 satellite. The dataset contains imagery from 2002-2008 of the HKH region (spanning 8 countries) along with separate masks of clean-iced and debris-covered glaciers. The imagery contains 15 bands which includes 10 LandSat 7 bands, 3 precomputed NVDI/NDSI/NDWI indices, and 2 digital elevation and slope maps from the SRTM 90m DEM Digital Elevation Database.

The dataset can be downloaded (18GB/109GB compressed/uncompressed) using scripts/download_hkh_glacier.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import HKHGlacierMapping

transform = Compose([ToTensor()])

dataset = HKHGlacierMapping(
    root="path/to/dataset/",
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:                   (15, 512, 512)
    clean_ice_mask:      (1, 512, 512)
    debris_covered_mask: (1, 256, 256)
)
"""

dataset.bands
"""
['LE7 B1 (blue)', 'LE7 B2 (green)', 'LE7 B3 (red)', 'LE7 B4 (near infrared)', 'LE7 B5 (shortwave infrared 1)',
'LE7 B6_VCID_1 (low-gain thermal infrared)', 'LE7 B6_VCID_2 (high-gain thermal infrared)',
'LE7 B7 (shortwave infrared 2)', 'LE7 B8 (panchromatic)', 'LE7 BQA (quality bitmask)', 'NDVI (vegetation index)',
'NDSI (snow index)', 'NDWI (water index)', 'SRTM 90 elevation', 'SRTM 90 slope']
"""

ZueriCrop

The ZueriCrop dataset is a time-series instance segmentation dataset proposed in "Crop mapping from image time series: deep learning with multi-scale label hierarchies", Turkoglu et al. of 116k medium resolution (10m) 24x24 multispectral 9-band imagery of Zurich and Thurgau, Switzerland taken by the ESA Sentinel-2 satellite and contains pixel level semantic and instance annotations for 48 fine-grained, hierarchical categories of crop types. Note that there is only a single ground truth semantic & instance mask per time-series.

The dataset can be downloaded (39GB) using scripts/download_zuericrop.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import ZueriCrop

transform = Compose([ToTensor()])

dataset = ZueriCrop(
    root="path/to/dataset/",
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:              (142, 9, 24, 24)    (t, c, h, w)
    mask:           (1, 24, 24)
    instance_mask:  (1, 24, 24)
)
"""

[cls.label for cls in ds.classes]
"""
['Unknown', 'SummerBarley', 'WinterBarley', 'Oat', 'Wheat', 'Grain', ...]
"""

FAIR1M - Fine-grained Object Recognition

The FAIR1M dataset, proposed in "FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery", Sun et al. is a fine-grained object recognition/detection dataset of 15,000 high resolution (0.3-0.8m) RGB images taken by the Gaogen (GF) satellites and extracted from Google Earth. The dataset contains rotated bounding boxes for objects of 5 categories (ships, vehicles, airplanes, courts, and roads) and 37 sub-categories. This dataset is a part of the ISPRS Benchmark on Object Detection in High-Resolution Satellite Images. Note that so far only a portion of the training dataset has been released for the challenge (1,732/15,000 images).

The dataset can be downloaded (8.7GB) using scripts/download_fair1m.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import FAIR1M

transform = T.Compose([T.ToTensor()])

dataset = FAIR1M(
    root="path/to/dataset/",
    split="train",  # only 'train' for now
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (3, h, w)
    y: (N,)
    points: (N, 5, 2)
)
where N is the number of objects in the image
"""

ADVANCE - Audiovisual Aerial Scene Recognition

The AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE) dataset, proposed in "Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition", Hu et al. is a dataset composed of 5,075 pairs of geotagged audio recordings and 512x512 RGB images extracted from FreeSound and Google Earth, respectively. The images are then labeled into 13 scene categories using OpenStreetMap.

The dataset can be downloaded (4.5GB) using scripts/download_advance.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import ADVANCE

image_transform = T.Compose([T.ToTensor()])
audio_transform = T.Compose([])

dataset = ADVANCE(
    root="path/to/dataset/",
    image_transform=image_transform,
    audio_transform=audio_transform,
)

x = dataset[0]
"""
x: dict(
    image: (3, 512, 512)
    audio: (1, 220500)
    cls: int
)
"""

dataset.classes
"""
['airport', 'beach', 'bridge', 'farmland', 'forest', 'grassland', 'harbour', 'lake',
'orchard', 'residential', 'sparse shrub land', 'sports land', 'train station']
"""

Onera Satellite Change Detection (OSCD)

The Onera Satellite Change Detection (OSCD) dataset, proposed in "Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks", Daudt et al. is a change detection dataset of multispectral (MS) images taken by the ESA Sentinel-2 satellite. The dataset contains 24 registered image pairs from multiple continents between 2015-2018 along with binary change masks.

The dataset can be downloaded (0.73GB) using scripts/download_oscd.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import OSCD

transform = Compose([ToTensor(permute_dims=False)])

dataset = OSCD(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 13, h, w)
    mask: (1, h, w)
)
"""

Satellite Side-Looking (S2Looking) Change Detection

The S2Looking dataset, proposed in "S2Looking: A Satellite Side-Looking Dataset for Building Change Detection", Shen et al. is a rural building change detection dataset of 5,000 1024x1024 0.5-0.8m registered RGB image pairs of varying off-nadir angles taken by the Gaogen (GF), SuperView (SV), and BeiJing-2 (BJ-2) satellites. The dataset contains separate new and demolished building masks from regions all over the Earth with a time span of 1-3 years. This dataset was proposed along with the LEVIR-CD+ dataset and is considered difficult due to the rural locations and off-nadir angles.

The dataset can be downloaded (11GB) using scripts/download_s2looking.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import S2Looking

transform = Compose([ToTensor()])

dataset = S2Looking(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 3, 1024, 1024)
    build_mask: (1, 1024, 1024),
    demolish_mask: (1, 1024, 1024)
)
"""

LEVIR Change Detection+ (LEVIR-CD+)

The LEVIR-CD+ dataset, proposed in "S2Looking: A Satellite Side-Looking Dataset for Building Change Detection", Shen et al. is an urban building change detection dataset of 985 1024x1024 0.5m RGB image pairs extracted from Google Earth. The dataset contains building/land use change masks from 20 different regions of Texas between 2002-2020 with a time span of 5 years. This dataset was proposed along with the S2Looking dataset and is considered the easier version due to the urban locations and near-nadir angles.

The dataset can be downloaded (3.6GB) using scripts/download_levircd_plus.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import LEVIRCDPlus

transform = Compose([ToTensor()])

dataset = LEVIRCDPlus(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 3, 1024, 1024)
    mask: (1, 1024, 1024)
)
"""

High Resolution Semantic Change Detection (HRSCD)

The High Resolution Semantic Change Detection (HRSCD) dataset, proposed in "Multitask Learning for Large-scale Semantic Change Detection", Daudt et al. is a change detection dataset of high resolution (0.5m) aerial RGB image pairs extracted from the French National Institute of Geographical and Forest Information (IGN) database. The dataset contains 291 coregistered image pairs from 2006 and 2012 along with binary change masks extracted from the Urban Atlas Change 2006-2012 maps and corresponding land cover masks for each image extracted from the Urban Atlas 2006 and Urban Atlas 2012.

The dataset can be downloaded (12GB) using scripts/download_hrscd.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import HRSCD

transform = Compose([ToTensor()])

dataset = HRSCD(
    root="path/to/dataset/",
    transform=transform,
)

x = dataset[0]
"""
x: dict(
    x: (2, 3, 1000, 1000)
    lc: (2, 1000, 1000)
    mask: (1, 1000, 1000)
)
"""

Sentinel-2 Multitemporal Cities Pairs (S2MTCP)

The Sentinel-2 Multitemporal Cities Pairs (S2MTCP) dataset, proposed in "Self-supervised pre-training enhances change detection in Sentinel-2 imagery", Leenstra et al. is an urban change detection dataset of 1,520 medium resolution 10m unregistered image pairs taken by the ESA Sentinel-2 satellite. The dataset does not contain change masks and was originally used for self-supervised pretraining for other downstream change detection tasks (e.g. the OSCD dataset). The imagery are roughly 600x600 in shape and contain all Sentinel-2 bands of the Level 1C (L1C) product resampled to 10m.

The dataset can be downloaded (10GB/139GB compressed/uncompressed) using scripts/download_s2mtcp.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import S2MTCP

transform = Compose([ToTensor()])

dataset = S2MTCP(
    root="path/to/dataset/",
    transform=transform,
)

x = dataset[0]  # (2, 14, h, w)

Remote Sensing Visual Question Answering (RSVQA) Low Resolution (LR)

The RSVQA LR dataset, proposed in "RSVQA: Visual Question Answering for Remote Sensing Data", Lobry et al. is a visual question answering (VQA) dataset of 772 256x256 low resolution (10m) RGB images taken by the ESA Sentinel-2 satellite. Each image is annotated with a set of questions and their corresponding answers. Among other applications, this dataset can be used to train VQA models to perform detailed scene understanding of medium resolution remote sensing imagery.

The dataset can be downloaded (0.2GB) using scripts/download_rsvqa_lr.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSVQALR

transform = T.Compose([T.ToTensor()])

dataset = RSVQALR(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 256, 256)
    questions:  List[str]
    answers:    List[str]
    types:      List[str]
)
"""

Remote Sensing Visual Question Answering (RSVQA) High Resolution (HR)

The RSVQA HR dataset, proposed in "RSVQA: Visual Question Answering for Remote Sensing Data", Lobry et al. is a visual question answering (VQA) dataset of 772 512x512 high resolution (15cm) aerial RGB images extracted from the USGS High Resolution Orthoimagery (HRO) collection. Each image is annotated with a set of questions and their corresponding answers. Among other applications, this dataset can be used to train VQA models to perform detailed scene understanding of high resolution remote sensing imagery.

The dataset can be downloaded (15GB) using scripts/download_rsvqa_hr.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSVQALR

transform = T.Compose([T.ToTensor()])

dataset = RSVQALR(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 256, 256)
    questions:  List[str]
    answers:    List[str]
    types:      List[str]
)
"""

Remote Sensing Visual Question Answering BigEarthNet (RSVQAxBEN)

The RSVQAxBEN dataset, proposed in "RSVQA Meets BigEarthNet: A New, Large-Scale, Visual Question Answering Dataset for Remote Sensing", Lobry et al. is a version of the BigEarthNet dataset with visual question answering (VQA) annotations using the same method applied to generate annotations forthe RSVQA LR dataset. The dataset consists of 120x120 RGB Sentinel-2 imagery annotated with a set of questions and their corresponding answers.

The dataset can be downloaded (35.4GB) using scripts/download_rsvqaxben.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSVQAxBEN

transform = T.Compose([T.ToTensor()])

dataset = RSVQAxBEN(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 120, 120)
    questions:  List[str]
    answers:    List[str]
    types:      List[str]
)
"""

Remote Sensing Image Captioning Dataset (RSICD)

The RSICD dataset, proposed in "Exploring Models and Data for Remote Sensing Image Caption Generation", Lu et al. is an image captioning dataset with 5 captions per image for 10,921 224x224 RGB images extracted using Google Earth, Baidu Map, MapABC and Tianditu. While one of the larger remote sensing image captioning datasets, this dataset contains very repetitive language with little detail and many captions are duplicated.

The dataset can be downloaded (0.57GB) using scripts/download_rsicd.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSICD

transform = T.Compose([T.ToTensor()])

dataset = RSICD(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:        (3, 224, 224)
    captions: List[str]
)
"""

Sydney Captions

The Sydney Captions dataset, proposed in "Deep semantic understanding of high resolution remote sensing image", Qu et al. is a version of the Sydney scene classification dataset proposed in "Saliency-Guided Unsupervised Feature Learning for Scene Classification", Zhang et al. The dataset contains 613 500x500 1ft resolution RGB images of Sydney, Australia extracted using Google Earth and is annotated with 5 captions per image.

The dataset can be downloaded (0.44GB) using scripts/download_sydney_captions.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import SydneyCaptions

transform = T.Compose([T.ToTensor()])

dataset = SydneyCaptions(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:        (3, 500, 500)
    captions: List[str]
)
"""

UC Merced (UCM) Captions

The UC Merced (UCM) Captions dataset, proposed in "Deep semantic understanding of high resolution remote sensing image", Qu et al. is a version of the UCM land use classification dataset proposed in "Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification", Yang et al. The dataset contains 2100 256x256 1ft resolution RGB images of urban locations around the U.S. extracted from the USGS National Map Urban Area Imagery collection and is annotated with 5 captions per image.

The dataset can be downloaded (0.4GB) using scripts/download_ucm_captions.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import UCMCaptions

transform = T.Compose([T.ToTensor()])

dataset = UCMCaptions(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:        (3, 256, 256)
    captions: List[str]
)
"""

Remote Sensing Image Scene Classification (RESISC45)

The RESISC45 dataset, proposed in "Remote Sensing Image Scene Classification: Benchmark and State of the Art", Cheng et al. is a scene classification dataset of 31,500 RGB images extracted using Google Earth. The dataset contains 45 scenes with 700 images per class from over 100 countries and was selected to optimize for high variability in image conditions (spatial resolution, occlusion, weather, illumination, etc.).

The dataset can be downloaded (0.47GB) using scripts/download_resisc45.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RESISC45

transform = T.Compose([T.ToTensor()])

dataset = RESISC45(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 256, 256)
y: int
"""

dataset.classes
"""
['airplane', 'airport', 'baseball_diamond', 'basketball_court', 'beach', 'bridge', 'chaparral',
'church', 'circular_farmland', 'cloud', 'commercial_area', 'dense_residential', 'desert', 'forest',
'freeway', 'golf_course', 'ground_track_field', 'harbor', 'industrial_area', 'intersection', 'island',
'lake', 'meadow', 'medium_residential', 'mobile_home_park', 'mountain', 'overpass', 'palace', 'parking_lot',
'railway', 'railway_station', 'rectangular_farmland', 'river', 'roundabout', 'runway', 'sea_ice', 'ship',
'snowberg', 'sparse_residential', 'stadium', 'storage_tank', 'tennis_court', 'terrace', 'thermal_power_station', 'wetland']
"""

EuroSAT

The EuroSAT dataset, proposed in "EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification", Helber et al. is a land cover classification dataset of 27,000 64x64 images taken by the ESA Sentinel-2 satellite. The dataset contains 10 land cover classes with 2-3k images per class from over 34 European countries. The dataset is available in the form of RGB only or all 13 Multispectral (MS) Sentinel-2 bands. This dataset is fairly easy with ~98.6% accuracy achievable with a ResNet-50.

The dataset can be downloaded (.13GB and 2.8GB) using scripts/download_eurosat_rgb.sh or scripts/download_eurosat_ms.sh and instantiated below:

import torchvision.transforms as T
from torchrs.transforms import ToTensor
from torchrs.datasets import EuroSATRGB, EuroSATMS

transform = T.Compose([T.ToTensor()])

dataset = EuroSATRGB(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 64, 64)
y: int
"""

transform = T.Compose([ToTensor()])

dataset = EuroSATMS(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (13, 64, 64)
y: int
"""

dataset.classes
"""
['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial',
'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
"""

SAT-4 & SAT-6

The SAT-4 & SAT-6 datasets, proposed in "DeepSat - A Learning framework for Satellite Imagery", Basu et al. are land cover classification datasets of 500k and 405k 28x28 RGBN images, respectively, sampled across the Continental United States (CONUS) and extracted from the National Agriculture Imagery Program (NAIP). The SAT-4 and SAT-6 datasets contain 4 and 6 land cover classes, respectively. This dataset is fairly easy with ~80% accuracy achievable with a 5-layer CNN.

The dataset can be downloaded (2.7GB) using scripts/download_sat.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import SAT4, SAT6

transform = T.Compose([T.ToTensor()])

dataset = SAT4(
    root="path/to/dataset/",
    split="train"   # or 'test'
    transform=transform
)

x, y = dataset[0]
"""
x: (4, 28, 28)
y: int
"""

dataset.classes
"""
['barren land', 'trees', 'grassland', 'other']
"""

dataset = SAT6(
    root="path/to/dataset/",
    split="train"   # or 'test'
    transform=transform
)

x, y = dataset[0]
"""
x: (4, 28, 28)
y: int
"""

dataset.classes
"""
['barren land', 'trees', 'grassland', 'roads', 'buildings', 'water']
"""

Aerial Image Dataset (AID)

The AID dataset, proposed in "AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification", Xia et al. is a scene classification dataset of 10k 600x600 RGB images extracted using Google Earth. The dataset contains 30 scenes with several hundred images per class from regions and countries around the world. This dataset is fairly easy with ~90% accuracy achievable with a VGG-16.

The dataset can be downloaded (2.6GB) using scripts/download_aid.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import AID

transform = T.Compose([T.ToTensor()])

dataset = AID(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 600, 600)
y: int
"""

dataset.classes
"""
['Airport', 'BareLand', 'BaseballField', 'Beach', 'Bridge', 'Center', 'Church', 'Commercial',
'DenseResidential', 'Desert', 'Farmland', 'Forest', 'Industrial', 'Meadow', 'MediumResidential',
'Mountain', 'Park', 'Parking', 'Playground', 'Pond', 'Port', 'RailwayStation', 'Resort',
'River', 'School', 'SparseResidential', 'Square', 'Stadium', 'StorageTanks', 'Viaduct']
"""

Inria Aerial Image Labeling

The Inria Aerial Image Labeling Dataset is a building semantic segmentation dataset proposed in "Can semantic labeling methods generalize to any city? the inria aerial image labeling benchmark", Maggiori et al. of 360 high resolution (0.3m) 5000x5000 RGB imagery extracted from various international GIS services (e.g. USGS National Map). The dataset contains imagery from 10 regions around the world (both urban and rural) with train/test sets split into different cities for the purpose of evaluating if models can generalize across dramatically different locations. The dataset was originally used in the Inria Aerial Image Labeling Dataset Contest and the test set ground truth masks have not been released publicly.

The dataset can be downloaded (26GB) using scripts/download_inria_ail.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import InriaAIL

transform = Compose([ToTensor()])

dataset = InriaAIL(
    root="path/to/dataset/",
    split="train",  # or 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 5000, 5000)
    mask:      (1, 5000, 5000)
    region:     str
)
"""

dataset.regions
"""
['austin', 'chicago', 'kitsap', 'tyrol', 'vienna']
"""

Dubai Segmentation

The Dubai Segmentation dataset is a semantic segmentation dataset of 72 high resolution ~700x700 RGB imagery taken by the MBRSC satellites. The dataset contains imagery from 9 regions across Dubai and contains masks with 6 categories.

The dataset can be downloaded (0.03GB) using scripts/download_dubai_segmentation.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import DubaiSegmentation

transform = Compose([ToTensor()])

dataset = DubaiSegmentation(
    root="path/to/dataset/",
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, h, w)
    mask:      (1, h, w)
    region:    str
)
"""

dataset.classes.keys()
"""
['Unlabeled', 'Water', 'Land (unpaved area)', 'Road', 'Building', 'Vegetation']
"""

GID-15

The Gaofen Image Dataset (GID-15) is a semantic segmentation dataset proposed in "Land-Cover Classification with High-Resolution Remote Sensing Images Using Transferable Deep Models", Tong et al. of 150 high resolution (3m) 6800x7200 RGB imagery taken by the Gaofen-2 satellite and contains pixel level annotations for 15 categories. The dataset was used in a challenge hosted by the IEEE ICCV 2021 1st Workshop on Learning to Understand Aerial Images and the test set ground truth masks have not been released publicly.

The dataset can be downloaded (36GB) using scripts/download_gid15.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import GID15

transform = Compose([ToTensor()])

dataset = GID15(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

x = dataset[0]
"""
x: dict(
    x:         (3, 6800, 7200)
    mask:      (1, 6800, 7200)
)
"""

dataset.classes
"""
['background', 'industrial_land', 'urban_residential', 'rural_residential', 'traffic_land', 'paddy_field',
'irrigated_land', 'dry_cropland', 'garden_plot', 'arbor_woodland', 'shrub_land', 'natural_grassland',
'artificial_grassland', 'river', 'lake', 'pond']
"""

TiSeLaC

The TiSeLaC dataset from the Time Series Land Cover Classification Challenge is a time series land cover classification dataset consisting of 23 2866x2633 medium resolution (30m) multispectral 10 band (7 reflectance + NDVI/NDWI/Brightness Index) images taken by the USGS Landsat 8 satellite. The imagery was captured over Reunion Island in 2014 and contains 9 land cover classes derived from the Corine Land Cover (CLC) map. Note that the dataset is formatted for pixelwise time-series classification where each time series is of the form (t, b) where t=23 samples and b=10 bands. This dataset is very easy with the top score currently standing at 0.9929 F1 Score.

The dataset can be downloaded (0.08GB) using scripts/download_tiselac.sh and instantiated below:

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import Tiselac

transform = Compose([ToTensor()])

dataset = Tiselac(
    root="path/to/dataset/",
    split="train"   # or 'test'
    transform=transform
)

x, y = dataset[0]
"""
x: (23, 10)
y: int
"""

dataset.classes
"""
['Urban Areas', 'Other built-up surfaces', 'Forests', 'Sparse Vegetation', 'Rocks and bare soil',
'Grassland', 'Sugarcane crops', 'Other crops', 'Water']
"""

UC Merced (UCM)

The UC Merced (UCM) dataset, proposed in "Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification", Yang et al. is a land use classification dataset of 21k 256x256 1ft resolution RGB images of urban locations around the U.S. extracted from the USGS National Map Urban Area Imagery collection with 21 land use classes (100 images per class).

The dataset can be downloaded (0.42GB) using scripts/download_ucm.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import UCM

transform = T.Compose([T.ToTensor()])

dataset = UCM(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 256, 256)
y: int
"""

dataset.classes
"""
['agricultural', 'airplane', 'baseballdiamond', 'beach', 'buildings', 'chaparral', 'denseresidential',
'forest', 'freeway', 'golfcourse', 'harbor', 'intersection', 'mediumresidential', 'mobilehomepark',
'overpass', 'parkinglot', 'river', 'runway', 'sparseresidential', 'storagetanks', 'tenniscourt']
"""

PatternNet

The PatternNet dataset, proposed in "PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval", Yang et al. is a image retrieval and scene classification dataset of 30,400 256x256 high resolution (.06-5m) RGB images extracted using Google Earth and Google Maps with 38 scene classes (800 images per class). This dataset was originally proposed as a remote sensing image retrieval (RSIR) dataset with classes selected for high intra-class diversity and inter-class similarity such that image retrieval requires learning fine-grained details between multiple classes. Additionally, this dataset has some unique classes not found in other scene classification datasets, e.g. oil well, nursing home, solar panel, etc.

The dataset can be downloaded (1.4GB) using scripts/download_patternnet.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import PatternNet

transform = T.Compose([T.ToTensor()])

dataset = PatternNet(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 256, 256)
y: int
"""

dataset.classes
"""
['airplane', 'baseball_field', 'basketball_court', 'beach', 'bridge', 'cemetery', 'chaparral',
'christmas_tree_farm', 'closed_road', 'coastal_mansion', 'crosswalk', 'dense_residential',
'ferry_terminal', 'football_field', 'forest', 'freeway', 'golf_course', 'harbor', 'intersection',
'mobile_home_park', 'nursing_home', 'oil_gas_field', 'oil_well', 'overpass', 'parking_lot', 'parking_space',
'railway', 'river', 'runway', 'runway_marking', 'shipping_yard', 'solar_panel','sparse_residential',
'storage_tank', 'swimming_pool', 'tennis_court', 'transformer_station', 'wastewater_treatment_plant']
"""

WHU-RS19

The WHU-RS19 dataset, proposed in "Structural High-resolution Satellite Image Indexing", Xia et al. is a scene classification dataset of 1,005 600x600 high resolution (up to 0.5m) RGB images extracted using Google Earth with 19 scene classes (~50 images per class).

The dataset can be downloaded (0.11GB) using scripts/download_whu_rs19.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import WHURS19

transform = T.Compose([T.ToTensor()])

dataset = WHURS19(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 600, 600)
y: int
"""

dataset.classes
"""
['Airport', 'Beach', 'Bridge', 'Commercial', 'Desert', 'Farmland','Forest', 'Industrial',
'Meadow', 'Mountain', 'Park', 'Parking', 'Pond', 'Port', 'Residential', 'River', 'Viaduct',
'footballField', 'railwayStation']
"""

RSSCN7

The RSSCN7 dataset, proposed in "Deep Learning Based Feature Selection for Remote Sensing Scene Classification", Zou et al. is a scene classification dataset of 2,800 400x400 high resolution RGB images extracted using Google Earth with 7 scene classes (400 images per class).

The dataset can be downloaded (0.36GB) using scripts/download_rsscn7.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import RSSCN7

transform = T.Compose([T.ToTensor()])

dataset = RSSCN7(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 400, 400)
y: int
"""

dataset.classes
"""
['aGrass', 'bField', 'cIndustry', 'dRiverLake', 'eForest', 'fResident', 'gParking']
"""

Brazilian Coffee Scenes

The Brazilian Coffee Scenes dataset, proposed in "Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains?", Penatti et al. is a scene classification dataset of 2,876 64x64 3-band (Green, Red, NIR) images taken by the SPOT satellites in 2005 over four counties in the State of Minas Gerais, Brazil. This dataset was developed to classify coffee crops from non-coffee crops.

The dataset can be downloaded (0.01GB) using scripts/download_brazilian_coffee_.sh and instantiated below:

import torchvision.transforms as T
from torchrs.datasets import BrazilianCoffeeScenes

transform = T.Compose([T.ToTensor()])

dataset = BrazilianCoffeeScenes(
    root="path/to/dataset/",
    transform=transform
)

x, y = dataset[0]
"""
x: (3, 64, 64)
y: int
"""

dataset.classes
"""
['non-coffee', 'coffee']
"""

Models

Multi-Image Super Resolution - RAMS

Residual Attention Multi-image Super-resolution Network (RAMS) from "Multi-Image Super Resolution of Remotely Sensed Images Using Residual Attention Deep Neural Networks", Salvetti et al. (2021)

RAMS is currently one of the top performers on the PROBA-V Super Resolution Challenge. This Multi-image Super Resolution (MISR) architecture utilizes attention based methods to extract spatial and spatiotemporal features from a set of low resolution images to form a single high resolution image. Note that the attention methods are effectively Squeeze-and-Excitation blocks from "Squeeze-and-Excitation Networks", Hu et al..

import torch
from torchrs.models import RAMS

# increase resolution by factor of 3 (e.g. 128x128 -> 384x384)
model = RAMS(
    scale_factor=3,
    t=9,
    c=1,
    num_feature_attn_blocks=12
)

# Input should be of shape (bs, t, c, h, w), where t is the number
# of low resolution input images and c is the number of channels/bands
lr = torch.randn(1, 9, 1, 128, 128)
sr = model(lr) # (1, 1, 384, 384)

Change Detection - Fully Convolutional Early Fusion (FC-EF), Siamese Concatenation (FC-Siam-conc), and Siamese Difference (FC-Siam-diff)

Fully Convolutional Early Fusion (FC-EF), Siamese Concatenation (FC-Siam-conc), Siamese Difference (FC-Siam-conc) and are change detection segmentation architectures proposed in "Fully Convolutional Siamese Networks for Change Detection", Daudt et al.. The architectures are essentially modified U-Nets from "U-Net: Convolutional Networks for Biomedical Image Segmentation", Ronneberger et al.. FC-EF is a U-Net which takes as input the concatenated images. FC-Siam-conc and FC-Siam-diff are U-Nets with a shared encoder which concatenate or take the difference of the skip connections, respectively. Both models been modified to work with any number of input images t and channels c.

import torch
from torchrs.models import FCEF, FCSiamConc, FCSiamDiff

model = FCEF(
    channels=3,
    t=2,
    num_classes=2
)

model = FCSiamConc(
    channels=3,
    t=2,
    num_classes=2
)

model = FCSiamDiff(
    channels=3,
    t=2,
    num_classes=2
)


x = torch.randn(1, 2, 3, 128, 128)  # (b, t, c, h, w)
model(x)                            # (b, num_classes, h, w)

Change Detection - Early Fusion (EF) and Siamese (Siam)

Early Fusion (EF) and Siamese (Siam) are change detection architectures proposed along with the OSCD - Onera Satellite Change Detection dataset in "Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks", Daudt et al.. The architectures are effectively CNN classifiers which are trained to classify whether the central pixel of a set (typically a pair) of input patches contains change/no change. EF takes as input the concatenated images while Siam extracts feature vectors using a shared CNN and then feeds the concatenated vectors to a MLP classifier. Both models expect patches of size Cx15x15 but have been modified to work with any number of input images t and channels c.

import torch
from torchrs.models import EarlyFusion, Siam

model = EarlyFusion(
    channels=3,
    t=2,
    num_classes=2
)

model = Siam(
    channels=3,
    t=2,
    num_classes=2
)


x = torch.randn(1, 2, 3, 15, 15)  # (b, t, c, h, w)
model(x)                          # (b, num_classes, h, w)

Training

For training purposes, each model and dataset has been adapted into Pytorch Lightning LightningModules and LightningDataModules, respectively. The modules can be found in torchrs.train.modules and torchrs.train.datamodules. Among other things, Pytorch Lightning has the benefits of reducing boilerplate code, requiring minimal rewrite for multi-gpu/cluster training, supports mixed precision training, gradient accumulation, callbacks, logging metrics, etc.

To use the training features, torch-rs must be installed with the train extras.

# pypi
pip install 'torch-rs[train]'

# latest
pip install 'git+https://github.com/isaaccorley/torchrs.git#egg=torch-rs[train]'

A simple training example:

import torch
import torch.nn as nn
import pytorch_lightning as pl

from torchrs.train.modules import FCEFModule
from torchrs.train.datamodules import LEVIRCDPlusDataModule
from torchrs.transforms import Compose, ToTensor


def collate_fn(batch):
    x = torch.stack([x["x"] for x in batch])
    y = torch.cat([x["mask"] for x in batch])
    x = x.to(torch.float32)
    y = y.to(torch.long)
    return x, y 

transform = Compose([ToTensor()])
model = FCEFModule(channels=3, t=2, num_classes=2, lr=1E-3)
dm = LEVIRCDPlusDataModule(
    root="path/to/dataset",
    transform=transform,
    batch_size=4,
    num_workers=2,
    prefetch_factor=1,
    collate_fn=collate_fn,
    test_collate_fn=collate_fn,
    val_split=0.2
)
callbacks = [
    pl.callbacks.ModelCheckpoint(monitor="val_loss", mode="min", verbose=True, save_top_k=1),
    pl.callbacks.EarlyStopping(monitor="val_loss", mode="min", patience=10)
]
trainer = pl.Trainer(
    gpus=1,
    precision=16,
    accumulate_grad_batches=1,
    max_epochs=25,
    callbacks=callbacks,
    weights_summary="top"
)
trainer.fit(model, datamodule=dm)
trainer.test(datamodule=dm)

Tests

$ pytest -ra

isaaccorley/torchrs

isaaccorley

Reviews

Repository Details