Version 2.0 (stable)
News
- This repo has been updated to PyTorch 2.0 and PyTorch Lightning 2.0, with support for multi-GPU training, among other improvements.
- Pretrained weights of backbones can be accessed from Google Drive.
- UNetFormer (accepted by ISPRS, PDF) and the UAVid dataset are supported.
- The ISPRS Vaihingen and Potsdam datasets are supported. Since private sharing is not allowed, you need to download the datasets from the official website and organize them according to the Folder Structure below.
- More networks have been added, and links to their pretrained weights are provided.
- config/loveda/dcswin.py provides a detailed explanation of the config settings.
- Inference on huge RS images is supported (inference_huge_image.py).
Introduction
GeoSeg is an open-source semantic segmentation toolbox based on PyTorch, PyTorch Lightning, and timm, which mainly focuses on developing advanced Vision Transformers for remote sensing image segmentation.
Major Features
- Unified Benchmark: we provide a unified training script for various segmentation methods.
- Simple and Effective: thanks to PyTorch Lightning and timm, the code is easy to extend for further development.
- Supported Remote Sensing Datasets:
  - ISPRS Vaihingen and Potsdam
  - UAVid
  - LoveDA
  - More datasets will be supported in the future.
- Multi-scale Training and Testing
- Inference on Huge Remote Sensing Images
Supported Networks
Folder Structure
Prepare the following folders to organize this repo:
airs
├── GeoSeg (code)
├── pretrain_weights (pretrained weights of backbones, such as vit, swin, etc.)
├── model_weights (model weights trained on ISPRS Vaihingen, LoveDA, etc.)
├── fig_results (masks predicted by models)
├── lightning_logs (CSV-format training logs)
├── data
│   ├── LoveDA
│   │   ├── Train
│   │   │   ├── Urban
│   │   │   │   ├── images_png (original images)
│   │   │   │   ├── masks_png (original masks)
│   │   │   │   ├── masks_png_convert (converted masks used for training)
│   │   │   │   └── masks_png_convert_rgb (original RGB-format masks)
│   │   │   └── Rural
│   │   │       ├── images_png
│   │   │       ├── masks_png
│   │   │       ├── masks_png_convert
│   │   │       └── masks_png_convert_rgb
│   │   ├── Val (the same structure as Train)
│   │   ├── Test
│   │   └── train_val (merged Train and Val)
│   ├── uavid
│   │   ├── uavid_train (original)
│   │   ├── uavid_val (original)
│   │   ├── uavid_test (original)
│   │   ├── uavid_train_val (merged uavid_train and uavid_val)
│   │   ├── train (processed)
│   │   ├── val (processed)
│   │   └── train_val (processed)
│   ├── vaihingen
│   │   ├── train_images (original)
│   │   ├── train_masks (original)
│   │   ├── test_images (original)
│   │   ├── test_masks (original)
│   │   ├── test_masks_eroded (original)
│   │   ├── train (processed)
│   │   └── test (processed)
│   └── potsdam (the same structure as vaihingen)
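If you prefer to script the setup, a minimal sketch like the following (standard library only) creates the top-level skeleton; the dataset subfolders are produced by the preprocessing steps below.

```python
from pathlib import Path

# Top-level folders expected by GeoSeg; the dataset subfolders are
# created later by the patch-split and mask-convert tools.
root = Path("airs")
for folder in [
    "GeoSeg", "pretrain_weights", "model_weights",
    "fig_results", "lightning_logs",
    "data/LoveDA", "data/uavid", "data/vaihingen", "data/potsdam",
]:
    (root / folder).mkdir(parents=True, exist_ok=True)
```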
Install
Open the airs folder in a Linux terminal and create the Python environment:
conda create -n airs python=3.8
conda activate airs
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r GeoSeg/requirements.txt
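To confirm the environment is set up correctly, a quick sanity check like the following verifies the PyTorch and PyTorch Lightning versions and CUDA availability:

```python
import torch
import pytorch_lightning as pl

print(torch.__version__)          # expect 2.x
print(pl.__version__)             # expect 2.x
print(torch.cuda.is_available())  # expect True on a CUDA machine
```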
Pretrained Weights of Backbones
Baidu Disk (access code: 1234)
Data Preprocessing
Download the datasets from their official websites and split them as follows.
Vaihingen
Generate the training set.
python GeoSeg/tools/vaihingen_patch_split.py \
--img-dir "data/vaihingen/train_images" \
--mask-dir "data/vaihingen/train_masks" \
--output-img-dir "data/vaihingen/train/images_1024" \
--output-mask-dir "data/vaihingen/train/masks_1024" \
--mode "train" --split-size 1024 --stride 512
Generate the testing set.
python GeoSeg/tools/vaihingen_patch_split.py \
--img-dir "data/vaihingen/test_images" \
--mask-dir "data/vaihingen/test_masks_eroded" \
--output-img-dir "data/vaihingen/test/images_1024" \
--output-mask-dir "data/vaihingen/test/masks_1024" \
--mode "val" --split-size 1024 --stride 1024 \
--eroded
Generate the masks_1024_rgb (RGB format ground truth labels) for visualization.
python GeoSeg/tools/vaihingen_patch_split.py \
--img-dir "data/vaihingen/test_images" \
--mask-dir "data/vaihingen/test_masks" \
--output-img-dir "data/vaihingen/test/images_1024" \
--output-mask-dir "data/vaihingen/test/masks_1024_rgb" \
--mode "val" --split-size 1024 --stride 1024 \
--gt
As for the validation set, you can select some images from the training set to build it, as sketched below.
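For example, a minimal sketch like this moves a random subset of the generated training patches (and their masks) into a validation folder; the val directory names and the 10% split are illustrative choices, not part of the toolbox.

```python
import random
import shutil
from pathlib import Path

# Illustrative split: move ~10% of the training patches into a val
# folder. Paths follow the Folder Structure above; adjust the file
# extension (.png/.tif) to what the split script actually produced.
img_dir = Path("data/vaihingen/train/images_1024")
mask_dir = Path("data/vaihingen/train/masks_1024")
val_img_dir = Path("data/vaihingen/val/images_1024")
val_mask_dir = Path("data/vaihingen/val/masks_1024")
val_img_dir.mkdir(parents=True, exist_ok=True)
val_mask_dir.mkdir(parents=True, exist_ok=True)

random.seed(42)
images = sorted(img_dir.glob("*.png"))  # assumes masks share the image filename
for img in random.sample(images, k=max(1, len(images) // 10)):
    mask = mask_dir / img.name
    shutil.move(str(img), val_img_dir / img.name)
    if mask.exists():
        shutil.move(str(mask), val_mask_dir / mask.name)
```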
Potsdam
python GeoSeg/tools/potsdam_patch_split.py \
--img-dir "data/potsdam/train_images" \
--mask-dir "data/potsdam/train_masks" \
--output-img-dir "data/potsdam/train/images_1024" \
--output-mask-dir "data/potsdam/train/masks_1024" \
--mode "train" --split-size 1024 --stride 1024 --rgb-image
python GeoSeg/tools/potsdam_patch_split.py \
--img-dir "data/potsdam/test_images" \
--mask-dir "data/potsdam/test_masks_eroded" \
--output-img-dir "data/potsdam/test/images_1024" \
--output-mask-dir "data/potsdam/test/masks_1024" \
--mode "val" --split-size 1024 --stride 1024 \
--eroded --rgb-image
python GeoSeg/tools/potsdam_patch_split.py \
--img-dir "data/potsdam/test_images" \
--mask-dir "data/potsdam/test_masks" \
--output-img-dir "data/potsdam/test/images_1024" \
--output-mask-dir "data/potsdam/test/masks_1024_rgb" \
--mode "val" --split-size 1024 --stride 1024 \
--gt --rgb-image
UAVid
python GeoSeg/tools/uavid_patch_split.py \
--input-dir "data/uavid/uavid_train_val" \
--output-img-dir "data/uavid/train_val/images" \
--output-mask-dir "data/uavid/train_val/masks" \
--mode 'train' --split-size-h 1024 --split-size-w 1024 \
--stride-h 1024 --stride-w 1024
python GeoSeg/tools/uavid_patch_split.py \
--input-dir "data/uavid/uavid_train" \
--output-img-dir "data/uavid/train/images" \
--output-mask-dir "data/uavid/train/masks" \
--mode 'train' --split-size-h 1024 --split-size-w 1024 \
--stride-h 1024 --stride-w 1024
python GeoSeg/tools/uavid_patch_split.py \
--input-dir "data/uavid/uavid_val" \
--output-img-dir "data/uavid/val/images" \
--output-mask-dir "data/uavid/val/masks" \
--mode 'val' --split-size-h 1024 --split-size-w 1024 \
--stride-h 1024 --stride-w 1024
LoveDA
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Train/Rural/masks_png --output-mask-dir data/LoveDA/Train/Rural/masks_png_convert
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Train/Urban/masks_png --output-mask-dir data/LoveDA/Train/Urban/masks_png_convert
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Val/Rural/masks_png --output-mask-dir data/LoveDA/Val/Rural/masks_png_convert
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Val/Urban/masks_png --output-mask-dir data/LoveDA/Val/Urban/masks_png_convert
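To spot-check the conversion, you can inspect the label values in one converted mask; a minimal sketch, assuming the converted masks are single-channel PNGs readable by Pillow:

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Print the set of class indices present in one converted mask; after
# conversion these should be small integer class IDs (an assumption to
# verify against your output).
mask_path = next(Path("data/LoveDA/Train/Urban/masks_png_convert").glob("*.png"))
mask = np.array(Image.open(mask_path))
print(mask_path.name, np.unique(mask))
```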
Training
"-c" means the path of the config, use different config to train different models.
python GeoSeg/train_supervision.py -c GeoSeg/config/uavid/unetformer.py
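A config is a plain Python file whose module-level objects the training script imports via the -c path. The sketch below is heavily simplified and illustrative only; the actual field names and values live in files such as GeoSeg/config/loveda/dcswin.py, which documents them in detail.

```python
# Illustrative only -- see GeoSeg/config/loveda/dcswin.py for the real
# fields. All values below are hypothetical.
max_epoch = 45                   # number of training epochs
train_batch_size = 8             # per-GPU batch size
lr = 6e-4                        # base learning rate
weights_name = "dcswin-example"  # checkpoint name under model_weights
# net = ...                      # model instance
# loss = ...                     # loss instance
# train_loader / val_loader = ...  # torch DataLoader objects
```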
Testing
"-c" denotes the path of the config, Use different config to test different models.
"-o" denotes the output path
"-t" denotes the test time augmentation (TTA), can be [None, 'lr', 'd4'], default is None, 'lr' is flip TTA, 'd4' is multiscale TTA
"--rgb" denotes whether to output masks in RGB format
Vaihingen
python GeoSeg/vaihingen_test.py -c GeoSeg/config/vaihingen/dcswin.py -o fig_results/vaihingen/dcswin --rgb -t 'd4'
Potsdam
python GeoSeg/potsdam_test.py -c GeoSeg/config/potsdam/dcswin.py -o fig_results/potsdam/dcswin --rgb -t 'lr'
LoveDA (Online Testing)
python GeoSeg/loveda_test.py -c GeoSeg/config/loveda/dcswin.py -o fig_results/loveda/dcswin_test -t 'd4'
UAVid (Online Testing)
python GeoSeg/inference_uavid.py \
-i 'data/uavid/uavid_test' \
-c GeoSeg/config/uavid/unetformer.py \
-o fig_results/uavid/unetformer_r18 \
-t 'lr' -ph 1152 -pw 1024 -b 2 -d "uavid"
Inference on huge remote sensing images
python GeoSeg/inference_huge_image.py \
-i data/vaihingen/test_images \
-c GeoSeg/config/vaihingen/dcswin.py \
-o fig_results/vaihingen/dcswin_huge \
-t 'lr' -ph 512 -pw 512 -b 2 -d "pv"
Reproduction Results
| Method | Dataset | F1 | OA | mIoU |
|---|---|---|---|---|
| UNetFormer | UAVid | - | - | 67.63 |
| UNetFormer | Vaihingen | 90.30 | 91.10 | 82.54 |
| UNetFormer | Potsdam | 92.64 | 91.19 | 86.52 |
| UNetFormer | LoveDA | - | - | 52.97 |
| FT-UNetFormer | Vaihingen | 91.17 | 91.74 | 83.98 |
| FT-UNetFormer | Potsdam | 93.22 | 91.87 | 87.50 |
Due to random operations in the training stage, reproduced results (from a single run) may differ slightly from those reported in the papers.
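If you want more repeatable runs, PyTorch Lightning's seed_everything fixes the Python, NumPy, and PyTorch seeds (note that some CUDA ops remain non-deterministic unless deterministic mode is also enabled):

```python
import pytorch_lightning as pl

# Seeds python, numpy, and torch (CPU and CUDA); workers=True also
# seeds DataLoader worker processes.
pl.seed_everything(42, workers=True)
```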
Citation
If you find this project useful in your research, please consider citing:
- UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery
- A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images
- Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images
- ABCNet: Attentive Bilateral Contextual Network for Efficient Semantic Segmentation of Fine-Resolution Remote Sensing Images
- Multiattention network for semantic segmentation of fine-resolution remote sensing images
- A2-FPN for semantic segmentation of fine-resolution remotely sensed images
Acknowledgement
We hope GeoSeg can serve the growing field of remote sensing research by providing a unified benchmark and inspiring researchers to develop their own segmentation networks. Many thanks to the following projects for their contributions to GeoSeg.