AlphaRotate: A Rotation Detection Benchmark using TensorFlow
Abstract
AlphaRotate is mainly maintained by Xue Yang of Shanghai Jiao Tong University, supervised by Prof. Junchi Yan.
Papers and code related to remote sensing/aerial image detection: DOTA-DOAI.
Techniques:
- Dataset: DOTA, HRSC2016, ICDAR2015, ICDAR2017 MLT, MSRA-TD500, UCAS-AOD, FDDB, OHD-SJTU, SSDD++, Total-Text
- Backbone: ResNet, MobileNetV2, EfficientNet, DarkNet53
- Neck: FPN, BiFPN
- Detectors:
- R2CNN (Faster-RCNN-H): R2CNN_Faster-RCNN_Tensorflow, DOTA-DOAI, R2CNN_FPN_Tensorflow (Deprecated)
- Double Head
- RRPN (Faster-RCNN-R): TF code
- SCRDet (ICCV19): R2CNN++, IoU-Smooth L1 Loss
- RetinaNet-H, RetinaNet-R: TF code
- RefineRetinaNet (CascadeRetinaNet)
- ATSS
- FCOS
- RSDet (AAAI21): TF code
- RSDet++ (FCOS-RSDet)
- R3Det (AAAI21): TF code, PyTorch code1, PyTorch code2, PyTorch code3
- Circular Smooth Label (CSL, ECCV20): TF code, PyTorch code
- Densely Coded Label (DCL CVPR21): TF code
- GWD (ICML21): PyTorch code
- BCD (TPAMI22): PyTorch code
- KLD (NeurIPS21): PyTorch code
- RIDet (GRSL): PyTorch code
- KFIoU (ICLR23): PyTorch code
- Mixed method: R3Det-DCL, R3Det-GWD, R3Det-BCD, R3Det-KLD, FCOS-RSDet, R2CNN-BCD, R2CNN-KF
- Loss: CE, Focal Loss, Smooth L1 Loss, IoU-Smooth L1 Loss, Modulated Loss
- Others: SWA, exportPb, MMdnn
The rotation detectors above are all adapted from the following horizontal detectors:
- Faster RCNN: TF code
- R-FCN: TF code
- FPN: TF code1, TF code2 (Deprecated)
- Cascade RCNN: TF code
- Cascade FPN RCNN: TF code
- RetinaNet: TF code
- RefineDet: MxNet code
- FCOS: TF code, MxNet code
Projects
Latest Performance
DOTA (Task1)
Baseline
Backbone | Neck | Training/test dataset | Data Augmentation | Epoch | NMS |
---|---|---|---|---|---|
ResNet50_v1d 600->800 | FPN | trainval/test | × | 13 (AP50) or 17 (AP50:95) is enough for the baseline (default is 13) | gpu nms (slightly worse than cpu nms, by <1%, but faster) |
Method | Baseline | DOTA1.0 | DOTA1.5 | DOTA2.0 | Model | Anchor | Angle Pred. | Reg. Loss | Angle Range | Configs |
---|---|---|---|---|---|---|---|---|---|---|
- | RetinaNet-R | 67.25 | 56.50 | 42.04 | Baidu Drive (bi8b) | R | Reg. (θ) | smooth L1 | [-90,0) | dota1.0, dota1.5, dota2.0 |
- | RetinaNet-H | 64.17 | 56.10 | 43.06 | Baidu Drive (bi8b) | H | Reg. (θ) | smooth L1 | [-90,90) | dota1.0, dota1.5, dota2.0 |
- | RetinaNet-H | 65.33 | 57.21 | 44.58 | Baidu Drive (bi8b) | H | Reg. (sin θ, cos θ) | smooth L1 | [-90,90) | dota1.0, dota1.5, dota2.0 |
- | RetinaNet-H | 65.73 | 58.87 | 44.16 | Baidu Drive (bi8b) | H | Reg. (θ) | smooth L1 | [-90,0) | dota1.0, dota1.5, dota2.0 |
IoU-Smooth L1 | RetinaNet-H | 66.99 | 59.17 | 46.31 | Baidu Drive (qcvc) | H | Reg. (θ) | iou-smooth L1 | [-90,0) | dota1.0, dota1.5, dota2.0 |
RIDet | RetinaNet-H | 66.06 | 58.91 | 45.35 | Baidu Drive (njjv) | H | Quad. | hungarian loss | - | dota1.0, dota1.5, dota2.0 |
RSDet | RetinaNet-H | 67.27 | 61.42 | 46.71 | Baidu Drive (2a1f) | H | Quad. | modulated loss | - | dota1.0, dota1.5, dota2.0 |
CSL | RetinaNet-H | 67.38 | 58.55 | 43.34 | Baidu Drive (sdbb) | H | Cls.: Gaussian (r=1, w=10) | smooth L1 | [-90,90) | dota1.0, dota1.5, dota2.0 |
DCL | RetinaNet-H | 67.39 | 59.38 | 45.46 | Baidu Drive (m7pq) | H | Cls.: BCL (w=180/256) | smooth L1 | [-90,90) | dota1.0, dota1.5, dota2.0 |
- | FCOS | 67.69 | 61.05 | 48.10 | Baidu Drive (pic4) | - | Quad. | smooth L1 | - | dota1.0, dota1.5, dota2.0 |
RSDet++ | FCOS | 67.91 | 62.18 | 48.81 | Baidu Drive (8ww5) | - | Quad. | modulated loss | - | dota1.0, dota1.5, dota2.0 |
GWD | RetinaNet-H | 68.93 | 60.03 | 46.65 | Baidu Drive (7g5a) | H | Reg. (θ) | gwd | [-90,0) | dota1.0, dota1.5, dota2.0 |
GWD + SWA | RetinaNet-H | 69.92 | 60.60 | 47.63 | Baidu Drive (qcn0) | H | Reg. (θ) | gwd | [-90,0) | dota1.0, dota1.5, dota2.0 |
BCD | RetinaNet-H | 71.23 | 60.78 | 47.48 | Baidu Drive (0puk) | H | Reg. (θ) | bcd | [-90,0) | dota1.0, dota1.5, dota2.0 |
KLD | RetinaNet-H | 71.28 | 62.50 | 47.69 | Baidu Drive (o6rv) | H | Reg. (θ) | kld | [-90,0) | dota1.0, dota1.5, dota2.0 |
KFIoU | RetinaNet-H | 70.64 | 62.71 | 48.04 | Baidu Drive (o72o) | H | Reg. (θ) | kfiou | [-90,0) | dota1.0, dota1.5, dota2.0 |
KFIoU* | RetinaNet-H | 71.60 | - | 48.94 | Baidu Drive (o72o) | H | Reg. (θ) | kfiou | [-90,0) | dota1.0, dota2.0 |
R3Det | RetinaNet-H | 70.66 | 62.91 | 48.43 | Baidu Drive (n9mv) | H->R | Reg. (θ) | smooth L1 | [-90,0) | dota1.0, dota1.5, dota2.0 |
DCL | R3Det | 71.21 | 61.98 | 48.71 | Baidu Drive (eg2s) | H->R | Cls.: BCL (w=180/256) | iou-smooth L1 | [-90,0)->[-90,90) | dota1.0, dota1.5, dota2.0 |
GWD | R3Det | 71.56 | 63.22 | 49.25 | Baidu Drive (jb6e) | H->R | Reg. (θ) | smooth L1->gwd | [-90,0) | dota1.0, dota1.5, dota2.0 |
BCD | R3Det | 72.22 | 63.53 | 49.71 | Baidu Drive (v60g) | H->R | Reg. (θ) | bcd | [-90,0) | dota1.0, dota1.5, dota2.0 |
KLD | R3Det | 71.73 | 65.18 | 50.90 | Baidu Drive (tq7f) | H->R | Reg. (θ) | kld | [-90,0) | dota1.0, dota1.5, dota2.0 |
KFIoU | R3Det | 72.28 | 64.69 | 50.41 | Baidu Drive (u77v) | H->R | Reg. (θ) | kfiou | [-90,0) | dota1.0, dota1.5, dota2.0 |
- | R2CNN (Faster-RCNN) | 72.27 | 66.45 | 52.35 | Baidu Drive (02s5) | H->R | Reg. (θ) | smooth L1 | [-90,0) | dota1.0, dota1.5, dota2.0 |
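As a reading aid for the "Angle Pred." column: rows marked "Cls." replace angle regression with classification over discretized angle bins. Below is a minimal NumPy sketch of the CSL idea with a Gaussian window; how the table's r and w map onto the window radius and bin width is our reading, and the function name and truncation choice are illustrative, not the repo's API.

```python
import numpy as np

def csl_gaussian_label(theta, omega=10.0, radius=1.0, angle_range=180):
    """Sketch of Circular Smooth Label: angle -> smoothed class vector."""
    num_bins = int(angle_range / omega)        # e.g. 180 / 10 = 18 bins
    center = int(round(theta / omega)) % num_bins
    bins = np.arange(num_bins)
    # circular distance (in bins) between each bin and the ground-truth bin
    dist = np.minimum(np.abs(bins - center), num_bins - np.abs(bins - center))
    label = np.exp(-dist.astype(np.float64) ** 2 / (2 * radius ** 2))
    label[dist > 3 * radius] = 0.0             # truncate the window (our choice)
    return label

# A box at 45 degrees mainly activates the bin around 45 plus its neighbours:
print(csl_gaussian_label(45.0).round(3))
```

Because neighbouring bins receive partial credit, a prediction that is off by one bin is penalized far less than one that is off by ninety degrees, which is the point of the circular smoothing.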
SOTA
Method | Backbone | DOTA1.0 | Model | MS | Data Augmentation | Epoch | Configs |
---|---|---|---|---|---|---|---|
R2CNN-BCD | ResNet152_v1d-FPN | 79.54 | Baidu Drive (h2u1) | ✓ | ✓ | 34 | dota1.0 |
RetinaNet-BCD | ResNet152_v1d-FPN | 78.52 | Baidu Drive (0puk) | ✓ | ✓ | 51 | dota1.0 |
R3Det-BCD | ResNet50_v1d-FPN | 79.08 | Baidu Drive (v60g) | ✓ | ✓ | 51 | dota1.0 |
R3Det-BCD | ResNet152_v1d-FPN | 79.95 | Baidu Drive (v60g) | ✓ | ✓ | 51 | dota1.0 |
Note:
- Single GPU training: SAVE_WEIGHTS_INTE = iter_epoch * 1 (DOTA1.0: iter_epoch=27000, DOTA1.5: iter_epoch=32000, DOTA2.0: iter_epoch=40000)
- Multi-GPU training (better): SAVE_WEIGHTS_INTE = iter_epoch * 2
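For concreteness, here is a hypothetical cfgs.py excerpt implementing this rule; apart from SAVE_WEIGHTS_INTE, the variable names are illustrative:

```python
# Checkpoint interval per the note above: one epoch's worth of iterations
# on a single GPU, two epochs' worth when training on multiple GPUs.
GPU_GROUP = "0,1,2,3"                 # illustrative; set to your visible GPUs
NUM_GPUS = len(GPU_GROUP.split(','))
ITER_EPOCH = 27000                    # DOTA1.0; 32000 for DOTA1.5, 40000 for DOTA2.0
SAVE_WEIGHTS_INTE = ITER_EPOCH * (2 if NUM_GPUS > 1 else 1)
```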
My Development Environment
- python3.5 (anaconda recommended)
- cuda 10.0
- opencv-python 4.1.1.26 (important)
- tfplot 0.2.0 (optional)
- tensorflow-gpu 1.13
- tqdm 4.54.0
- Shapely 1.7.1
Installation
Manual configuration (cuda version < 11)
pip install -r requirements.txt
pip install -v -e . # or "python setup.py develop"
Or, you can simply install AlphaRotate with the following command:
pip install alpharotate # Not suitable for dev.
Docker (cuda version < 11)
docker images: yangxue2docker/yx-tf-det:tensorflow1.13.1-cuda10-gpu-py3
Note: For 30xx-series graphics cards (cuda version >= 11), I recommend this blog for installing tf1.xx, or download an image from tensorflow-release-notes according to your development environment, e.g. nvcr.io/nvidia/tensorflow:20.11-tf1-py3
cd alpharotate/libs/utils/cython_utils
rm *.so
rm *.c
rm *.cpp
python setup.py build_ext --inplace (or make)
cd alpharotate/libs/utils/
rm *.so
rm *.c
rm *.cpp
python setup.py build_ext --inplace
Download Model
Pretrain weights
Download the pretrained weights you need from one of the following three options, and then put them into $PATH_ROOT/dataloader/pretrained_weights.
- MxNet pretrained weights (recommended in this repo, default for NET_NAME): resnet_v1d, resnet_v1b; refer to gluon2TF.
- TensorFlow pretrained weights: resnet50_v1, resnet101_v1, resnet152_v1, efficientnet, mobilenet_v2, darknet53 (Baidu Drive (1jg2), Google Drive).
- PyTorch pretrained weights: refer to pretrain_zoo.py and Others.
Trained weights
- Please download the models trained by this project, then put them into $PATH_ROOT/output/pretrained_weights.
Train
- If you want to train your own dataset, please note (a hypothetical cfgs.py/label_dict.py sketch follows this list):
(1) Select the detector and dataset you want to use, and mark them as #DETECTOR and #DATASET (such as #DETECTOR=retinanet and #DATASET=DOTA).
(2) Modify parameters (such as CLASS_NUM, DATASET_NAME, VERSION, etc.) in $PATH_ROOT/configs/#DATASET/#DETECTOR/cfgs_xxx.py.
(3) Copy $PATH_ROOT/configs/#DATASET/#DETECTOR/cfgs_xxx.py to $PATH_ROOT/configs/cfgs.py.
(4) Add category information in $PATH_ROOT/libs/label_name_dict/label_dict.py.
(5) Add data_name to $PATH_ROOT/dataloader/dataset/read_tfrecord.py.
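Below is a minimal sketch of steps (2) and (4) for a hypothetical two-class dataset; the exact keys in cfgs.py and the dict layout in label_dict.py vary by detector and repo version, so treat the names as illustrative:

```python
# configs/cfgs.py (hypothetical excerpt) -- step (2)
VERSION = 'RetinaNet_MyData_1x_20210101'  # names the output/summary dirs
DATASET_NAME = 'MyData'                   # must match read_tfrecord.py, step (5)
CLASS_NUM = 2                             # number of foreground categories

# libs/label_name_dict/label_dict.py (hypothetical excerpt) -- step (4),
# with the background class conventionally mapped to 0
NAME_LABEL_MAP = {
    'back_ground': 0,
    'ship': 1,
    'plane': 2,
}
```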
- Make tfrecord
If the image is very large (such as in the DOTA dataset), it needs to be cropped first (a minimal cropping sketch follows at the end of this step). Take the DOTA dataset as an example:
cd $PATH_ROOT/dataloader/dataset/DOTA
python data_crop.py
If the image does not need to be cropped, just convert the annotation files into xml format; refer to example.xml.
cd $PATH_ROOT/dataloader/dataset/
python convert_data_to_tfrecord.py --root_dir='/PATH/TO/DOTA/' --xml_dir='labeltxt' --image_dir='images' --save_name='train' --img_format='.png' --dataset='DOTA'
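The real cropping logic lives in data_crop.py; the following is only a minimal sliding-window sketch of the idea. The crop size, stride, and file-naming scheme are assumptions, and the actual script must also clip each image's annotations to the patches, which this sketch omits:

```python
import os
import cv2  # opencv-python, already listed in the requirements

def crop_image(img_path, out_dir, crop=800, stride=600):
    """Slide a fixed-size window over a large image and save the patches."""
    img = cv2.imread(img_path)
    h, w = img.shape[:2]
    name = os.path.splitext(os.path.basename(img_path))[0]
    os.makedirs(out_dir, exist_ok=True)
    for y in range(0, max(h - crop, 0) + 1, stride):
        for x in range(0, max(w - crop, 0) + 1, stride):
            patch = img[y:y + crop, x:x + crop]
            # keep the top-left offset in the file name so detections on a
            # patch can be shifted back into full-image coordinates later
            out_name = '{}__{}__{}.png'.format(name, x, y)
            cv2.imwrite(os.path.join(out_dir, out_name), patch)
```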
- Start training
cd $PATH_ROOT/tools/#DETECTOR
python train.py
Test
- For large-scale images, take the DOTA dataset as an example (the output files and visualizations are saved in $PATH_ROOT/tools/#DETECTOR/test_dota/VERSION):
cd $PATH_ROOT/tools/#DETECTOR
python test_dota.py --test_dir='/PATH/TO/IMAGES/' --gpus=0,1,2,3,4,5,6,7 -ms (multi-scale testing, optional) -s (visualization, optional)
or (recommended in this repo, better than multi-scale testing)
python test_dota_sota.py --test_dir='/PATH/TO/IMAGES/' --gpus=0,1,2,3,4,5,6,7 -s (visualization, optional)
Notice: To make it convenient to resume from a breakpoint, the result files are opened in 'a+' (read and append) mode. If a model with the same #VERSION needs to be tested again, the original test results must be deleted first; a minimal illustration of the append behaviour follows.
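The file name and line format below are hypothetical, purely to show why stale results survive a rerun:

```python
# 'a+' opens the file for reading and appending; nothing is overwritten,
# so rerunning the same #VERSION appends duplicate detections.
with open('det_results.txt', 'a+') as f:   # hypothetical result file
    f.write('P0001.png 0.99 204 156 391 262\n')
```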
- For small-scale images, take the HRSC2016 dataset as an example:
cd $PATH_ROOT/tools/#DETECTOR
python test_hrsc2016.py --test_dir='/PATH/TO/IMAGES/' --gpu=0 --image_ext='bmp' --test_annotation_path='/PATH/TO/ANNOTATIONS' -s (visualization, optional)
Tensorboard
cd $PATH_ROOT/output/summary
tensorboard --logdir=.
Citation
If you find our code useful for your research, please consider citing:
@article{yang2021alpharotate,
author = {Yang, Xue and Zhou, Yue and Yan, Junchi},
title = {AlphaRotate: A Rotation Detection Benchmark using TensorFlow},
journal = {arXiv preprint arXiv:2111.06677},
year = {2021},
}
Reference
1. https://github.com/endernewton/tf-faster-rcnn
2. https://github.com/zengarden/light_head_rcnn
3. https://github.com/tensorflow/models/tree/master/research/object_detection
4. https://github.com/fizyr/keras-retinanet