Multispectral-Object-Detection
Intro
Official Code for Cross-Modality Fusion Transformer for Multispectral Object Detection.
Multispectral Object Detection with Transformer and YOLOv5
Abstract
Multispectral image pairs can provide combined information, making object detection applications more reliable and robust in the open world. To fully exploit the different modalities, we present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT). Unlike prior CNN-based works, our network is guided by the Transformer scheme: it learns long-range dependencies and integrates global contextual information in the feature extraction stage. More importantly, by leveraging the self-attention of the Transformer, the network can naturally carry out simultaneous intra-modality and inter-modality fusion and robustly capture the latent interactions between the RGB and thermal domains, thereby significantly improving the performance of multispectral object detection. Extensive experiments and ablation studies on multiple datasets demonstrate that our approach is effective and achieves state-of-the-art detection performance.
Demo
Night Scene
Day Scene
Overview
Citation
If you use this repo for your research, please cite our paper:
@article{fang2021cross,
title={Cross-Modality Fusion Transformer for Multispectral Object Detection},
author={Fang Qingyun and Han Dapeng and Wang Zhaokui},
journal={arXiv preprint arXiv:2111.00273},
year={2021}
}
Installation
Python>=3.6.0 is required, with all dependencies in requirements.txt installed, including PyTorch>=1.7 (the same requirements as yolov5: https://github.com/ultralytics/yolov5).
Clone the repo
git clone https://github.com/DocF/multispectral-object-detection
Install requirements
$ cd multispectral-object-detection
$ pip install -r requirements.txt
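To confirm that the environment meets the minimum versions stated above, a quick check along these lines may help. This is only a sketch: the helper names `version_tuple` and `meets_minimum` are made up for illustration and are not part of this repo.

```python
import re
import sys


def version_tuple(version):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0).

    Anything after '+' (a local build tag) is dropped, and only the
    first three numeric fields are kept.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version.split("+")[0])[:3])


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


# Python itself: the README asks for >= 3.6.0.
print("Python OK:", sys.version_info >= (3, 6))

# PyTorch: the README asks for >= 1.7.
try:
    import torch
    print("PyTorch OK:", meets_minimum(torch.__version__, "1.7"))
except ImportError:
    print("PyTorch is not installed")
```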
Dataset
-[FLIR] [Google Drive] [Baidu Drive] extraction code:qwer
A new aligned version.
-[LLVIP] download
-[VEDAI] download
You need to convert all annotations to YOLOv5 format.
See: https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
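As a rough illustration of the conversion, a YOLOv5 label file holds one line per object in the form `class x_center y_center width height`, with all four box values normalized to [0, 1] by the image size. The sketch below assumes the source annotation gives a pixel-space box as top-left corner plus width and height (COCO style). That input layout and the function name `to_yolo_line` are assumptions for illustration, so adapt them to the actual annotation files of each dataset.

```python
def to_yolo_line(class_id, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert one pixel-space box to a YOLOv5 label line.

    Input box (assumed): top-left corner (x_min, y_min) plus size
    (box_w, box_h), all in pixels.
    Output: 'class x_center y_center width height', normalized by image size.
    """
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{box_w / img_w:.6f} {box_h / img_h:.6f}"
    )


# Illustrative values only: a 64x128 box at (96, 192) in a 640x512 image.
print(to_yolo_line(0, 96, 192, 64, 128, 640, 512))
# prints: 0 0.200000 0.500000 0.100000 0.250000
```

Each image gets its own `.txt` label file containing one such line per object; the guide linked above describes the expected directory layout.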
Run
Download the pretrained weights
YOLOv5 weights (pretrained)
-[yolov5s] google drive
-[yolov5m] google drive
-[yolov5l] google drive
-[yolov5x] google drive
CFT weights
-[LLVIP] google drive
-[FLIR] google drive
Change the data cfg
Some examples are in data/multispectral/
Change the model cfg
Some examples are in models/transformer/
Note: we used the xxxx_transfomerx3_dataset.yaml configs in our paper.
Train, Test, and Detect
train: python train.py
test: python test.py
detect: python detect_twostream.py
Results
Dataset | CFT | mAP50 | mAP75 | mAP |
---|---|---|---|---|
FLIR | | 73.0 | 32.0 | 37.4 |
FLIR | ✔️ | 78.7 (Δ5.7) | 35.5 (Δ3.5) | 40.2 (Δ2.8) |
LLVIP | | 95.8 | 71.4 | 62.3 |
LLVIP | ✔️ | 97.5 (Δ1.7) | 72.9 (Δ1.5) | 63.6 (Δ1.3) |
VEDAI | | 79.7 | 47.7 | 46.8 |
VEDAI | ✔️ | 85.3 (Δ5.6) | 65.9 (Δ18.2) | 56.0 (Δ9.2) |
LLVIP
Log Average Miss Rate
Model | Log Average Miss Rate |
---|---|
YOLOv3-RGB | 37.70% |
YOLOv3-IR | 17.73% |
YOLOv5-RGB | 22.59% |
YOLOv5-IR | 10.66% |
Baseline(Ours) | 6.91% |
CFT(Ours) | 5.40% |
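For readers unfamiliar with the metric, the log-average miss rate is commonly computed following the Caltech pedestrian benchmark convention: sample the miss rate at nine FPPI (false positives per image) values evenly spaced in log space over [10^-2, 10^0], then take the geometric mean. The sketch below follows that convention. Whether the numbers in the table were produced with exactly this procedure is an assumption, and the function name is illustrative.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI values.

    `fppi` must be sorted in ascending order, with `miss_rate[i]` the miss
    rate at `fppi[i]`. For each reference FPPI, the miss rate at the largest
    curve point not exceeding it is used; if the curve has no such point,
    the miss rate is taken as 1.0.
    """
    # Reference points: 10^-2 ... 10^0, evenly spaced in log space.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m
            else:
                break
        # Clamp to avoid log(0) for a perfect detector.
        sampled.append(max(mr, 1e-10))
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


# Illustrative curve only (not data from the paper).
curve_fppi = [0.001, 0.05, 0.3, 1.0]
curve_mr = [0.40, 0.20, 0.10, 0.05]
print(f"{log_average_miss_rate(curve_fppi, curve_mr):.2%}")
```

Lower values are better: a detector that misses fewer objects at the same false-positive budget gets a lower score.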
Miss Rate - FPPI curve