# [ECCV22 Oral] P3AFormer: Tracking Objects as Pixel-wise Distributions
This is the official code for our ECCV22 oral paper: Tracking Objects as Pixel-wise Distributions.
Authors: Zelin Zhao, Ze Wu, Yueqing Zhuang, Boxun Li, Jiaya Jia.
The proposed model diagram is:
The association scheme is:
## Installation
- Install PyTorch and torchvision. We tested with torch==1.9.0+cu111 and torchvision==0.10.0+cu111; other versions might work as well.
- Install detectron2 following the official guidance. We use detectron2==0.6.0; other versions might work as well.
- Install other dependencies:

```bash
pip install -r requirements.txt
# gdown is used to download datasets from Google Drive.
pip install gdown
pip install --upgrade gdown
# Build extensions if you don't have them already; CUDA is needed.
cd models/ops
bash make.sh
```
A full list of our conda environment is given in export_requirements.txt.
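To sanity-check the environment, the following one-liners print the installed versions (expected values as listed above) and confirm that CUDA is visible:

```bash
# Verify the versions this repo was tested with
# (torch 1.9.0+cu111, torchvision 0.10.0+cu111, detectron2 0.6.0).
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, 'cuda:', torch.cuda.is_available())"
python -c "import detectron2; print(detectron2.__version__)"
```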
## Download datasets
- Download MOT17/MOT20 from the official website. Alternatively, you can download them via the following commands:

```bash
mkdir data
cd data
wget https://motchallenge.net/data/MOT17.zip
wget https://motchallenge.net/data/MOT20.zip
unzip MOT17.zip
unzip MOT20.zip
wget https://motchallenge.net/data/MOT15.zip # optional
unzip MOT15.zip # optional
```
- Download CityPersons from the official website. Alternatively, you can download it via the following commands (this requires access to Google Drive):

```bash
gdown --no-cookies --id 1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs
gdown --no-cookies --id 1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ
gdown --no-cookies --id 1q_OltirP68YFvRWgYkBHLEFSUayjkKYE
gdown --no-cookies --id 1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW
cat Citypersons.z01 Citypersons.z02 Citypersons.z03 Citypersons.zip > c.zip
zip -FF Citypersons.zip --out c.zip
unzip c.zip
mv Citypersons Cityscapes
```
If you see a "max retries" error, retry the download. If you see a "permission" error, upgrade gdown. If unzip reports an "End-of-central-directory" error, some of the downloaded parts are likely corrupt and need to be re-downloaded. If unzip still fails, recover the zip file first (e.g., with `zip -FF` as above); `unzip -t <file>.zip` tests an archive without extracting it.
- Download CrowdHuman from the official website. Alternatively, you can download it via the following commands:

```bash
gdown --no-cookies --id 134QOvaatwKdy0iIeNqA_p-xkAhkV4F8Y
gdown --no-cookies --id 17evzPh7gc1JBNvnW1ENXLy5Kr4Q_Nnla
gdown --no-cookies --id 1tdp0UCgxrqy1B6p8LkR-Iy0aIJ8l4fJW
gdown --no-cookies --id 18jFI789CoHTppQ7vmRSFEdnGaSQZ4YzO
gdown --no-cookies --id 1UUTea5mYqvlUObsC1Z8CFldHJAtLtMX3
gdown --no-cookies --id 10WIRwu8ju8GRLuCkZ_vT6hnNxs5ptwoL
# then unzip the downloaded files
```
- Download ETH from the official website. Alternatively, you can download it via the following commands:

```bash
gdown --no-cookies --id 19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY
unzip ETHZ.zip
```
- Then symlink the downloaded data into the data folder under the project root (a minimal symlink sketch follows the tree below). The layout should look like:

```
data
├── mot            # this is MOT17
│   ├── train
│   └── test
├── crowdhuman
│   ├── Crowdhuman_train
│   ├── Crowdhuman_val
│   ├── annotation_train.odgt
│   └── annotation_val.odgt
├── MOT20
│   ├── train
│   └── test
├── Cityscapes
│   ├── images
│   └── labels_with_ids
└── ETHZ
    ├── eth01
    ├── ...
    └── eth07
```
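If the datasets live elsewhere on disk, symlinks are enough. A minimal sketch, assuming the archives were extracted under `/path/to/datasets` (a placeholder; adjust the source paths to your setup):

```bash
# Link each extracted dataset into the expected location instead of copying it.
mkdir -p data
ln -s /path/to/datasets/MOT17 data/mot
ln -s /path/to/datasets/crowdhuman data/crowdhuman
ln -s /path/to/datasets/MOT20 data/MOT20
ln -s /path/to/datasets/Cityscapes data/Cityscapes
ln -s /path/to/datasets/ETHZ data/ETHZ
```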
## Preprocess Dataset
Convert the downloaded data to the standard COCO format. The preprocessing tools are inherited from ByteTrack. `PROJECT_ROOT` is the root path of this project.
```bash
cd ${PROJECT_ROOT}
bash preprocess/data_preprocess.sh
```
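After conversion, it can be worth checking that the generated annotation files parse as valid JSON, which catches truncated writes early. A minimal sketch; the glob pattern is an assumption, since the exact output paths depend on the ByteTrack-style preprocessing script:

```bash
# Adjust the glob to wherever the preprocess script writes its annotations.
for f in data/*/annotations/*.json; do
  python -c "import json, sys; json.load(open(sys.argv[1])); print(sys.argv[1], 'ok')" "$f"
done
```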
## Training
- COCO pretraining:

```bash
bash configs/standard/v100_mot17_coco.sh
```

- CrowdHuman pretraining:

```bash
bash configs/standard/v100_mot17_crowdhuman.sh
```

- MOT17 fine-tuning:

```bash
bash configs/standard/v100_mot17_fine_tune_mot17.sh
```
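Each config is a plain shell script invoked with bash, so you can inspect (or copy and edit) the launch arguments before running, for example:

```bash
# Print the exact command and arguments a config will launch.
cat configs/standard/v100_mot17_fine_tune_mot17.sh
```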
## Tracking
Get tracking scores on the training set (note: this command needs only a single 2080 Ti card to run):

```bash
bash configs/standard/v100_test_mot17.sh
```
Generate results for submission to the MOT17 challenge website (this command also needs only a single 2080 Ti card to run):

```bash
bash configs/standard/v100_submit_mot17.sh
```
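The MOT challenge server expects one `<sequence>.txt` result file per test sequence, uploaded as a zip. A packaging sketch, assuming the submit script writes its txt files to an `output/` folder (a placeholder; the actual path depends on the config):

```bash
# Zip the per-sequence result files for upload to motchallenge.net.
cd output
zip MOT17_submission.zip MOT17-*.txt
```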
## Results and Models on MOT17
| Method | Detector | Train Set | Test Set | Public | Inf time (fps) | HOTA | MOTA | IDF1 | FP | FN | IDSw. | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| P3AFormer | Our trained | CrowdHuman + MOT17-train | MOT17-test | N | - | 55.9 | 69.3 | 68.9 | 19,275 | 151,200 | 2,904 | config | model/submission |
The Detectron2 version of P3AFormer, along with some other parts and the bells and whistles, is not yet well organized (it lives on other dev branches and will be merged into the main branch); in the meantime, you can find the raw code at this Google Drive link.
## Code Structure
The code flow diagram is provided as follows:
## TODO List
- Jupyter notebook support for a quick demo.
- YOLOX-style tracking objects as pixel-wise distributions.
## Tips & QAs
- What if CUDA OOM happens after killing the program via "Ctrl+C"?

Try this command to kill all processes using GPUs. Do not run it if you have other useful processes using the GPUs:

```bash
lsof /dev/nvidia* | awk '{print $2}' | xargs -I {} kill {}
```
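A less drastic alternative (standard NVIDIA tooling) is to locate the stale process first and kill only that one:

```bash
# List processes currently holding GPU memory, then kill a specific one.
nvidia-smi
kill -9 <PID>  # <PID> is a placeholder: use the process id reported by nvidia-smi
```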
- How to develop?

Use the debug scripts under the configs folder first, then run full experiments on a 2080 Ti / V100.
## Citation & Acknowledgements
If you find our code helpful, please cite our paper:
```bibtex
@misc{zhao2022tracking,
  title={Tracking Objects as Pixel-wise Distributions},
  author={Zelin Zhao and Ze Wu and Yueqing Zhuang and Boxun Li and Jiaya Jia},
  year={2022},
  eprint={2207.05518},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
If you have any questions, please contact me at: [email protected].
This codebase uses code from MOTR, TransCenter-V1, and ByteTrack. Many thanks for their wonderful work. Consider citing them as well:
```bibtex
@inproceedings{zeng2021motr,
  title={MOTR: End-to-End Multiple-Object Tracking with TRansformer},
  author={Zeng, Fangao and Dong, Bin and Zhang, Yuang and Wang, Tiancai and Zhang, Xiangyu and Wei, Yichen},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
```

```bibtex
@misc{xu2021transcenter,
  title={TransCenter: Transformers with Dense Representations for Multiple-Object Tracking},
  author={Yihong Xu and Yutong Ban and Guillaume Delorme and Chuang Gan and Daniela Rus and Xavier Alameda-Pineda},
  year={2021},
  eprint={2103.15145},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

```bibtex
@inproceedings{zhang2022bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Weng, Fucheng and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}
```