Robust Multi-Modality Multi-Object Tracking
This is the project page for our ICCV 2019 paper: Robust Multi-Modality Multi-Object Tracking.
Authors: Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy
[ArXiv] [Project Page] [Poster]
Introduction
In this work, we design a generic sensor-agnostic multi-modality MOT framework (mmMOT), where each modality (i.e., each sensor) can perform its role independently to preserve reliability, and can further improve its accuracy through a novel multi-modality fusion module. Our mmMOT can be trained in an end-to-end manner, which enables joint optimization of the base feature extractor of each modality and the cross-modality adjacency estimator. Our mmMOT also makes the first attempt to encode a deep representation of the point cloud in the data association process of MOT.
For more details, please refer to our paper.
Install
This project is based on pytorch>=1.0, which you can install by following the official guide.
We recommend building a new conda environment to run the project, as follows:
conda create -n mmmot python=3.7 cython
conda activate mmmot
conda install pytorch torchvision -c pytorch
conda install numba
Then install packages from pip:
pip install -r requirements.txt
You can also follow the guide to install SECOND; we use the same environment as SECOND.
Usage
We provide several configs and scripts in the experiments
directory.
To evaluate the pretrained models or the reimplemented models, run the following command:
python -u eval_seq.py --config ${work_path}/config.yaml \
--load-path=${work_path}/${model} \
--result-path=${work_path}/results \
--result_sha=eval
The --result_sha
option is used to distinguish different evaluation attempts.
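If you script several evaluation runs, it can help to generate a distinct --result_sha value for each one. The snippet below is a minimal sketch of that idea, not part of this codebase: build_eval_command, its tag parameter, and the checkpoint name checkpoint.pth are hypothetical, while the flags are copied from the command above.

```python
import shlex
from datetime import datetime


def build_eval_command(work_path, model, tag=None):
    """Assemble the eval_seq.py command line shown above.

    This helper and its `tag` argument are hypothetical; only the
    flags are taken from the README command.
    """
    if tag is None:
        # Timestamp tag so repeated runs do not share a --result_sha value.
        tag = datetime.now().strftime("eval_%Y%m%d_%H%M%S")
    return [
        "python", "-u", "eval_seq.py",
        "--config", f"{work_path}/config.yaml",
        f"--load-path={work_path}/{model}",
        f"--result-path={work_path}/results",
        f"--result_sha={tag}",
    ]


# "checkpoint.pth" is a placeholder; use the file name of your own model.
cmd = build_eval_command("experiments/pp_pv_40e_mul_A", "checkpoint.pth", tag="eval_run1")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to subprocess.run as is, or printed and pasted into a shell.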
Alternatively, you can simply run the provided script, for example:
sh ./experiments/pp_pv_40e_mul_A/eval.sh ${partition}
To train a model yourself, run the following command:
python -u main.py --config ${work_path}/config.yaml \
--result-path=${work_path}/results
Alternatively, you can simply run the provided script, for example:
sh ./experiments/pp_pv_40e_mul_A/train.sh ${partition}
Note: Both the train and eval scripts use srun by default. If you do not use srun, comment out the srun parts of the scripts.
Pretrained Models
We provide four models on Google Drive.
The corresponding configs can be found in the experiments
directory.
Following the usage instructions above, you can run inference with these models directly and obtain the following results:
Name | Method | MOTA |
---|---|---|
pp_pv_40e_mul_A | Fusion Module A | 77.57 |
pp_pv_40e_mul_B | Fusion Module B | 77.62 |
pp_pv_40e_mul_C | Fusion Module C | 78.18 |
pp_pv_40e_dualadd_subabs_C | Fusion Module C++ | 80.08 |
The results of Fusion Modules A, B, and C are the same as those in Table 1 of the paper.
Fusion Module C++ uses absolute subtraction and softmax with addition to improve the results, and has the same MOTA as the last row of Table 3 of the paper.
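For readers unfamiliar with the metric, MOTA in its standard CLEAR MOT form is 1 - (FN + FP + IDSW) / GT, with counts summed over all frames; the table reports it as a percentage. The sketch below only illustrates that formula with made-up counts. It is not the evaluation code used in this repository, which may differ in details such as how ignored regions are handled.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All arguments are counts summed over every frame of the sequences.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Made-up counts for illustration only (not taken from the paper or KITTI).
print(f"{100 * mota(150, 60, 10, 1000):.2f}")  # prints 78.00
```

Note that MOTA can be negative when the total number of errors exceeds the number of ground-truth objects.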
Data
Currently, the code supports the PointPillars/SECOND detector and the RRC-Net detector.
In the paper, we train a PointPillars model using the official codebase to obtain the train/val detection results for the ablation study. The detection data are provided on Google Drive. Once you have downloaded the two pkl files, put them in the data
directory.
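Before training, you may want to check that a downloaded pkl file loads correctly. The sketch below is an illustrative helper, not part of this codebase: summarize_pickle is a hypothetical name, and since the internal layout of the detection files is not documented here, it only reports the top-level type and a few keys. The environment set up above should be active so that any array types stored in the files can be unpickled.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and describe its top-level structure."""
    # Only unpickle files from a source you trust: pickle can execute code.
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["num_keys"] = len(obj)
        summary["sample_keys"] = [repr(k) for k in list(obj)[:5]]
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
    return summary


# Hypothetical usage; replace the path with one of the downloaded pkl files.
# print(summarize_pickle("data/NAME_OF_DOWNLOADED_FILE.pkl"))
```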
We also provide the data split used in our paper in the data
directory. You need to download and unzip the data from the KITTI Tracking Benchmark and put them in the kitti_t_o
directory or any path you like.
Remember to update the paths in the configs accordingly.
The RRC detections are obtained from the link provided by MOTBeyondPixels. We use the RRC detections for the KITTI Tracking Benchmark.
Citation
If you use this codebase or model in your research, please cite:
@InProceedings{mmMOT_2019_ICCV,
author = {Zhang, Wenwei and Zhou, Hui and Sun, Shuyang and Wang, Zhe and Shi, Jianping and Loy, Chen Change},
title = {Robust Multi-Modality Multi-Object Tracking},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Acknowledgement
This code benefits a lot from SECOND and uses the detection results provided by MOTBeyondPixels. The GHM loss implementation is from GHM_Detection.