• Stars
    star
    108
  • Rank 311,192 (Top 7 %)
  • Language
    Python
  • License
    MIT License
  • Created 12 months ago
  • Updated 6 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Official PyTorch implementation of SparseTrack (the new version of code will come soon)

SparseTrack

SparseTrack is a simply and strong multi-object tracker.

PWC

PWC

SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth

Zelin Liu, Xinggang Wang, Cheng Wang, Wenyu Liu, Xiang Bai

arXiv 2306.05238

News

  • Add yolov8 detector for tracking, please refer to branch v8.

Abstract

Exploring robust and efficient association methods has always been an important issue in multiple-object tracking (MOT). Although existing tracking methods have achieved impressive performance, congestion and frequent occlusions still pose challenging problems in multi-object tracking. We reveal that performing sparse decomposition on dense scenes is a crucial step to enhance the performance of associating occluded targets. To this end, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images. Secondly, we design a depth cascading matching (DCM) algorithm, which can use the obtained depth information to convert a dense target set into multiple sparse target subsets and perform data association on these sparse target subsets in order from near to far. By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack. SparseTrack provides a new perspective for solving the challenging crowded scene MOT problem. Only using IoU matching, SparseTrack achieves comparable performance with the state-of-the-art (SOTA) methods on the MOT17 and MOT20 benchmarks.

Tracking performance

Results on MOT challenge test set

Dataset HOTA MOTA IDF1 MT ML FP FN IDs
MOT17 65.1 81.0 80.1 54.6% 14.3% 23904 81927 1170
MOT20 63.4 78.2 77.3 69.9% 9.2% 25108 86720 1116

Comparison on DanceTrack test set

Method HOTA DetA AssA MOTA IDF1
SparseTrack 55.5 (+7.8) 78.9 (+7.9) 39.1 (+7.0) 91.3 (+1.7) 58.3 (+4.4)
ByteTrack 47.7 71.0 32.1 89.6 53.9

Notes:

  • All the inference experiments are performed on 1 NVIDIA GeForce RTX 3090 GPUs.
  • Each experiment uses the same detector and model weights as ByteTrack .
  • SparseTrack relies on IoU distance association only and do not use any appearance embedding, learnable motion, and attention components.

Installation

Dependence

This project is an implementation version of Detectron2 and requires the compilation of OpenCV, Boost.

Compile GMC(Globle Motion Compensation) module

step 1: Downloading pbcvt, copy the python_module.cpp to the path <pbcvt/src/>.

step 2: Adding the relevant OpenCV modules in the pbcvt/CMakeLists.txt file. Here's what you should do: locate the line "find_package(OpenCV COMPONENTS REQUIRED)" in the CMakeLists.txt file and replace it with "find_package(OpenCV COMPONENTS core highgui video videoio videostab REQUIRED)".

step 3: Modifying the compilation path in the Makefile file before compiling pbcvt. The main modifications include updating the following entries:CMAKE_SOURCE_DIR, CMAKE_BINARY_DIR, cmake_progress_start.

step 4: Compiling pbcvt. For example, you can run the following script:

cmake -DPYTHON_DESIRED_VERSION=3.X  -DPYTHON3_INCLUDE_DIR=/home/lzl/miniconda3/envs/d2/include/python3.9 -DPYTHON3_NUMPY_INCLUDE_DIRS=/home/lzl/miniconda3/envs/d2/lib/python3.9/site-packages/numpy -DPYTHON3_LIBRARY=/home/lzl/miniconda3/envs/d2/lib/libpython3.9.so

# and then, running:
make 

step 5: Please copy the "pbcvt.xxxxxx.so" file compiled via pbcvt to the <ROOT/SparseTrack/tracker/> directory.

Install

git clone https://github.com/hustvl/SparseTrack.git
cd SparseTrack
pip install -r requirements.txt
pip install Cython  
pip install cython_bbox

Data preparation

Download MOT17, MOT20, CrowdHuman, Cityperson, ETHZ and put them under ROOT/ in the following structure:

ROOT
   |
   |——————SparseTrack(repo)
   |           └—————mix
   |                  └——————mix_17/annotations
   |                  └——————mix_20/annotations
   |                  └——————ablation_17/annotations
   |                  └——————ablation_20/annotations
   |——————MOT17
   |        └——————train
   |        └——————test
   └——————crowdhuman
   |         └——————Crowdhuman_train
   |         └——————Crowdhuman_val
   |         └——————annotation_train.odgt
   |         └——————annotation_val.odgt
   └——————MOT20
   |        └——————train
   |        └——————test
   └——————Citypersons
   |        └——————images
   |        └——————labels_with_ids
   └——————ETHZ
   |        └——————eth01
   |        └——————...
   |        └——————eth07
   └——————dancetrack
               └——————train
               └——————train_seqmap.txt
               └——————test
               └——————test_seqmap.txt
               └——————val
               └——————val_seqmap.txt

   

Then, you need to turn the datasets to COCO format and mix different training data:

cd <ROOT>/SparseTrack
python3 tools/convert_mot17_to_coco.py
python3 tools/convert_mot20_to_coco.py
python3 tools/convert_crowdhuman_to_coco.py
python3 tools/convert_cityperson_to_coco.py
python3 tools/convert_ethz_to_coco.py
python3 tools/convert_dance_to_coco.py

Creating different training mix_data:

cd <ROOT>/SparseTrack

# training on CrowdHuman and MOT17 half train, evaluate on MOT17 half val.
python3 tools/mix_data_ablation.py

# training on CrowdHuman and MOT20 half train, evaluate on MOT20 half val.
python3 tools/mix_data_ablation_20.py

# training on MOT17, CrowdHuman, ETHZ, Citypersons, evaluate on MOT17 train.
python3 tools/mix_data_test_mot17.py

# training on MOT20 and CrowdHuman, evaluate on MOT20 train.
python3 tools/mix_data_test_mot20.py

Model zoo

See ByteTrack.model_zoo. We used the publicly available ByteTrack model zoo trained on MOT17, MOT20 and ablation study for YOLOX object detection.

Additionally, we conducted joint training on MOT20 train half and Crowdhuman, and evaluated on MOT20 val half. The model as follows: yolox_x_mot20_ablation

The model trained on DanceTrack can be available at google:yolox_x_dancetrack or baidu: yolox_x_dancetrack, the extracted key as: sptk

Training

All training is conducted on a unified script. You need to change the VAL_JSON and VAL_PATH in register_data.py, and then run as follows:

# training on MOT17, CrowdHuman, ETHZ, Citypersons, evaluate on MOT17 train set.
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --num-gpus 4  --config-file mot17_train_config.py 


# training on MOT20, CrowdHuman, evaluate on MOT20 train set.
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --num-gpus 4  --config-file mot20_train_config.py 

Notes: For MOT20, you need to clip the bounding boxes inside the image.

Add clip operation in line 138-139 in data_augment.py, line 118-121 in mosaicdetection.py, line 213-221 in mosaicdetection.py, line 115-118 in boxes.py.

Tracking

All tracking experimental scripts are run in the following manner. You first place the model weights in the <ROOT/SparseTrack/pretrain/>, and change the VAL_JSON and VAL_PATH in register_data.py.

# tracking on mot17 train set or test set
CUDA_VISIBLE_DEVICES=0 python3 track.py  --num-gpus 1  --config-file mot17_track_cfg.py 


# tracking on mot20 train set or test set
CUDA_VISIBLE_DEVICES=0 python3 track.py  --num-gpus 1  --config-file mot20_track_cfg.py 


# tracking on mot17 val_half set
CUDA_VISIBLE_DEVICES=0 python3 track.py  --num-gpus 1  --config-file mot17_ab_track_cfg.py 


# tracking on mot20 val_half set
CUDA_VISIBLE_DEVICES=0 python3 track.py  --num-gpus 1  --config-file mot20_ab_track_cfg.py

Tracking on dancetrack test set

step 1: Please comment out line 368-373 in the sparse_tracker.py and modify the threshold for low-score matching stage from 0.3 to 0.35 (at line 402 in the sparse_tracker.py).

step 2: Running:

CUDA_VISIBLE_DEVICES=0 python3 track.py  --num-gpus 1  --config-file dancetrack_sparse_cfg.py

Citation -->

If you find SparseTrack is useful in your research or applications, please consider giving us a star 🌟 and citing it by the following BibTeX entry.

@inproceedings{SparseTrack,
  title={SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth},
  author={Liu, Zelin and Wang, Xinggang and Wang, Cheng and Liu, Wenyu and Bai, Xiang},
  journal={arXiv preprint arXiv:2306.05238},
  year={2023}
}

Acknowledgements

A large part of the code is borrowed from YOLOX, FairMOT, ByteTrack, BoT-SORT, Detectron2. Many thanks for their wonderful works.

More Repositories

1

Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Python
1,808
star
2

YOLOP

You Only Look Once for Panopitic Driving Perception.(MIR2022)
Python
1,804
star
3

4DGaussians

[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Jupyter Notebook
1,634
star
4

MapTR

[ICLR'23 Spotlight] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction
Python
893
star
5

YOLOS

[NeurIPS 2021] You Only Look at One Sequence
Jupyter Notebook
810
star
6

SparseInst

[CVPR 2022] SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation
Python
558
star
7

GaussianDreamer

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models (CVPR 2024)
Python
503
star
8

Matte-Anything

Matte Anything: Interactive Natural Image Matting with Segment Anything Models
Python
402
star
9

QueryInst

[ICCV 2021] Instances as Queries
Python
400
star
10

VAD

[ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
Python
385
star
11

TopFormer

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022
Python
373
star
12

MIMDet

[ICCV 2023] You Only Look at One Partial Sequence
Python
326
star
13

TiNeuVox

TiNeuVox: Fast Dynamic Radiance Fields with Time-Aware Neural Voxels (SIGGRAPH Asia 2022)
Python
306
star
14

ViTMatte

[Information Fusion] Boosting Image Matting with Pretrained Plain Vision Transformers
Python
245
star
15

TeViT

Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral
Python
234
star
16

GKT

Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer
Python
197
star
17

BMaskR-CNN

[ECCV 2020] Boundary-preserving Mask R-CNN
Python
184
star
18

HAIS

Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021)
Python
163
star
19

VMA

A general map auto annotation framework based on MapTR, with high flexibility in terms of spatial scale and element type
Python
157
star
20

Symphonies

[CVPR 2024] Symphonies (Scene-from-Insts): Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Python
123
star
21

WeakTr

WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation
Python
116
star
22

LaneGAP

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
110
star
23

CrossVIS

[ICCV 2021] Crossover Learning for Fast Online Video Instance Segmentation
Python
85
star
24

MSG-Transformer

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens (CVPR 2022)
Python
80
star
25

PolarDETR

73
star
26

BoxTeacher

[CVPR 2023] Exploring High-Quality Pseudo Masks for Weakly Supervised Instance Segmentation
Python
72
star
27

TinyDet

Python
68
star
28

GNeuVox

GNeuVox: Generalizable Neural Voxels for Fast Human Radiance Fields
Python
59
star
29

AziNorm

AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception, CVPR 2022.
Python
53
star
30

Featurized-QueryRCNN

Featurized Query R-CNN
Python
46
star
31

RILS

[CVPR 2023] RILS: Masked Visual Reconstruction in Language Semantic Space (https://arxiv.org/abs/2301.06958)
Python
43
star
32

PD-Quant

[CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric
Python
39
star
33

MIM4D

MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
36
star
34

NeuSample

Code of "NeuSample: Neural Sample Field for Efficient View Synthesis"
Python
35
star
35

SAUNet

A Simple Adaptive Unfolding Network for Hyperspectral Image Reconstruction
Python
29
star
36

Query6DoF

Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation
Python
25
star
37

HDR-HexPlane

3DV 2024: Fast High Dynamic Range Radiance Fields for Dynamic Scenes
Python
25
star
38

WeakSAM

WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
Python
24
star
39

ViTGaze

Python
23
star
40

CircuitFormer

[NeurIPS 2023] CircuitFormer: Circuit as Set of Points
Python
23
star
41

MMIL-Transformer

Python
19
star
42

LSFA

Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation
Python
19
star
43

EfficientPose

Cuda
18
star
44

OpenInst

Python
14
star
45

BoxCaseg

Jupyter Notebook
13
star
46

mancs

Mancs: A multi-task attentional network with curriculum sampling for person re-identification
Python
12
star
47

RND-SCI

A Range-Null Space Decomposition Approach for Fast and Flexible Spectral Compressive Imaging
Python
10
star
48

DGCN

Python
9
star
49

PySA

Pyramid Self-Attention for Semantic Segmentation
8
star
50

EM-OLN

Python
7
star
51

BCF

Xinggang Wang, Bin Feng, Xiang Bai, Wenyu Liu, and Longin Jan Latecki. Bag of Contour Fragments for Robust Shape Classification. Pattern Recognition, Volume 47, Issue 6, June 2014, Pages 2116-2125.
MATLAB
6
star
52

tbcl

1
star
53

DeepTunel

Python
1
star