[ECCV2022] MOTR: End-to-End Multiple-Object Tracking with TRansformer


This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object Tracking with TRansformer.

Introduction

TL;DR. MOTR is a fully end-to-end multiple-object tracking framework based on Transformer. It directly outputs the tracks in a video sequence without any explicit association procedure.

Abstract. The key challenge in the multiple-object tracking task is temporal modeling of the objects under track. Existing tracking-by-detection methods adopt simple heuristics, such as spatial or appearance similarity. Such methods, in spite of their commonality, are overly simple and lack the ability to learn temporal variations from data in an end-to-end manner. In this paper, we present MOTR, a fully end-to-end multiple-object tracking framework. It learns to model the long-range temporal variation of the objects. It performs temporal association implicitly and avoids previous explicit heuristics. Built upon DETR, MOTR introduces the concept of the "track query". Each track query models the entire track of an object. It is transferred and updated frame-by-frame to perform iterative predictions in a seamless manner. Tracklet-aware label assignment is proposed for one-to-one assignment between track queries and object tracks. A temporal aggregation network, together with a collective average loss, is further proposed to enhance the long-range temporal relation. Experimental results show that MOTR achieves competitive performance and can serve as a strong Transformer-based baseline for future research.
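The track-query mechanism described above can be sketched in a few lines. Everything here (function names, interfaces, the score-based termination rule) is a hypothetical illustration of the idea, not the repository's actual API:

```python
# Hypothetical sketch of MOTR-style frame-by-frame track query propagation;
# names and interfaces are illustrative, not the repository's actual API.

def track_video(frames, propagate, detect_new, keep_threshold=0.5):
    """Track objects across frames without an explicit association step.

    propagate(query, frame) -> (box, score, updated_query): a track query
    predicts its own object in the current frame and updates its state.
    detect_new(frame, active_ids) -> [(obj_id, query), ...]: detect queries
    spawn fresh track queries for newborn objects.
    """
    tracks = {}    # obj_id -> query state, transferred frame-by-frame
    results = []   # (frame_idx, obj_id, box): the output tracks
    for t, frame in enumerate(frames):
        # Existing track queries predict and update themselves; a low score
        # means the object has left the scene and the track terminates.
        surviving = {}
        for obj_id, query in tracks.items():
            box, score, query = propagate(query, frame)
            if score >= keep_threshold:
                surviving[obj_id] = query
                results.append((t, obj_id, box))
        # Detect queries introduce newborn objects as new track queries.
        for obj_id, query in detect_new(frame, set(surviving)):
            surviving[obj_id] = query
        tracks = surviving
    return results
```

Because each track query carries its identity in its own state, the per-frame output is already a set of identity-consistent tracks; no IoU or appearance matching happens between frames.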

Updates

  • (2021/09/23) Reported BDD100K results and released the corresponding code (motr_bdd100k).
  • (2022/02/09) Achieved higher performance by not clipping bounding boxes to the image.
  • (2022/02/11) Added checkpoint support for training on RTX 2080ti.
  • (2022/02/11) Reported DanceTrack results and released the scripts.
  • (2022/05/12) Achieved higher performance by removing the public-detection filtering (filter_pub_det) trick.
  • (2022/07/04) MOTR was accepted by ECCV 2022.

Main Results

MOT17

Method  Dataset  Train Data            HOTA  DetA  AssA  MOTA  IDF1  IDS   URL
MOTR    MOT17    MOT17+CrowdHuman Val  57.8  60.3  55.7  73.4  68.6  2439  model

DanceTrack

Method  Dataset     Train Data  HOTA  DetA  AssA  MOTA  IDF1  URL
MOTR    DanceTrack  DanceTrack  54.2  73.5  40.2  79.7  51.5  model

BDD100K

Method  Dataset  Train Data  MOTA  IDF1  IDS   URL
MOTR    BDD100K  BDD100K     32.0  43.5  3493  model
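As a rough sanity check on these numbers: HOTA is (approximately) the geometric mean of DetA and AssA, with the exact metric also averaging over localization thresholds. The MOT17 and DanceTrack rows above are consistent with this within rounding:

```python
import math

def approx_hota(det_a, ass_a):
    # HOTA ~ sqrt(DetA * AssA); the exact metric additionally averages over
    # localization thresholds, so treat this only as a consistency check.
    return math.sqrt(det_a * ass_a)

# (reported HOTA, DetA, AssA) from the tables above
rows = [(57.8, 60.3, 55.7),   # MOT17
        (54.2, 73.5, 40.2)]   # DanceTrack
for hota, det_a, ass_a in rows:
    assert abs(approx_hota(det_a, ass_a) - hota) < 0.5
```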

Note:

  1. MOTR is trained on 8 NVIDIA RTX 2080ti GPUs for MOT17 and DanceTrack.
  2. Training on MOT17 takes about 2.5 days on V100 GPUs, or about 4 days on RTX 2080ti.
  3. Inference runs at about 7.5 FPS at a resolution of 1536x800.
  4. All MOTR models use a ResNet-50 backbone pre-trained on the COCO dataset.

Installation

The codebase is built on top of Deformable DETR.

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n deformable_detr python=3.7 pip

    Then, activate the environment:

    conda activate deformable_detr
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)

    For example, if your CUDA version is 9.2, you could install pytorch and torchvision as follows:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt
  • Build MultiScaleDeformableAttention

    cd ./models/ops
    sh ./make.sh
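The version floors listed above can be checked programmatically. Below is a small, dependency-free sketch comparing dotted version strings; the thresholds are copied from the requirements, while the comparison helper is my own illustration, not part of the repo:

```python
def version_at_least(installed, required):
    """Compare dotted version strings numerically, e.g. '1.10.0' >= '1.5.1'."""
    to_parts = lambda v: [int(p) for p in v.split(".")]
    return to_parts(installed) >= to_parts(required)

# Version floors from the requirements above.
REQUIRED = {"python": "3.7", "torch": "1.5.1", "torchvision": "0.6.1"}
```

Note that a plain string comparison would get this wrong ("1.10.0" < "1.5.1" lexicographically), which is why the helper compares integer components.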

Usage

Dataset preparation

  1. Please download the MOT17 dataset and CrowdHuman dataset, and organize them as in FairMOT:
.
├── crowdhuman
│   ├── images
│   └── labels_with_ids
├── MOT15
│   ├── images
│   ├── labels_with_ids
│   ├── test
│   └── train
├── MOT17
│   ├── images
│   ├── labels_with_ids
├── DanceTrack
│   ├── train
│   ├── test
├── bdd100k
│   ├── images
│       ├── track
│           ├── train
│           ├── val
│   ├── labels
│       ├── track
│           ├── train
│           ├── val
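The labels_with_ids layout above follows the FairMOT convention: one .txt file per image, one line per object of the form `class identity x_center y_center width height`, with box coordinates normalized by image size. This is my reading of the FairMOT format, not verified against this repo's loaders; a sketch of the conversion from a pixel-space box:

```python
def format_label(identity, box, img_w, img_h, cls=0):
    """Render one FairMOT-style label line from a pixel-space (x1, y1, w, h) box.

    This mirrors the FairMOT labels_with_ids convention (an assumption; check
    datasets/data_path in the repo for the loaders actually used).
    """
    x1, y1, w, h = box
    cx = (x1 + w / 2) / img_w    # normalized box center
    cy = (y1 + h / 2) / img_h
    return f"{cls} {identity} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"
```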

  2. For the BDD100K dataset, you can use the following script to generate the txt files:
cd datasets/data_path
python3 generate_bdd100k_mot.py
cd ../../

Training and Evaluation

Training on single node

You can download COCO pretrained weights from Deformable DETR. Then train MOTR on 8 GPUs as follows:

sh configs/r50_motr_train.sh

Evaluation on MOT15

You can download the pretrained MOTR model (the link is in the "Main Results" section), then run the following command to evaluate it on the MOT15 train set:

sh configs/r50_motr_eval.sh

To visualize results in the demo video, enable vis=True in eval.py:

det.detect(vis=True)

Evaluation on MOT17

You can download the pretrained MOTR model (the link is in the "Main Results" section), then run the following command to evaluate it on the MOT17 test set (for submission to the server):

sh configs/r50_motr_submit.sh

Evaluation on BDD100K

For the BDD100K dataset, please refer to motr_bdd100k.

Test on Video Demo

We also provide a demo interface which allows quick processing of a given video.

EXP_DIR=exps/e2e_motr_r50_joint
python3 demo.py \
    --meta_arch motr \
    --dataset_file e2e_joint \
    --epoch 200 \
    --with_box_refine \
    --lr_drop 100 \
    --lr 2e-4 \
    --lr_backbone 2e-5 \
    --pretrained ${EXP_DIR}/motr_final.pth \
    --output_dir ${EXP_DIR} \
    --batch_size 1 \
    --sample_mode 'random_interval' \
    --sample_interval 10 \
    --sampler_steps 50 90 120 \
    --sampler_lengths 2 3 4 5 \
    --update_query_pos \
    --merger_dropout 0 \
    --dropout 0 \
    --random_drop 0.1 \
    --fp_ratio 0.3 \
    --query_interaction_layer 'QIM' \
    --extra_track_attn \
    --resume ${EXP_DIR}/motr_final.pth \
    --input_video figs/demo.avi
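Several of the flags above (--sample_mode 'random_interval', --sample_interval 10, --sampler_lengths 2 3 4 5) describe how training clips are drawn from a sequence: a clip of a given length is sampled with a random frame stride of up to sample_interval, and --sampler_steps grows the clip length over training. A rough pure-Python sketch of the random-interval part (my interpretation of the flags, not the repo's exact sampler):

```python
import random

def sample_clip(num_frames, clip_len, max_interval):
    """Pick clip_len frame indices with a random stride of at most max_interval."""
    interval = random.randint(1, max_interval)
    # Shrink the stride if the sequence is too short to fit the clip.
    while (clip_len - 1) * interval >= num_frames and interval > 1:
        interval -= 1
    start = random.randint(0, num_frames - 1 - (clip_len - 1) * interval)
    return [start + i * interval for i in range(clip_len)]
```

With --sampler_steps 50 90 120 and --sampler_lengths 2 3 4 5, clip_len would start at 2 and increase at those epoch boundaries, so the model sees progressively longer temporal contexts.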

Citing MOTR

If you find MOTR useful in your research, please consider citing:

@inproceedings{zeng2021motr,
  title={MOTR: End-to-End Multiple-Object Tracking with TRansformer},
  author={Zeng, Fangao and Dong, Bin and Zhang, Yuang and Wang, Tiancai and Zhang, Xiangyu and Wei, Yichen},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
