  • Stars: 747
  • Rank: 60,308 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated over 3 years ago

Repository Details

[CVPR'19 Oral] Deeper and Wider Siamese Networks for Real-Time Visual Tracking

Deeper and Wider Siamese Networks for Real-Time Visual Tracking

We are hiring research interns for visual tracking and neural architecture search projects: [email protected]

News

  • 🏆 We are the winner of the VOT-19 RGB-D challenge [codes and models]
  • 🏆 We were runners-up in the VOT-19 Long-term and RGB-T challenges [codes and models]
  • ☀️☀️ We added results on the VOT-18, VOT-19, GOT10K, VISDRONE19, and LaSOT datasets.
  • ☀️☀️ The training and testing code for SiamFC+ and SiamRPN+ has been released.
  • ☀️☀️ Our paper has been accepted by CVPR2019 (Oral).
  • ☀️☀️ We provide a parameter tuning toolkit for the siamese tracking framework.

Introduction

Siamese networks have drawn great attention in visual tracking because of their balanced accuracy and speed. However, the backbone network utilized in these trackers is still the classical AlexNet, which does not fully take advantage of the capability of modern deep neural networks.

Our proposals improve the performance of fully convolutional siamese trackers by:

  1. introducing CIR and CIR-D units to unveil the power of deeper and wider networks like ResNet and Inception;
  2. designing backbone networks according to an analysis of internal network factors (e.g. receptive field, stride, output feature size) that affect tracking performance.
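
The internal network factors named in item 2 can be computed from a layer list with standard convolution arithmetic. The sketch below is a minimal illustration, not the authors' code: the `Layer` type, the `analyze` function, the AlexNet-like layer list (kernel sizes, strides, no padding), and the 127/255 input sizes are all assumptions chosen for illustration and are not specified in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One convolution or pooling layer (hypothetical helper type)."""
    name: str
    kernel: int
    stride: int = 1
    padding: int = 0


def analyze(layers, input_size):
    """Return (receptive_field, total_stride, output_size) for a layer stack."""
    rf, jump, size = 1, 1, input_size
    for layer in layers:
        # Each layer widens the receptive field by (kernel - 1) steps,
        # measured in input pixels via the current spacing `jump`.
        rf += (layer.kernel - 1) * jump
        # The spacing between neighbouring outputs grows with the stride.
        jump *= layer.stride
        # Standard output-size formula for convolution and pooling.
        size = (size + 2 * layer.padding - layer.kernel) // layer.stride + 1
    return rf, jump, size


# Assumed SiamFC-style AlexNet layout without padding; not taken from this README.
ALEXNET_LIKE = [
    Layer("conv1", kernel=11, stride=2),
    Layer("pool1", kernel=3, stride=2),
    Layer("conv2", kernel=5),
    Layer("pool2", kernel=3, stride=2),
    Layer("conv3", kernel=3),
    Layer("conv4", kernel=3),
    Layer("conv5", kernel=3),
]

if __name__ == "__main__":
    # 127 and 255 are assumed exemplar and search image sizes.
    for name, size in (("exemplar", 127), ("search", 255)):
        rf, stride, out = analyze(ALEXNET_LIKE, size)
        print(f"{name}: input={size} rf={rf} stride={stride} output={out}")
```

Under these assumptions the stack has a receptive field of 87 pixels and a total stride of 8, and it produces 6x6 and 22x22 feature maps for the two input sizes. Swapping in a different layer list, or setting `padding` on some layers, shows how each design choice shifts the three factors.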

Main Results

Main results on VOT and OTB

Models           OTB13  OTB15  VOT15  VOT16  VOT17
Alex-FC          0.608  0.579  0.289  0.235  0.188
Alex-RPN         -      0.637  0.349  0.344  0.244
CIResNet22-FC    0.663  0.644  0.318  0.303  0.234
CIResIncep22-FC  0.662  0.642  0.310  0.295  0.236
CIResNext23-FC   0.659  0.633  0.297  0.278  0.229
CIResNet22-RPN   0.674  0.666  0.381  0.376  0.294

Main results trained with GOT-10k (SiamFC)

Models           OTB13  OTB15  VOT15  VOT16  VOT17
Alex-FC          -      -      -      -      0.188
CIResNet22-FC    0.664  0.654  0.361  0.335  0.266
CIResNet22W-FC   0.689  0.674  0.368  0.352  0.269
CIResIncep22-FC  0.673  0.650  0.332  0.305  0.251
CIResNext22-FC   0.668  0.651  0.336  0.304  0.246
Raw Results: 📎 OTB2013, 📎 OTB2015, 📎 VOT15, 📎 VOT16, 📎 VOT17
  • Some reproduced results listed above are slightly better than those in the paper.
  • We recently found that training on the GOT10K dataset yields better performance for SiamFC, so we also provide results for models trained on GOT10K.

Newly added results

Benchmark    VOT18  VOT19  GOT10K  VISDRONE19  LaSOT
Performance  0.270  0.242  0.416   0.383       0.384
Raw Results: 📎 VOT18, 📎 VOT19, 📎 GOT10K, 📎 VISDRONE, 📎 LaSOT
  • We add results of SiamFCRes22W on recent benchmarks.
  • Download the model pretrained on GOT10K and its hyper-parameters.

Environment

The code was developed on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz with an NVIDIA GTX1080 GPU.

Quick Start

Test

See details in test.md

Train

See details in train.md

☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️

Citation

If any part of our paper and code is helpful to your work, please generously cite with:

@InProceedings{SiamDW_2019_CVPR,
author = {Zhang, Zhipeng and Peng, Houwen},
title = {Deeper and Wider Siamese Networks for Real-Time Visual Tracking},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
} 

License

Licensed under an MIT license.

More Repositories

  1. TTSR (Python, 756 stars): [CVPR'20] TTSR: Learning Texture Transformer Network for Image Super-Resolution
  2. Stark (Python, 628 stars): [ICCV'21] Learning Spatio-Temporal Transformer for Visual Tracking
  3. TracKit (Python, 608 stars): [ECCV'20] Ocean: Object-aware Anchor-Free Tracking
  4. STTN (Jupyter Notebook, 462 stars): [ECCV'2020] STTN: Learning Joint Spatial-Temporal Transformations for Video Inpainting
  5. AOT-GAN-for-Inpainting (Python, 416 stars): [TVCG'2023] AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)
  6. LightTrack (Python, 387 stars): [CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search
  7. MM-Diffusion (Python, 354 stars): [CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
  8. PEN-Net-for-Inpainting (Python, 354 stars): [CVPR'2019] PEN-Net: Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting
  9. img2poem (Python, 282 stars): [MM'18] Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training
  10. tasn (Python, 218 stars): Trilinear Attention Sampling Network for Fine-grained Image Recognition
  11. soho (Python, 205 stars): [CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
  12. TTVSR (Python, 197 stars): [CVPR'22 Oral] TTVSR: Learning Trajectory-Aware Transformer for Video Super-Resolution
  13. FTVSR (Python, 151 stars): [ECCV'22] FTVSR: Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution
  14. DBTNet (Python, 105 stars): Code for our NeurIPS'19 paper "Learning Deep Bilinear Transformation for Fine-grained Image Representation"
  15. generate-it (Python, 64 stars): A collection of models for image<->text generation in ACM MM 2021.
  16. CKDN (Python, 55 stars): [ICCV'21] CKDN: Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
  17. SariGAN (Python, 54 stars): [NeurIPS'20] Learning Semantic-aware Normalization for Generative Adversarial Networks
  18. VOT2019 (Python, 50 stars): The Winner and Runner-up Trackers for VOT-2019 Challenges
  19. WSOD2 (Python, 46 stars): [ICCV'19] WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection
  20. CyDAS (Python, 34 stars): Cyclic Differentiable Architecture Search
  21. VQD-SR (Python, 34 stars): [ICCV'23] VQD-SR: Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution
  22. NEAS (Python, 19 stars)
  23. 2D-TAN (Python, 16 stars): AAAI2020 - Learning 2D Temporal Localization Networks for Moment Localization with Natural Language
  24. AAST-pytorch (Python, 14 stars): [MM'20] Aesthetic-Aware Image Style Transfer
  25. STTR (Python, 12 stars): [ACCV'22] Fine-Grained Image Style Transfer with Visual Transformers
  26. davinci-videofactory (JavaScript, 12 stars)
  27. AI_Illustrator (Python, 11 stars): [MM'22 Oral] AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation
  28. language-guided-animation (Python, 10 stars): [TMM 2023] Language-Guided Face Animation by Recurrent StyleGAN-based Generator