MCG-NJU/MOC-Detector

Language: Python
License: MIT License


[ECCV 2020] Actions as Moving Points

Actions as Moving Points

PyTorch implementation of Actions as Moving Points (ECCV 2020).

View each action instance as a trajectory of moving points.

Visualization results on the validation set. (GIFs may take a few minutes to load.)

(Note that the relatively low scores are due to the properties of the focal loss.)

News & Updates

Jul. 08, 2020 - First release of the code.

Jul. 24, 2020 - Updated the UCF-pretrained JHMDB model and the speed test code.

Aug. 02, 2020 - Updated the visualization code: extract frames from a video and get the detection results (like the GIFs above).

Aug. 17, 2020 - Visualization now supports instance-level detection results (reflecting video mAP).

Aug. 23, 2020 - Uploaded MOC with a ResNet-18 backbone.

MOC Detector Overview

We present a new action tubelet detection framework, termed MovingCenter Detector (MOC-detector), which treats an action instance as a trajectory of moving points. MOC-detector is decomposed into three head branches:

(1) Center Branch for instance center detection and action recognition.
(2) Movement Branch for movement estimation at adjacent frames to form moving point trajectories.
(3) Box Branch for spatial extent detection by directly regressing bounding box size at the estimated center point of each frame.
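
To make the division of labor between the three branches concrete, the sketch below assembles one tubelet from their outputs. It is an illustration only, not the repository's decoding code: the function name `decode_tubelet`, the argument layout, and the assumption that each movement offset is measured from the key-frame center are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode_tubelet(
    key_center: Point,
    movements: List[Point],
    sizes: List[Tuple[float, float]],
) -> List[Box]:
    """Assemble one tubelet from the three branch outputs (illustrative only).

    key_center: (x, y) instance center, as the Center Branch would provide.
    movements:  per-frame (dx, dy) offsets, as the Movement Branch would
                provide; assumed here to be relative to key_center.
    sizes:      per-frame (w, h) box sizes, as the Box Branch would provide.
    """
    if len(movements) != len(sizes):
        raise ValueError("movements and sizes need one entry per frame")
    cx, cy = key_center
    boxes: List[Box] = []
    for (dx, dy), (w, h) in zip(movements, sizes):
        # Shift the key-frame center to this frame's estimated center.
        fx, fy = cx + dx, cy + dy
        # Expand the shifted center into a box using the regressed size.
        boxes.append((fx - w / 2, fy - h / 2, fx + w / 2, fy + h / 2))
    return boxes


# Hypothetical three-frame example with a fixed 20x30 box size.
tubelet = decode_tubelet(
    key_center=(50.0, 40.0),
    movements=[(-2.0, 0.0), (0.0, 0.0), (2.0, 1.0)],
    sizes=[(20.0, 30.0)] * 3,
)
for box in tubelet:
    print(box)
```

The point of the sketch is that the center is detected once per instance, while the movement offsets and box sizes are per frame, so the sequence of shifted centers forms the moving-point trajectory described above.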

MOC-Detector Usage

1. Installation

Please refer to Installation.md for installation instructions.

2. Dataset

Please refer to Dataset.md for dataset setup instructions.

3. Evaluation

You can follow the instructions in Evaluation.md to evaluate our model and reproduce the results in the original paper.

4. Train

You can follow the instructions in Train.md to train our models.

5. Visualization

You can follow the instructions in Visualization.md to get visualization results.

References

Data augmentation code from ACT.
Evaluation code from ACT.
DLA-34 backbone code from CenterNet.

ACT LICENSE

CenterNet LICENSE

See more in NOTICE

Citation

If you find this code useful in your research, please cite:

@InProceedings{li2020actions,
    title={Actions as Moving Points},
    author={Yixuan Li and Zixu Wang and Limin Wang and Gangshan Wu},
    booktitle={arXiv preprint arXiv:2001.04608},
    year={2020}
}
