VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
MixFormer
[CVPR 2022 Oral & TPAMI 2024] MixFormer: End-to-End Tracking with Iterative Mixed Attention
TDN
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
EMA-VFI
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
SparseBEV
[ICCV 2023] SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
MOC-Detector
[ECCV 2020] Actions as Moving Points
AdaMixer
[CVPR 2022 Oral] AdaMixer: A Fast-Converging Query-Based Object Detector
CamLiFlow
[CVPR 2022 Oral & TPAMI 2023] Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion
SADRNet
[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction
MeMOTR
[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
MixFormerV2
[NeurIPS 2023] MixFormerV2: Efficient Fully Transformer Tracking
SportsMOT
[ICCV 2023] SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes
MultiSports
[ICCV 2021] MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
FCOT
[CVIU] Fully Convolutional Online Tracking
SparseOcc
Fully Sparse 3D Occupancy Prediction & RayIoU Evaluation Metric
MMN
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
RTD-Action
[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation
BCN
[ECCV 2020] Boundary-Aware Cascade Networks for Temporal Action Segmentation
LinK
[CVPR 2023] LinK: Linear Kernel for LiDAR-based 3D Perception
MixSort
[ICCV 2023] MixSort: The Customized Tracker in SportsMOT
CPD-Video
Learning Spatiotemporal Features via Video and Text Pair Discrimination
Structured-Sparse-RCNN
[CVPR 2022] Structured Sparse R-CNN for Direct Scene Graph Generation
TRACE
[ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation
CRCNN-Action
Context-aware RCNN: a Baseline for Action Detection in Videos
DDM
[CVPR 2022] Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
BasicTAD
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
STMixer
[CVPR 2023] STMixer: A One-Stage Sparse Action Detector
MGSampler
[ICCV 2021] MGSampler: An Explainable Sampling Strategy for Video Action Recognition
VideoMAE-Action-Detection
[NeurIPS 2022 Spotlight] VideoMAE for Action Detection
MOTIP
Multiple Object Tracking as ID Prediction
FSL-Video
[BMVC 2021] A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark
PointTAD
[NeurIPS 2022] PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
TemporalPerceiver
[T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
TIA
[CVPR 2022] Task-specific Inconsistency Alignment for Domain Adaptive Object Detection
CoMAE
[AAAI 2023] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
SGM-VFI
[CVPR 2024] Sparse Global Matching for Video Frame Interpolation with Large Motion
PDPP
[CVPR 2023 Highlight] PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
JoMoLD
[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
CGA-Net
[CVPR 2021] CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation
SSD-LT
[ICCV 2021] Self Supervision to Distillation for Long-Tailed Visual Recognition
TREG
Target Transformed Regression for Accurate Tracking
DEQDet
[ICCV 2023] Deep Equilibrium Object Detection
EVAD
[ICCV 2023] Efficient Video Action Detection with Token Dropout and Context Refinement
BIVDiff
[CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
OCSampler
[CVPR 2022] OCSampler: Compressing Videos to One Clip with Single-step Sampling
MGMAE
[ICCV 2023] MGMAE: Motion Guided Masking for Video Masked Autoencoding
APP-Net
[TIP] APP-Net: Auxiliary-point-based Push and Pull Operations for Efficient Point Cloud Recognition
StageInteractor
[ICCV 2023] StageInteractor: Query-based Object Detector with Cross-stage Interaction
CMPT
[IJCV 2021] Cross-Modal Pyramid Translation for RGB-D Scene Recognition
VLG
VLG: General Video Recognition with Web Textual Knowledge (https://arxiv.org/abs/2212.01638)
DGN
[IJCV 2023] Dual Graph Networks for Pose Estimation in Crowded Scenes
BFRNet
LogN
This repo is the official implementation of our IJCV paper: Logit Normalization for Long-Tail Object Detection, published on 8 January 2024.