VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
MixFormer
[CVPR 2022 Oral & TPAMI 2024] MixFormer: End-to-End Tracking with Iterative Mixed Attention
TDN
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
EMA-VFI
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
SparseBEV
[ICCV 2023] SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
MOC-Detector
[ECCV 2020] Actions as Moving Points
AdaMixer
[CVPR 2022 Oral] AdaMixer: A Fast-Converging Query-Based Object Detector
CamLiFlow
[CVPR 2022 Oral & TPAMI 2023] Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion
SparseOcc
[ECCV 2024] Fully Sparse 3D Occupancy Prediction & RayIoU Evaluation Metric
MeMOTR
[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
MixFormerV2
[NeurIPS 2023] MixFormerV2: Efficient Fully Transformer Tracking
SportsMOT
[ICCV 2023] SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes
SADRNet
[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction
MultiSports
[ICCV 2021] MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
FCOT
[CVIU] Fully Convolutional Online Tracking
MMN
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
RTD-Action
[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation
MOTIP
Multiple Object Tracking as ID Prediction
BCN
[ECCV 2020] Boundary-Aware Cascade Networks for Temporal Action Segmentation
LinK
[CVPR 2023] LinK: Linear Kernel for LiDAR-based 3D Perception
MixSort
[ICCV 2023] MixSort: The Customized Tracker in SportsMOT
CPD-Video
Learning Spatiotemporal Features via Video and Text Pair Discrimination
SGM-VFI
[CVPR 2024] Sparse Global Matching for Video Frame Interpolation with Large Motion
Structured-Sparse-RCNN
[CVPR 2022] Structured Sparse R-CNN for Direct Scene Graph Generation
TRACE
[ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation
CRCNN-Action
Context-aware RCNN: a Baseline for Action Detection in Videos
BasicTAD
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
DDM
[CVPR 2022] Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
VideoMAE-Action-Detection
[NeurIPS 2022 Spotlight] VideoMAE for Action Detection
MGSampler
[ICCV 2021] MGSampler: An Explainable Sampling Strategy for Video Action Recognition
FSL-Video
[BMVC 2021] A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark
BIVDiff
[CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
PointTAD
[NeurIPS 2022] PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
TemporalPerceiver
[TPAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
TIA
[CVPR 2022] Task-specific Inconsistency Alignment for Domain Adaptive Object Detection
CoMAE
[AAAI 2023] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
PDPP
[CVPR 2023 Highlight] PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
JoMoLD
[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
EVAD
[ICCV 2023] Efficient Video Action Detection with Token Dropout and Context Refinement
CGA-Net
[CVPR 2021] CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation
SSD-LT
[ICCV 2021] Self Supervision to Distillation for Long-Tailed Visual Recognition
TREG
Target Transformed Regression for Accurate Tracking
VFIMamba
VFIMamba: Video Frame Interpolation with State Space Models
DEQDet
[ICCV 2023] Deep Equilibrium Object Detection
MGMAE
[ICCV 2023] MGMAE: Motion Guided Masking for Video Masked Autoencoding
OCSampler
[CVPR 2022] OCSampler: Compressing Videos to One Clip with Single-step Sampling
SportsHHI
[CVPR 2024] SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
APP-Net
[TIP] APP-Net: Auxiliary-point-based Push and Pull Operations for Efficient Point Cloud Recognition
AMD
[CVPR 2024] Asymmetric Masked Distillation for Pre-Training Small Foundation Models
StageInteractor
[ICCV 2023] StageInteractor: Query-based Object Detector with Cross-stage Interaction
SPLAM
[ECCV 2024 Oral] SPLAM: Accelerating Image Generation with Sub-path Linear Approximation Model
CMPT
[IJCV 2021] Cross-Modal Pyramid Translation for RGB-D Scene Recognition
VLG
VLG: General Video Recognition with Web Textual Knowledge (https://arxiv.org/abs/2212.01638)
DGN
[IJCV 2023] Dual Graph Networks for Pose Estimation in Crowded Scenes
Dynamic-MDETR
[TPAMI 2024] Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
BFRNet
ViT-TAD
[CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
VideoEval
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
ZeroI2V
[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
PRVG
[CVIU 2024] End-to-end dense video grounding via parallel regression
LogN
[IJCV 2024] Logit Normalization for Long-Tail Object Detection