
DSM

The source code for the paper "Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion" (AAAI 2021).

Project Website

The dataset lists, some visualizations, and the pretrained weights are being prepared now.

1. Introduction (scene-dominated to motion-dominated)

Video datasets are usually scene-dominated. We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays more attention to the motion information.

The generated triplet is as below:
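
As a minimal sketch (not the authors' exact implementation), the triplet can be read as an anchor clip, a positive whose scene is disturbed but whose motion is kept, and a negative whose motion is disturbed but whose scene is kept; spatial_disturb and temporal_disturb below are hypothetical stand-ins for the operations in src/augment:

import torch
import torch.nn.functional as F

def dsm_triplet_loss(encoder, clip, spatial_disturb, temporal_disturb, margin=0.5):
    # Anchor: the original clip; positive: same motion, disturbed scene;
    # negative: same scene, disturbed motion.
    anchor = F.normalize(encoder(clip), dim=1)
    positive = F.normalize(encoder(spatial_disturb(clip)), dim=1)
    negative = F.normalize(encoder(temporal_disturb(clip)), dim=1)
    d_pos = 1.0 - (anchor * positive).sum(dim=1)  # cosine distance to positive
    d_neg = 1.0 - (anchor * negative).sum(dim=1)  # cosine distance to negative
    # Pull the motion-preserving view closer than the scene-preserving one.
    return F.relu(d_pos - d_neg + margin).mean()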

What does DSM learn?

With DSM pretraining, the model learns to focus on the motion region (not necessarily the actor), even without a single label available.

2. Installation

Dataset

Please refer to dataset.md for details.

Requirements

  • Python 3
  • PyTorch 1.1+
  • PIL
  • lintel (for on-the-fly video decoding)

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of Kinetics-400
      • diving48: the train/val lists of Diving48
  • experiments
    • logs: detailed experiment records
    • gradients: gradient checks
    • visualization
  • src
    • data: data loading
    • loss: the losses evaluated in this paper
    • model: network architectures
    • scripts: train/eval scripts
    • augment: detailed implementation of the spatio-temporal augmentations
    • utils
    • feature_extract.py: feature extraction given a pretrained model
    • main.py: the main entry point for fine-tuning
    • trainer.py
    • option.py
    • pt.py: self-supervised pretraining
    • ft.py: supervised fine-tuning

4. Run (DSM(Triplet) / DSM / Random)

Self-supervised Pretrain

Kinetics
bash scripts/kinetics/pt.sh
UCF101
bash scripts/ucf101/pt.sh

Supervised Finetune (Clip-level)

HMDB51
bash scripts/hmdb51/ft.sh
UCF101
bash scripts/ucf101/ft.sh
Kinetics
bash scripts/kinetics/ft.sh

Video-level Evaluation

Following the common practice of TSN and Non-local, the final video-level result is averaged over 10 temporal window samples plus corner crops, which leads to better results than clip-level evaluation. Refer to test.py for details.
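
A minimal sketch of this aggregation, assuming a model that maps a clip tensor [C, T, H, W] to class logits; sample_windows and the crop callables are hypothetical helpers, not part of this repo:

import torch

@torch.no_grad()
def video_level_predict(model, video, sample_windows, crops):
    scores = []
    for window in sample_windows(video, num_windows=10):  # 10 temporal windows
        for crop in crops:  # e.g. center crop + four corner crops
            clip = crop(window)                            # [C, T, H, W]
            scores.append(torch.softmax(model(clip.unsqueeze(0)), dim=1))
    # Average the class probabilities over all windows and crops.
    return torch.cat(scores, dim=0).mean(dim=0)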

Pretrain and Eval in One Step

bash scripts/hmdb51/pt_and_ft_hmdb51.sh

Notice: more training options and the ablation studies can be found in the scripts directory.

Video Retrieval and Other Visualizations

(1). Feature Extractor

As STCR can be easily extended to other video representation tasks, we provide scripts to perform feature extraction.

python feature_extractor.py

The features will be saved as a single NumPy file of shape [num_videos, feature_dim] for further visualization.
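
A minimal sketch of such a feature dump, assuming a pretrained backbone that maps a batch of clips [B, C, T, H, W] to features [B, feature_dim] and a standard (clips, labels) data loader; both are assumptions, not this repo's exact interface:

import numpy as np
import torch

@torch.no_grad()
def extract_features(backbone, loader, out_path="features.npy"):
    feats = []
    backbone.eval()
    for clips, _ in loader:                # clips: [B, C, T, H, W]
        f = backbone(clips)                # [B, feature_dim]
        feats.append(f.cpu().numpy())
    feats = np.concatenate(feats, axis=0)  # [num_videos, feature_dim]
    np.save(out_path, feats)
    return feats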

(2). Retrieval Evaluation

Modify lines 60-62 in reterival.py.

python reterival.py
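
A minimal sketch of the usual Recall@k retrieval protocol (each test-clip feature queries the training set; a query counts as a hit if any top-k neighbor shares its action label). The feature and label NumPy arrays are assumed inputs; this is not necessarily reterival.py's exact logic:

import numpy as np

def recall_at_k(test_feat, test_label, train_feat, train_label,
                ks=(1, 5, 10, 20, 50)):
    # L2-normalize so the dot product equals cosine similarity.
    test_feat = test_feat / np.linalg.norm(test_feat, axis=1, keepdims=True)
    train_feat = train_feat / np.linalg.norm(train_feat, axis=1, keepdims=True)
    sim = test_feat @ train_feat.T             # [num_test, num_train]
    order = np.argsort(-sim, axis=1)           # neighbors, best first
    results = {}
    for k in ks:
        topk_labels = train_label[order[:, :k]]                 # [num_test, k]
        hits = (topk_labels == test_label[:, None]).any(axis=1)
        results[k] = hits.mean()
    return results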

Results

Action Recognition

UCF101 Pretrained (I3D)

Method                 UCF101 (%)  HMDB51 (%)
Random Initialization  47.9        29.6
MoCo Baseline          62.3        36.5
DSM (Triplet)          70.7        48.5
DSM                    74.8        52.5

Kinetics Pretrained

Video Retrieval (UCF101-C3D)

Method  @1    @5    @10   @20   @50
DSM     16.8  33.4  43.4  54.6  70.7

Video Retrieval (HMDB51-C3D)

Method  @1    @5    @10   @20   @50
DSM     8.2   25.9  38.1  52.0  75.0

More Visualization

Acknowledgement

This work is partly based on STN, UEL and MoCo.

License

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{wang2021enhancing,
  title={Enhancing unsupervised video representation learning by decoupling the scene and the motion},
  author={Wang, Jinpeng and Gao, Yuting and Li, Ke and Hu, Jianguo and Jiang, Xinyang and Guo, Xiaowei and Ji, Rongrong and Sun, Xing},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={11},
  pages={10129--10137},
  year={2021}
}
