# Implementation of "Self-Supervised Learning via Conditional Motion Propagation" (CMP)

## Paper

Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy, "Self-Supervised Learning via Conditional Motion Propagation", in CVPR 2019 [Project Page].

For further information, please contact Xiaohang Zhan.
## Demos

Full demos can be watched on YouTube.

- Conditional motion propagation (motion prediction by guidance)
- Guided video generation (draw arrows to animate a static image)
- Semi-automatic annotation (first row: interface, auto zoom-in, mask; second row: optical flows)
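In these demos, the guidance is a set of user-drawn arrows, i.e. sparse motion vectors. As a rough illustration only (the function name and array layout below are assumptions, not the repository's actual API), such arrows can be rasterized into a sparse flow map plus a validity mask:

```python
import numpy as np

def arrows_to_guidance(arrows, height, width):
    """Rasterize user-drawn arrows into sparse motion guidance.

    arrows: list of (x, y, dx, dy) tuples -- an arrow starting at pixel
            (x, y) with displacement (dx, dy).
    Returns a (2, H, W) flow map (zero where unspecified) and a
    (1, H, W) binary mask marking the pixels that carry guidance.
    """
    flow = np.zeros((2, height, width), dtype=np.float32)
    mask = np.zeros((1, height, width), dtype=np.float32)
    for x, y, dx, dy in arrows:
        flow[0, y, x] = dx  # horizontal displacement
        flow[1, y, x] = dy  # vertical displacement
        mask[0, y, x] = 1.0
    return flow, mask

# One arrow at pixel (10, 20) pointing 5 px right and 3 px down:
flow, mask = arrows_to_guidance([(10, 20, 5.0, 3.0)], 64, 64)
```

The model then predicts dense motion for the whole image conditioned on this sparse input.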
## Data collection

- YFCC frames (45G)
- YFCC optical flows (LiteFlowNet) (29G)
- YFCC lists (251M)
## Model collection

- Pre-trained models for semantic segmentation, instance segmentation and human parsing by CMP can be downloaded here.
- Models for demos (conditional motion propagation, guided video generation and semi-automatic annotation) can be downloaded here.
## Requirements

- python >= 3.6
- pytorch >= 0.4.0
- others: `pip install -r requirements.txt`
## Usage

- Clone the repo:

  ```sh
  git clone git@github.com:XiaohangZhan/conditional-motion-propagation.git
  cd conditional-motion-propagation
  ```
### Representation learning

- Prepare data (YFCC as an example):

  ```sh
  mkdir data
  mkdir data/yfcc
  cd data/yfcc
  # download YFCC frames, optical flows and lists to data/yfcc
  tar -xf UnsupVideo_Frames_v1.tar.gz
  tar -xf flow_origin.tar.gz
  tar -xf lists.tar.gz
  ```

  Then the `data` folder looks like:

  ```
  data
  └── yfcc
      ├── UnsupVideo_Frames
      ├── flow_origin
      └── lists
          ├── train.txt
          └── val.txt
  ```
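Before launching training, a quick sanity check can confirm the layout above is in place. This helper is hypothetical (not part of the repo); it only tests for the paths the YFCC configs expect:

```python
import os

def check_yfcc_layout(root):
    """Return the list of expected YFCC paths missing under `root`
    (empty if the layout is complete)."""
    expected = [
        "yfcc/UnsupVideo_Frames",
        "yfcc/flow_origin",
        "yfcc/lists/train.txt",
        "yfcc/lists/val.txt",
    ]
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]

missing = check_yfcc_layout("data")
if missing:
    print("missing:", missing)
```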
- Train CMP for representation learning.

  - If your server supports multi-node training:

    ```sh
    sh experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/train.sh # 16 GPUs distributed training
    python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/config.yaml --iter 70000 # extract weights of the image encoder to experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/checkpoints/convert_iter_70000.pth.tar
    ```

  - If your server does not support multi-node training:

    ```sh
    sh experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/train.sh # 8 GPUs distributed training
    python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/config.yaml --iter 140000 # extract weights of the image encoder
    ```
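`tools/weight_process.py` saves only the image-encoder weights out of the full training checkpoint. When reusing such a checkpoint downstream, the usual pattern is to filter and rename state-dict keys. The sketch below shows that pattern on plain dicts so it stays framework-free; the `module.image_encoder.` prefix is an assumption about key naming, not a guarantee about this repo's checkpoints:

```python
def extract_encoder_weights(state_dict, prefix="module.image_encoder."):
    """Keep only keys under `prefix` and strip the prefix, so the result
    can be loaded directly into a stand-alone encoder."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

# Toy checkpoint with encoder and decoder weights mixed together:
ckpt = {
    "module.image_encoder.conv1.weight": "w1",
    "module.image_encoder.conv1.bias": "b1",
    "module.flow_decoder.conv.weight": "w2",
}
encoder_only = extract_encoder_weights(ckpt)
# encoder_only == {"conv1.weight": "w1", "conv1.bias": "b1"}
```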
### Run demos

- Download the model and move it to `experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints/`.
- Launch jupyter notebook and run `demos/cmp.ipynb` for conditional motion propagation, or `demos/demo_annot.ipynb` for semi-automatic annotation.
- Train the model by yourself (optional):

  ```sh
  # data not ready
  sh experiments/semiauto_annot/resnet50_vip+mpii_liteflow/train.sh # 8 GPUs distributed training
  ```
## Results

### 1. Pascal VOC 2012 Semantic Segmentation (AlexNet)
Method (AlexNet) | Supervision (data amount) | % mIoU |
---|---|---|
Krizhevsky et al. [1] | ImageNet labels (1.3M) | 48.0 |
Random | - (0) | 19.8 |
Pathak et al. [2] | In-painting (1.2M) | 29.7 |
Zhang et al. [3] | Colorization (1.3M) | 35.6 |
Zhang et al. [4] | Split-Brain (1.3M) | 36.0 |
Noroozi et al. [5] | Counting (1.3M) | 36.6 |
Noroozi et al. [6] | Jigsaw (1.3M) | 37.6 |
Noroozi et al. [7] | Jigsaw++ (1.3M) | 38.1 |
Jenni et al. [8] | Spot-Artifacts (1.3M) | 38.1 |
Larsson et al. [9] | Colorization (3.7M) | 38.4 |
Gidaris et al. [10] | Rotation (1.3M) | 39.1 |
Pathak et al. [11]* | Motion Segmentation (1.6M) | 39.7 |
Walker et al. [12]* | Flow Prediction (3.22M) | 40.4 |
Mundhenk et al. [13] | Context (1.3M) | 40.6 |
Mahendran et al. [14] | Flow Similarity (1.6M) | 41.4 |
Ours | CMP (1.26M) | 42.9 |
Ours | CMP (3.22M) | 44.5 |
Caron et al. [15] | Clustering (1.3M) | 45.1 |
Feng et al. [16] | Feature Decoupling (1.3M) | 45.3 |
### 2. Pascal VOC 2012 Semantic Segmentation (ResNet-50)
Method (ResNet-50) | Supervision (data amount) | % mIoU |
---|---|---|
Krizhevsky et al. [1] | ImageNet labels (1.2M) | 69.0 |
Random | - (0) | 42.4 |
Walker et al. [12]* | Flow Prediction (1.26M) | 54.5 |
Pathak et al. [11]* | Motion Segmentation (1.6M) | 54.6 |
Ours | CMP (1.26M) | 59.0 |
### 3. COCO 2017 Instance Segmentation (ResNet-50)
Method (ResNet-50) | Supervision (data amount) | Det. (% mAP) | Seg. (% mAP) |
---|---|---|---|
Krizhevsky et al. [1] | ImageNet labels (1.2M) | 37.2 | 34.1 |
Random | - (0) | 19.7 | 18.8 |
Pathak et al. [11]* | Motion Segmentation (1.6M) | 27.7 | 25.8 |
Walker et al. [12]* | Flow Prediction (1.26M) | 31.5 | 29.2 |
Ours | CMP (1.26M) | 32.3 | 29.8 |
### 4. LIP Human Parsing (ResNet-50)
Method (ResNet-50) | Supervision (data amount) | Single-Person (% mIoU) | Multi-Person (% mIoU) |
---|---|---|---|
Krizhevsky et al. [1] | ImageNet labels (1.2M) | 42.5 | 55.4 |
Random | - (0) | 32.5 | 35.0 |
Pathak et al. [11]* | Motion Segmentation (1.6M) | 36.6 | 50.9 |
Walker et al. [12]* | Flow Prediction (1.26M) | 36.7 | 52.5 |
Ours | CMP (1.26M) | 36.9 | 51.8 |
Ours | CMP (4.57M) | 40.2 | 52.9 |
## References

1. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
2. Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
3. Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. In ECCV. Springer, 2016.
4. Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, 2017.
5. Mehdi Noroozi, Hamed Pirsiavash, and Paolo Favaro. Representation learning by learning to count. In ICCV, 2017.
6. Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV. Springer, 2016.
7. Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, and Hamed Pirsiavash. Boosting self-supervised learning via knowledge transfer. In CVPR, 2018.
8. Simon Jenni and Paolo Favaro. Self-supervised feature learning by learning to spot artifacts. In CVPR, 2018.
9. Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, 2017.
10. Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In ICLR, 2018.
11. Deepak Pathak, Ross B Girshick, Piotr Dollar, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017.
12. Jacob Walker, Abhinav Gupta, and Martial Hebert. Dense optical flow prediction from a static image. In ICCV, 2015.
13. T Nathan Mundhenk, Daniel Ho, and Barry Y Chen. Improvements to context based self-supervised learning. In CVPR, 2018.
14. A. Mahendran, J. Thewlis, and A. Vedaldi. Cross pixel optical flow similarity for self-supervised learning. In ACCV, 2018.
15. Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
16. Zeyu Feng, Chang Xu, and Dacheng Tao. Self-supervised representation learning by rotation feature decoupling. In CVPR, 2019.
## Core idea

A Chinese proverb: "牵一发而动全身" (pulling one hair moves the whole body) — sparse motion guidance on a few pixels determines the motion of the whole object.
## Bibtex

```
@inproceedings{zhan2019self,
  author = {Zhan, Xiaohang and Pan, Xingang and Liu, Ziwei and Lin, Dahua and Loy, Chen Change},
  title = {Self-Supervised Learning via Conditional Motion Propagation},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019}
}
```