Implementation of "Self-Supervised Learning via Conditional Motion Propagation" (CMP)

Paper

Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy, "Self-Supervised Learning via Conditional Motion Propagation", in CVPR 2019 [Project Page]

For further information, please contact Xiaohang Zhan.

Demos (watch the full demos on YouTube)

Conditional motion propagation (motion prediction by guidance)

Guided video generation (draw arrows to animate a static image)

Semi-automatic annotation (first row: interface, auto zoom-in, mask; second row: optical flows)

Data collection

YFCC frames (45G). YFCC optical flows (LiteFlowNet) (29G). YFCC lists (251M).

Model collection

Pre-trained models for semantic segmentation, instance segmentation and human parsing by CMP can be downloaded here.
Models for demos (conditional motion propagation, guided video generation and semi-automatic annotation) can be downloaded here.

Requirements

python>=3.6
pytorch>=0.4.0
other dependencies:
```
pip install -r requirements.txt
```
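
Before installing, it can help to confirm that the interpreter and PyTorch meet the minimums listed above. The following is a minimal sketch and not part of the repository; `parse_version` and `check_requirements` are hypothetical helper names, and the version parsing is deliberately simple (it reads only the leading numeric components).

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a 3-tuple.

    Example: "1.13.1+cu117" becomes (1, 13, 1). Local suffixes such as
    "+cu117" and pre-release tags such as "a0" are ignored.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that "0.4" compares equal to "0.4.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def check_requirements(python_min=(3, 6), torch_min=(0, 4, 0)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < python_min:
        problems.append("Python %d.%d or newer is required" % python_min)
    # find_spec only looks the package up; it does not import it.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch
        if parse_version(torch.__version__) < torch_min:
            problems.append("PyTorch %d.%d.%d or newer is required" % torch_min)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

The remaining packages are still installed from `requirements.txt`; this check covers only the two versions named explicitly above.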

Usage

Clone the repo.

git clone git@github.com:XiaohangZhan/conditional-motion-propagation.git
cd conditional-motion-propagation

Representation learning

Prepare data (YFCC as an example)

mkdir data
mkdir data/yfcc
cd data/yfcc
# download YFCC frames, optical flows and lists to data/yfcc
tar -xf UnsupVideo_Frames_v1.tar.gz
tar -xf flow_origin.tar.gz
tar -xf lists.tar.gz

The data folder should then look like:

data
  ├── yfcc
    ├── UnsupVideo_Frames
    ├── flow_origin
    ├── lists
      ├── train.txt
      ├── val.txt
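
A quick way to confirm the extraction produced the layout above is to check for each expected entry. This is a minimal sketch that assumes the default `data/yfcc` location and is run from the repository root; `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Entries taken from the directory tree shown above, relative to data/yfcc.
EXPECTED_ENTRIES = (
    "UnsupVideo_Frames",
    "flow_origin",
    "lists/train.txt",
    "lists/val.txt",
)


def missing_entries(root="data/yfcc"):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under data/yfcc:", ", ".join(missing))
    else:
        print("data/yfcc looks complete")
```

The check looks only at names, not at the contents of the archives, so it catches a skipped `tar -xf` step but not a truncated download.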

Train CMP for Representation Learning.

If your server supports multi-node training:

sh experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/train.sh # 16 GPUs distributed training
python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/config.yaml --iter 70000 # extract weights of the image encoder to experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/checkpoints/convert_iter_70000.pth.tar

If your server does not support multi-node training:

sh experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/train.sh # 8 GPUs distributed training
python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/config.yaml --iter 140000 # extract weights of the image encoder
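
The weight-extraction step keeps only the image encoder from a full training checkpoint. The sketch below illustrates that idea on a plain dictionary and is not the code in `tools/weight_process.py`; the key prefix `module.image_encoder.` and the helper name `extract_submodule` are assumptions made for illustration. With a real checkpoint, the dictionary would typically come from `torch.load` and the result would be written back with `torch.save`.

```python
def extract_submodule(state_dict, prefix):
    """Keep entries whose key starts with `prefix`, with the prefix stripped.

    `state_dict` is any mapping from parameter names to values, such as the
    mapping returned by a PyTorch module's state_dict().
    """
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    # Toy stand-in for a checkpoint; the key names are hypothetical and the
    # string values stand in for tensors.
    full = {
        "module.image_encoder.conv1.weight": "tensor A",
        "module.image_encoder.conv1.bias": "tensor B",
        "module.flow_decoder.conv.weight": "tensor C",
    }
    encoder_only = extract_submodule(full, "module.image_encoder.")
    print(sorted(encoder_only))
```

Stripping the prefix matters because a downstream backbone usually expects parameter names relative to the encoder itself, not to the full training model.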

Run demos

Download the model and move it to experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints/.
Launch Jupyter Notebook and run demos/cmp.ipynb for conditional motion propagation, or demos/demo_annot.ipynb for semi-automatic annotation.

Train the model yourself (optional)

# data not ready
sh experiments/semiauto_annot/resnet50_vip+mpii_liteflow/train.sh # 8 GPUs distributed training

Results

1. Pascal VOC 2012 Semantic Segmentation (AlexNet)

Method (AlexNet)	Supervision (data amount)	% mIoU
Krizhevsky et al. [1]	ImageNet labels (1.3M)	48.0
Random	- (0)	19.8
Pathak et al. [2]	In-painting (1.2M)	29.7
Zhang et al. [3]	Colorization (1.3M)	35.6
Zhang et al. [4]	Split-Brain (1.3M)	36.0
Noroozi et al. [5]	Counting (1.3M)	36.6
Noroozi et al. [6]	Jigsaw (1.3M)	37.6
Noroozi et al. [7]	Jigsaw++ (1.3M)	38.1
Jenni et al. [8]	Spot-Artifacts (1.3M)	38.1
Larsson et al. [9]	Colorization (3.7M)	38.4
Gidaris et al. [10]	Rotation (1.3M)	39.1
Pathak et al. [11]*	Motion Segmentation (1.6M)	39.7
Walker et al. [12]*	Flow Prediction (3.22M)	40.4
Mundhenk et al. [13]	Context (1.3M)	40.6
Mahendran et al. [14]	Flow Similarity (1.6M)	41.4
Ours	CMP (1.26M)	42.9
Ours	CMP (3.22M)	44.5
Caron et al. [15]	Clustering (1.3M)	45.1
Feng et al. [16]	Feature Decoupling (1.3M)	45.3

2. Pascal VOC 2012 Semantic Segmentation (ResNet-50)

Method (ResNet-50)	Supervision (data amount)	% mIoU
Krizhevsky et al. [1]	ImageNet labels (1.2M)	69.0
Random	- (0)	42.4
Walker et al. [12]*	Flow Prediction (1.26M)	54.5
Pathak et al. [11]*	Motion Segmentation (1.6M)	54.6
Ours	CMP (1.26M)	59.0

3. COCO 2017 Instance Segmentation (ResNet-50)

Method (ResNet-50)	Supervision (data amount)	Det. (% mAP)	Seg. (% mAP)
Krizhevsky et al. [1]	ImageNet labels (1.2M)	37.2	34.1
Random	- (0)	19.7	18.8
Pathak et al. [11]*	Motion Segmentation (1.6M)	27.7	25.8
Walker et al. [12]*	Flow Prediction (1.26M)	31.5	29.2
Ours	CMP (1.26M)	32.3	29.8

4. LIP Human Parsing (ResNet-50)

Method (ResNet-50)	Supervision (data amount)	Single-Person (% mIoU)	Multi-Person (% mIoU)
Krizhevsky et al. [1]	ImageNet labels (1.2M)	42.5	55.4
Random	- (0)	32.5	35.0
Pathak et al. [11]*	Motion Segmentation (1.6M)	36.6	50.9
Walker et al. [12]*	Flow Prediction (1.26M)	36.7	52.5
Ours	CMP (1.26M)	36.9	51.8
Ours	CMP (4.57M)	40.2	52.9

Note: Methods marked * did not report these results in their papers, so we reimplemented them to obtain the numbers.
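
One way to read the tables above is to ask how much of the gap between random initialization and ImageNet-supervised pre-training a method closes. This normalized score is not reported in the paper; it is a reading aid computed here from the table values, and `gap_recovered` is an illustrative helper name.

```python
def gap_recovered(score, random_init, supervised):
    """Fraction of the random-to-supervised gap closed by `score`.

    0.0 means no better than random initialization; 1.0 means on par with
    ImageNet-supervised pre-training.
    """
    if supervised == random_init:
        raise ValueError("supervised and random scores must differ")
    return (score - random_init) / (supervised - random_init)


# Values copied from the tables above: (random, ImageNet labels, CMP).
ROWS = {
    "VOC 2012 mIoU, AlexNet, CMP (3.22M)": (19.8, 48.0, 44.5),
    "VOC 2012 mIoU, ResNet-50, CMP (1.26M)": (42.4, 69.0, 59.0),
    "COCO 2017 Seg. mAP, ResNet-50, CMP (1.26M)": (18.8, 34.1, 29.8),
}

if __name__ == "__main__":
    for name, (rand, sup, cmp_score) in ROWS.items():
        print(f"{name}: {gap_recovered(cmp_score, rand, sup):.2f}")
```

The score only rescales numbers already in the tables, so it should be read alongside the raw metrics rather than in place of them.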

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
[2] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
[3] Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. In ECCV, 2016.
[4] Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, 2017.
[5] Mehdi Noroozi, Hamed Pirsiavash, and Paolo Favaro. Representation learning by learning to count. In ICCV, 2017.
[6] Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV, 2016.
[7] Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, and Hamed Pirsiavash. Boosting self-supervised learning via knowledge transfer. In CVPR, 2018.
[8] Simon Jenni and Paolo Favaro. Self-supervised feature learning by learning to spot artifacts. In CVPR, 2018.
[9] Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, 2017.
[10] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In ICLR, 2018.
[11] Deepak Pathak, Ross B Girshick, Piotr Dollar, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017.
[12] Jacob Walker, Abhinav Gupta, and Martial Hebert. Dense optical flow prediction from a static image. In ICCV, 2015.
[13] T Nathan Mundhenk, Daniel Ho, and Barry Y Chen. Improvements to context based self-supervised learning. In CVPR, 2018.
[14] Aravindh Mahendran, James Thewlis, and Andrea Vedaldi. Cross pixel optical flow similarity for self-supervised learning. In ACCV, 2018.
[15] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
[16] Zeyu Feng, Chang Xu, and Dacheng Tao. Self-supervised representation learning by rotation feature decoupling. In CVPR, 2019.

Core idea

A Chinese proverb: "牵一发而动全身" (literally, "pull a single hair and the whole body moves").

Bibtex

@inproceedings{zhan2019self,
 author = {Zhan, Xiaohang and Pan, Xingang and Liu, Ziwei and Lin, Dahua and Loy, Chen Change},
 title = {Self-Supervised Learning via Conditional Motion Propagation},
 booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)},
 month = {June},
 year = {2019}
}
