Instance-aware Semantic Segmentation via Multi-task Network Cascades
By Jifeng Dai, Kaiming He, Jian Sun
This python version is re-implemented by Haozhi Qi when he was an intern at Microsoft Research.
Introduction
MNC is an instance-aware semantic segmentation system based on deep convolutional networks, which won the first place in COCO segmentation challenge 2015, and test at a fraction of a second per image. We decompose the task of instance-aware semantic segmentation into related sub-tasks, which are solved by multi-task network cascades (MNC) with shared features. The entire MNC network is trained end-to-end with error gradients across cascaded stages.
MNC was initially described in a CVPR 2016 oral paper.
This repository contains a python implementation of MNC, which is ~10% slower than the original matlab implementation.
This repository includes a bilinear RoI warping layer, which enables gradient back-propagation with respect to RoI coordinates.
Misc.
This code has been tested on Linux (Ubuntu 14.04), using K40/Titan X GPUs.
The code is built based on py-faster-rcnn.
MNC is released under the MIT License (refer to the LICENSE file for details).
Citing MNC
If you find MNC useful in your research, please consider citing:
@inproceedings{dai2016instance,
title={Instance-aware Semantic Segmentation via Multi-task Network Cascades},
author={Dai, Jifeng and He, Kaiming and Sun, Jian},
booktitle={CVPR},
year={2016}
}
Main Results
training data | test data | mAP^[email protected] | mAP^[email protected] | time (K40) | time (Titian X) | |
---|---|---|---|---|---|---|
MNC, VGG-16 | VOC 12 train | VOC 12 val | 65.0% | 46.3% | 0.42sec/img | 0.33sec/img |
Installation guide
- Clone the MNC repository:
# Make sure to clone with --recursive
git clone --recursive https://github.com/daijifeng001/MNC.git
-
Install Python packages:
numpy
,scipy
,cython
,python-opencv
,easydict
,yaml
. -
Build the Cython modules and the gpu_nms, gpu_mask_voting modules by:
cd $MNC_ROOT/lib
make
- Install
Caffe
andpycaffe
dependencies (see: Caffe installation instructions for official installation guide)
Note: Caffe must be built with support for Python layers!
# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# CUDNN is recommended in building to reduce memory footprint
USE_CUDNN := 1
- Build Caffe and pycaffe:
cd $MNC_ROOT/caffe-mnc # If you have all of the requirements installed # and your Makefile.config in place, then simply do: make -j8 && make pycaffe
Demo
First, download the trained MNC model.
./data/scripts/fetch_mnc_model.sh
Run the demo:
cd $MNC_ROOT
./tools/demo.py
Result demo images will be stored to data/demo/
.
The demo performs instance-aware semantic segmentation with a trained MNC model (using VGG-16 net). The model is pre-trained on ImageNet, and finetuned on VOC 2012 train set with additional annotations from SBD. The mAP^r of the model is 65.0% on VOC 2012 validation set. The test speed per image is ~0.33sec on Titian X and ~0.42sec on K40.
Training
This repository contains code to end-to-end train MNC for instance-aware semantic segmentation, where gradients across cascaded stages are counted in training.
Preparation:
- Run
./data/scripts/fetch_imagenet_models.sh
to download the ImageNet pre-trained VGG-16 net. - Download the VOC 2007 dataset to ./data/VOCdevkit2007
- Run
./data/scripts/fetch_sbd_data.sh
to download the VOC 2012 dataset together with the additional segmentation annotations in SBD to ./data/VOCdevkitSDS.
1. End-to-end training of MNC for instance-aware semantic segmentation
To end-to-end train a 5-stage MNC model (on VOC 2012 train), use experiments/scripts/mnc_5stage.sh
. Final mAP^[email protected] should be ~65.0% (mAP^[email protected] should be ~46.3%), on VOC 2012 validation.
cd $MNC_ROOT
./experiments/scripts/mnc_5stage.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
# --set EXP_DIR seed_rng 1701 RNG_SEED 1701
2. Training of CFM for instance-aware semantic segmentation
The code also includes an entry to train a convolutional feature masking (CFM) model for instance aware semantic segmentation.
@inproceedings{dai2015convolutional,
title={Convolutional Feature Masking for Joint Object and Stuff Segmentation},
author={Dai, Jifeng and He, Kaiming and Sun, Jian},
booktitle={CVPR},
year={2015}
}
2.1. Download pre-computed MCG proposals
Download and process the pre-computed MCG proposals.
cd $MNC_ROOT
./data/scripts/fetch_mcg_data.sh
python ./tools/prepare_mcg_maskdb.py --para_job 24 --db train --output data/cache/voc_2012_train_mcg_maskdb/
python ./tools/prepare_mcg_maskdb.py --para_job 24 --db val --output data/cache/voc_2012_val_mcg_maskdb/
Resulting proposals would be at folder data/MCG/
.
2.2. Train the model
Run experiments/scripts/cfm.sh
to train on VOC 2012 train set. Final mAP^[email protected] should be ~60.5% (mAP^[email protected] should be ~42.6%), on VOC 2012 validation.
cd $MNC_ROOT
./experiments/scripts/cfm.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
# --set EXP_DIR seed_rng 1701 RNG_SEED 1701
3. End-to-end training of Faster-RCNN for object detection
Faster-RCNN can be viewed as a 2-stage cascades composed of region proposal network (RPN) and object detection network. Run script experiments/scripts/faster_rcnn_end2end.sh
to train a Faster-RCNN model on VOC 2007 trainval. Final mAP^b should be ~69.1% on VOC 2007 test.
cd $MNC_ROOT
./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
# --set EXP_DIR seed_rng1701 RNG_SEED 1701