CLIMS
Code repository for our paper "CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation" in CVPR 2022.
Please note that this repository is an improved version of our camera-ready version (see the directory previous_version/). We recommend using this improved version of CLIMS instead of the camera-ready version.
Dataset
PASCAL VOC2012
You will need to download the images (JPEG format) of the PASCAL VOC2012 dataset here; the train_aug ground truth can be found here. Make sure your data/VOC2012 folder is structured as follows:
├── VOC2012/
|   ├── Annotations
|   ├── ImageSets
|   ├── SegmentationClass
|   ├── SegmentationClassAug
|   ├── SegmentationObject
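Before training, it can help to confirm that the folder matches the layout above. The following is a minimal sketch and not part of the repository: the helper name `missing_subdirs` is hypothetical, and the path `data/VOC2012` is simply the location named in the text.

```python
from pathlib import Path

# Subfolders listed in the tree above.
EXPECTED_VOC_DIRS = [
    "Annotations",
    "ImageSets",
    "SegmentationClass",
    "SegmentationClassAug",
    "SegmentationObject",
]


def missing_subdirs(root, expected=EXPECTED_VOC_DIRS):
    """Return the expected subfolder names that are not directories under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # "data/VOC2012" is the location named in the text; adjust to your setup.
    missing = missing_subdirs("data/VOC2012")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("VOC2012 layout looks complete.")
```

An empty result means every listed folder is present; the check does not inspect the contents of those folders.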
MS-COCO 2014
You will need to download the images (JPEG format) of the MSCOCO 2014 dataset here; the ground-truth masks can be found here. Make sure your data/COCO folder is structured as follows:
├── COCO/
|   ├── train2014
|   ├── val2014
|   ├── annotations
|   |   ├── instances_train2014.json
|   |   ├── instances_val2014.json
|   ├── mask
|   |   ├── train2014
|   |   ├── val2014
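A quick way to sanity-check a downloaded annotation file is to count its entries. The sketch below is not part of the repository; it assumes the files follow the standard COCO instances format (top-level `images`, `annotations`, and `categories` lists), and the function name `summarize_coco_annotations` is hypothetical.

```python
import json


def summarize_coco_annotations(path):
    """Count images, annotations, and categories in a COCO-format instances file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Every annotation points at its image through "image_id".
    image_ids = {img["id"] for img in data["images"]}
    annotated_ids = {ann["image_id"] for ann in data["annotations"]}

    return {
        "images": len(image_ids),
        "annotations": len(data["annotations"]),
        "categories": len(data["categories"]),
        # Images that carry no object annotation at all.
        "images_without_objects": len(image_ids - annotated_ids),
    }
```

For example, `summarize_coco_annotations("data/COCO/annotations/instances_val2014.json")` returns a small dictionary of counts. The training annotation file is considerably larger, so loading it takes noticeably more time and memory.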
Training on PASCAL VOC2012
- Install CLIP.
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git
- Download the pre-trained baseline CAM ('res50_cam.pth') here and put it in the directory cam-baseline-voc12/.
- Train CLIMS on the PASCAL VOC2012 dataset to generate initial CAMs.
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root /data1/xjheng/dataset/VOC2012/ --hyper 10,24,1,0.2 --clims_num_epoches 15 --cam_eval_thres 0.15 --work_space clims_voc12 --cam_network net.resnet50_clims --train_clims_pass True --make_clims_pass True --eval_cam_pass True
- Train IRNet and generate pseudo semantic masks.
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root /data1/xjheng/dataset/VOC2012/ --cam_eval_thres 0.15 --work_space clims_voc12 --cam_network net.resnet50_clims --cam_to_ir_label_pass True --train_irn_pass True --make_sem_seg_pass True --eval_sem_seg_pass True
- Train DeepLabv2 using pseudo semantic masks.
cd segmentation/
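The commands above pass stage switches as `--some_pass True` and hyperparameters as a comma-separated string such as `--hyper 10,24,1,0.2`. The sketch below shows one common way to parse such values with `argparse`. It is an illustration, not the parsing code in `run_sample.py`: the helpers `str2bool` and `parse_hyper` and all defaults are assumptions, and only the flag names are taken from the commands.

```python
import argparse


def str2bool(value):
    """Turn command-line strings such as 'True' or 'false' into booleans."""
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_hyper(value):
    """Split a comma-separated string such as '10,24,1,0.2' into floats."""
    return [float(part) for part in value.split(",")]


def build_parser():
    # Flag names are copied from the commands above; defaults are illustrative.
    parser = argparse.ArgumentParser()
    parser.add_argument("--hyper", type=parse_hyper, default=[])
    parser.add_argument("--cam_eval_thres", type=float, default=0.15)
    parser.add_argument("--train_clims_pass", type=str2bool, default=False)
    parser.add_argument("--make_clims_pass", type=str2bool, default=False)
    parser.add_argument("--eval_cam_pass", type=str2bool, default=False)
    return parser


args = build_parser().parse_args(
    ["--hyper", "10,24,1,0.2", "--train_clims_pass", "True", "--eval_cam_pass", "False"]
)
print(args.hyper, args.train_clims_pass, args.eval_cam_pass)
# prints: [10.0, 24.0, 1.0, 0.2] True False
```

The explicit converter matters because a flag without one would store the raw string `"False"`, which Python treats as true.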
Evaluation Results
The quality of initial CAMs and pseudo masks on PASCAL VOC2012.
Method | backbone | CAMs | + RW | + IRNet |
---|---|---|---|---|
CLIMS(camera-ready) | R50 | 56.6 | 70.5 | - |
CLIMS(this repo) | R50 | 58.6 | ~73 | 74.1 |
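The table does not name its metric. Assuming the scores are mean intersection-over-union (mIoU, in percent), as is usual for this benchmark, the following pure-Python sketch shows how such a score is computed from predicted and ground-truth labels. It is not the repository's evaluation code; the function name `mean_iou` is hypothetical, and label 255 is treated as the PASCAL VOC "ignore" label.

```python
def mean_iou(predictions, targets, num_classes, ignore_index=255):
    """Mean intersection-over-union over flat sequences of class labels."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for pred, target in zip(predictions, targets):
        if target == ignore_index:
            continue  # pixels marked as "ignore" do not count
        confusion[target][pred] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(row[c] for row in confusion) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
true = [0, 1, 1, 1, 2, 255]
print(round(mean_iou(pred, true, num_classes=3), 4))
# prints: 0.7222
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is about 0.7222; multiplying by 100 gives a percentage on the same scale as the table.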
Evaluation results on PASCAL VOC2012 val and test sets.
Please cite the results of the camera-ready version.
Method | Supervision | Network | Pretrained | val | test |
---|---|---|---|---|---|
AdvCAM | I | DeepLabV2 | ImageNet | 68.1 | 68.0 |
EDAM | I+S | DeepLabV2 | COCO | 70.9 | 70.6 |
CLIMS(camera-ready) | I | DeepLabV2 | ImageNet | 69.3 | 68.7 |
CLIMS(camera-ready) | I | DeepLabV2 | COCO | 70.4 | 70.0 |
CLIMS(this repo) | I | DeepLabV2 | ImageNet | 70.3 | 70.6 |
CLIMS(this repo) | I | DeepLabV2 | COCO | 71.4 | 71.2 |
CLIMS(this repo) | I | DeepLabV1-R38 | ImageNet | 73.3 | 73.4 |
(Please cite the results of the camera-ready version. Initial CAMs, pseudo semantic masks, and pre-trained models of the camera-ready version can be found on Google Drive.)
Training on MSCOCO 2014
- Download the pre-trained baseline CAM ('res50_cam.pth') here and put it in the directory cam-baseline-coco/.
- Train CLIMS on the MSCOCO 2014 dataset to generate initial CAMs.
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run_sample_coco.py --work_space clims_coco --clims_network net.resnet50_clims --train_clims_pass True --make_clims_pass True --eval_cam_pass True --clims_num_epoches 8 --cam_eval_thres 0.15 --hyper 2,14,1.25,0.2 --cam_batch_size 16 --clims_learning_rate 0.0005 --use_distributed_train True --cbs_loss_thresh 0.285
If you are using our code, please consider citing our paper.
@InProceedings{Xie_2022_CVPR,
author = {Xie, Jinheng and Hou, Xianxu and Ye, Kai and Shen, Linlin},
title = {CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {4483-4492}
}
@article{xie2022cross,
title={Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
author={Xie, Jinheng and Hou, Xianxu and Ye, Kai and Shen, Linlin},
journal={arXiv preprint arXiv:2203.02668},
year={2022}
}
This repository is heavily based on IRNet; thanks to Jiwoon Ahn for the great code.