Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
FC-CLIP is a universal model for open-vocabulary image segmentation, consisting of a class-agnostic segmenter, an in-vocabulary classifier, and an out-of-vocabulary classifier. With everything built on a single shared frozen convolutional CLIP model, FC-CLIP not only achieves state-of-the-art performance on various open-vocabulary segmentation benchmarks, but also has much lower training (3.2 days on 8 V100 GPUs) and testing costs than prior work.
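To give a rough feel for how an in-vocabulary and an out-of-vocabulary classifier can cooperate on one mask proposal, here is a minimal sketch that merges their per-class probabilities with a geometric ensemble. This README does not describe the fusion rule, so the function name `geometric_ensemble`, the weights `alpha` and `beta`, and their default values are assumptions made for illustration, not the repository's API.

```python
def geometric_ensemble(p_in, p_out, seen, alpha=0.4, beta=0.8):
    """Merge per-class probabilities from two classifiers for one mask.

    p_in  : probabilities from the in-vocabulary classifier, one per class
    p_out : probabilities from the out-of-vocabulary classifier, one per class
    seen  : booleans, True if the class was part of the training vocabulary
    alpha : weight on p_out for seen classes (hypothetical default)
    beta  : weight on p_out for unseen classes (hypothetical default)
    """
    if not (len(p_in) == len(p_out) == len(seen)):
        raise ValueError("p_in, p_out and seen must have the same length")

    merged = []
    for pi, po, is_seen in zip(p_in, p_out, seen):
        # Trust the trained classifier more on seen classes and the
        # frozen CLIP classifier more on unseen ones.
        w = alpha if is_seen else beta
        merged.append((pi ** (1.0 - w)) * (po ** w))
    return merged


if __name__ == "__main__":
    # Two classes: the first was seen during training, the second was not.
    scores = geometric_ensemble([0.9, 0.1], [0.5, 0.5], [True, False])
    best = max(range(len(scores)), key=scores.__getitem__)
    print("merged scores:", scores, "predicted class index:", best)
```

The merged scores are not renormalized here; for picking the most likely class per mask only their relative order matters.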
Installation
See installation instructions.
Getting Started
See Preparing Datasets for FC-CLIP.
See Getting Started with FC-CLIP.
An FC-CLIP demo is also available on HuggingFace 🤗.
Model Zoo
Model | ADE20K (A-150) PQ | ADE20K (A-150) mAP | ADE20K (A-150) mIoU | Cityscapes PQ | Cityscapes mAP | Cityscapes mIoU | Mapillary Vistas PQ | Mapillary Vistas mIoU | ADE20K-Full (A-847) mIoU | Pascal Context 59 (PC-59) mIoU | Pascal Context 459 (PC-459) mIoU | Pascal VOC 21 (PAS-21) mIoU | Pascal VOC 20 (PAS-20) mIoU | COCO (training dataset) PQ | COCO (training dataset) mAP | COCO (training dataset) mIoU | Download |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FC-CLIP (ResNet50) | 17.9 | 9.5 | 23.3 | 40.3 | 21.6 | 53.2 | 15.9 | 24.4 | 7.1 | 50.5 | 12.9 | 75.9 | 89.5 | 50.7 | 40.7 | 58.8 | checkpoint |
FC-CLIP (ResNet101) | 19.1 | 10.2 | 24.0 | 40.9 | 24.1 | 53.9 | 16.7 | 23.2 | 7.7 | 48.9 | 12.3 | 77.6 | 91.3 | 51.4 | 41.6 | 58.9 | checkpoint |
FC-CLIP (ResNet50x4) | 21.8 | 11.7 | 26.8 | 42.2 | 23.8 | 54.6 | 17.4 | 24.6 | 8.7 | 54.0 | 13.1 | 79.0 | 92.9 | 52.1 | 42.8 | 60.4 | checkpoint |
FC-CLIP (ResNet50x16) | 22.5 | 13.6 | 29.4 | 42.0 | 25.6 | 56.0 | 17.8 | 26.1 | 10.3 | 56.4 | 15.7 | 80.7 | 94.5 | 54.4 | 45.0 | 63.3 | checkpoint |
FC-CLIP (ResNet50x64) | 22.8 | 13.6 | 28.4 | 42.7 | 27.4 | 55.1 | 18.2 | 27.3 | 10.8 | 55.7 | 16.2 | 80.3 | 95.1 | 55.6 | 46.4 | 65.3 | checkpoint |
FC-CLIP (ConvNeXt-Large) | 26.8 | 16.8 | 34.1 | 44.0 | 26.8 | 56.2 | 18.3 | 27.8 | 14.8 | 58.4 | 18.2 | 81.8 | 95.4 | 54.4 | 44.6 | 63.7 | checkpoint |
Citing FC-CLIP
If you use FC-CLIP in your research, please use the following BibTeX entry.
@article{yu2023fcclip,
  title={Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP},
  author={Qihang Yu and Ju He and Xueqing Deng and Xiaohui Shen and Liang-Chieh Chen},
  journal={arXiv: 2308.02487},
  year={2023}
}