Towards Open Vocabulary Learning: A Survey
arXiv, 2023
Jianzong Wu* · Xiangtai Li* · Shilin Xu* · Haobo Yuan* · Henghui Ding · Yibo Yang · Xia Li · Jiangning Zhang · Yunhai Tong · Xudong Jiang · Bernard Ghanem · Dacheng Tao
This repo is used for recording, tracking, and benchmarking several recent open vocabulary methods to supplement our survey.
If you find any work missing or have any suggestions (papers, implementations, and other resources), feel free to open a pull request.
We will add the missing papers to this repo as soon as possible.
🔥New
[-] We updated this repo to record the papers available as of 2023/7/20.
🔥 Highlight!!
[1] The first survey for open vocabulary learning, including open vocabulary detection/segmentation/tracking.
[2] It also contains several related domains, including foundation model tuning and open-world detection.
[3] We list detailed results for the most representative works and give a fairer and clearer comparison of different approaches.
Introduction
This paper presents the first detailed survey of open vocabulary tasks, including open-vocabulary object detection, open-vocabulary segmentation, and 3D/video open-vocabulary tasks.
Summary of Contents
- Introduction
- Summary of Contents
- Methods: A Survey
- Related Domains and Beyond
- Acknowledgement
- Contact
Methods: A Survey
Keywords
- cap.: Use caption as auxiliary training data
- vlm.: Use pretrained VLMs like CLIP
- pl.: Generate pseudo labels
- w/o ps.: Training without pixel-level supervision
- pre.: Vision-language pretraining
- diff.: Use diffusion models
- unify: Unify several tasks (semantic segmentation, instance segmentation, and panoptic segmentation)
- sam: Use SAM (Segment Anything Model)
- open.: Demonstrated with open-set capability (only for Video Understanding)
- audio.: With audio modality
- other: Other methods that cannot be grouped into the ones above
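The keyword tags above can also be used to filter the paper tables programmatically. The following is a minimal sketch, assuming each table row sits on a single line in the layout `Year | Venue | Keywords | Paper Title | Code/Project |`. The names `Entry`, `normalize`, `parse_row`, and `filter_by_keyword` are hypothetical helpers written for illustration and are not part of this repo.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    year: int
    venue: str
    keywords: frozenset
    title: str


def normalize(tag: str) -> str:
    # Tags may be written "vlm.", "vlm", or "'unify'"; reduce them to one form.
    return tag.strip().strip("'").rstrip(".").lower()


def parse_row(row: str) -> Entry:
    # Assumed layout: Year | Venue | Keywords | Paper Title | Code/Project |
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    year, venue, keywords, title = cells[:4]
    tags = frozenset(
        normalize(t) for t in keywords.split(",") if normalize(t) not in ("", "-")
    )
    return Entry(int(year), venue, tags, title)


def filter_by_keyword(entries, tag):
    # Keep only the entries tagged with the requested keyword.
    wanted = normalize(tag)
    return [e for e in entries if wanted in e.keywords]


# Three rows copied from the tables in this README.
rows = [
    "2023 | CVPR | unify., vlm. | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | Code |",
    "2023 | arXiv | cap. | Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code |",
    "2023 | CVPR | diff., vlm. | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | Code |",
]
entries = [parse_row(r) for r in rows]

for e in filter_by_keyword(entries, "vlm."):
    print(e.year, e.venue, e.title)
```

Normalizing tags before comparison means that `vlm`, `vlm.`, and quoted variants all match the same entries, and rows whose keyword cell is `-` end up with an empty tag set.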
Open Vocabulary Object Detection
Open Vocabulary Segmentation
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | unify., vlm. | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | Code |
2023 | CVPR | unify., vlm. | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | Code |
Semantic Segmentation
Instance Segmentation
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | vlm. | Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation | Code |
2022 | CVPR | cap., pl., vlm. | Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling | Code |
2023 | CVPR | vlm., cap., w/o ps. | Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations | Code |
2023 | arXiv | cap. | Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code |
Panoptic Segmentation
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | unify., vlm. | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | Code |
2022 | arXiv | vlm. | Open-Vocabulary Panoptic Segmentation with MaskCLIP | N/A |
2023 | CVPR | diff., vlm. | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | Code |
2023 | arXiv | vlm. | Open-vocabulary Panoptic Segmentation with Embedding Modulation | N/A |
2023 | arXiv | vlm., unify. | Hierarchical Open-vocabulary Universal Image Segmentation | Code |
Open Vocabulary Video Understanding
Video Classification
Tracking
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | vlm., open. | OVTrack: Open-Vocabulary Multiple Object Tracking | Project |
Video Instance Segmentation
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | arXiv | vlm., open. | Towards Open-Vocabulary Video Instance Segmentation | N/A |
2023 | arXiv | vlm., open. | OpenVIS: Open-vocabulary Video Instance Segmentation | N/A |
Open Vocabulary 3D Scene Understanding
3D Classification
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2022 | CVPR | vlm. | PointCLIP: Point Cloud Understanding by CLIP | Code |
2022 | arXiv | vlm. | CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training | Code |
2022 | arXiv | vlm. | PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning | Code |
2022 | arXiv | vlm. | LidarCLIP or: How I Learned to Talk to Point Clouds | Code |
2023 | CVPR | vlm. | ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding | Code |
2023 | ICML | vlm. | Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining | Code |
3D Detection
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2022 | arXiv | vlm. | Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning | N/A |
2023 | CVPR | vlm. | Open-Vocabulary Point-Cloud Object Detection without 3D Annotation | Code |
3D Segmentation
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | vlm. | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | Code |
2023 | CVPR | vlm. | CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code |
2023 | arXiv | vlm. | CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP | N/A |
2023 | arXiv | vlm. | OpenMask3D: Open-Vocabulary 3D Instance Segmentation | Project |
Related Domains and Beyond
Class-agnostic Detection and Segmentation
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2022 | RA-L | - | Learning Open-World Object Proposals without Learning to Classify | Code |
2021 | ICCV | - | Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation | Project |
2022 | CVPR | - | Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity | Project |
2022 | ECCV | - | Class-agnostic Object Detection with Multi-modal Transformer | Code |
2022 | TPAMI | - | Open World Entity Segmentation | Project |
2022 | arXiv | - | Fine-Grained Entity Segmentation | Project |
Open-World Object Detection
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2015 | CVPR | - | Towards Open World Recognition | N/A |
2021 | CVPR | - | Towards Open World Object Detection | Code |
2022 | CVPR | - | OW-DETR: Open-world Detection Transformer | Code |
2022 | ECCV | - | UC-OWOD: Unknown-Classified Open World Object Detection | Code |
2022 | arXiv | - | Revisiting Open World Object Detection | Code |
2022 | arXiv | - | Rectifying Open-set Object Detection: A Taxonomy, Practical Applications, and Proper Evaluation | N/A |
2022 | arXiv | - | Open World DETR: Transformer based Open World Object Detection | N/A |
2022 | arXiv | - | PROB: Probabilistic Objectness for Open World Object Detection | Code |
Open-Set Panoptic Segmentation
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2021 | CVPR | - | Exemplar-Based Open-Set Panoptic Segmentation Network | Project |
2022 | arXiv | - | Dual Decision Improves Open-Set Panoptic Segmentation | Code |
Acknowledgement
If you find our survey and repository useful for your research project, please consider citing our paper:
@article{wu2023open,
title={Towards Open Vocabulary Learning: A Survey},
author={Jianzong Wu and Xiangtai Li and Shilin Xu and Haobo Yuan and Henghui Ding and Yibo Yang and Xia Li and Jiangning Zhang and Yunhai Tong and Xudong Jiang and Bernard Ghanem and Dacheng Tao},
year={2023},
journal={arXiv pre-print},
}