  • Language
    Python
  • License
    MIT License

Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"

CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation

This is our official implementation of CAT-Seg!

[arXiv] [Project] [HuggingFace Demo] [Segment Anything with CAT-Seg]

by Seokju Cho*, Heeseong Shin*, Sunghwan Hong, Seungjun An, Seungjun Lee, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

Introduction

We introduce cost aggregation to open-vocabulary semantic segmentation, which jointly aggregates both image and text modalities within the matching cost.
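To make the term "matching cost" concrete, here is a minimal sketch, assuming the cost is a cosine-similarity volume between dense image embeddings and class-name text embeddings. It is written in plain Python on toy data for illustration only and is not the code in this repository; the function names `l2_normalize` and `cost_volume` are hypothetical, and the aggregation step that refines the volume is not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cost_volume(image_feats, text_feats):
    """Cosine similarity between every pixel embedding and every class embedding.

    image_feats: H x W grid of D-dimensional vectors (nested lists).
    text_feats:  list of N D-dimensional class embeddings.
    Returns an H x W x N nested list of similarities in [-1, 1].
    """
    text_unit = [l2_normalize(t) for t in text_feats]
    volume = []
    for row in image_feats:
        out_row = []
        for pixel in row:
            p = l2_normalize(pixel)
            out_row.append([sum(a * b for a, b in zip(p, t)) for t in text_unit])
        volume.append(out_row)
    return volume


# Toy 1x2 "image" with 2-D embeddings and two hypothetical classes.
image_feats = [[[1.0, 0.0], [0.0, 2.0]]]
text_feats = [[3.0, 0.0], [0.0, 1.0]]
cost = cost_volume(image_feats, text_feats)

# Raw prediction without any aggregation: argmax over the class axis per pixel.
labels = [[max(range(len(c)), key=c.__getitem__) for c in row] for row in cost]
print(labels)
```

A real model would compute this volume with a tensor library on learned embeddings and then refine it, rather than taking the argmax directly.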

For further details and visualization results, please check out our paper and our project page.

❗️Update: We released a demo for combining CAT-Seg and Segment Anything for open-vocabulary semantic segmentation! We also released the code and installation guide in the demo branch for trying out the demo on your local devices!

🔥TODO

  • Train/Evaluation Code (Mar 21, 2023)
  • Pre-trained weights (Mar 30, 2023)
  • Code of interactive demo (Jul 13, 2023)

Installation

Please follow the installation guide.

Data Preparation

Please follow the dataset preparation guide.

Demo

To try your own images locally, please use the interactive demo.

Training

We provide shell scripts for training and evaluation. run.sh trains the model with the default configuration and then evaluates it.

To train or evaluate the model in different environments, modify the given shell script and config files accordingly.

Training script

sh run.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

# For ViT-B variant
sh run.sh configs/vitb_r101_384.yaml 4 output/
# For ViT-L variant
sh run.sh configs/vitl_swinb_384.yaml 4 output/
# For ViT-H variant
sh run.sh configs/vitl_swinb_384.yaml 4 output/ MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED "ViT-H" MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM 1024
# For ViT-G variant
sh run.sh configs/vitl_swinb_384.yaml 4 output/ MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED "ViT-G" MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM 1280
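If you launch several variants from a Python driver, the commands above can be assembled programmatically. The following is a small sketch, not part of the repository: the `VARIANTS` table and `build_train_command` are hypothetical names, the config paths and option values are copied from the commands above, and it assumes that `[OPTS]` is always a flat list of alternating KEY VALUE pairs, as in those examples.

```python
import shlex

# Config file and extra OPTS for each variant, copied from the commands above.
VARIANTS = {
    "ViT-B": ("configs/vitb_r101_384.yaml", []),
    "ViT-L": ("configs/vitl_swinb_384.yaml", []),
    "ViT-H": ("configs/vitl_swinb_384.yaml", [
        "MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED", "ViT-H",
        "MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM", "1024",
    ]),
    "ViT-G": ("configs/vitl_swinb_384.yaml", [
        "MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED", "ViT-G",
        "MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM", "1280",
    ]),
}


def build_train_command(variant, num_gpus=4, output_dir="output/", extra_opts=()):
    """Return the argv list for `sh run.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]`."""
    config, opts = VARIANTS[variant]
    extra = list(extra_opts)
    # Assumption: OPTS come in KEY VALUE pairs, so an odd count is a mistake.
    if len(extra) % 2 != 0:
        raise ValueError("extra_opts must be alternating KEY VALUE pairs")
    return ["sh", "run.sh", config, str(num_gpus), output_dir, *opts, *extra]


# Print the command line for each variant; pass the list to subprocess.run to execute it.
for name in VARIANTS:
    print(shlex.join(build_train_command(name)))
```

Building the command as a list instead of a single string avoids shell-quoting problems if an option value ever contains spaces.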

Evaluation

eval.sh automatically evaluates the model following our evaluation protocol. If no weights are specified, it uses the weights in the output directory. To evaluate the model on individual datasets, please refer to the commands in eval.sh.

Evaluation script

sh eval.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

sh eval.sh configs/vitl_swinb_384.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth

Pretrained Models

We provide pretrained weights for our models reported in the paper. All of the models were evaluated with 4 NVIDIA RTX 3090 GPUs, and can be reproduced with the evaluation script above.

Name         Backbone  CLIP      A-847  PC-459  A-150  PC-59  PAS-20  PAS-20b  Download
CAT-Seg (B)  R101      ViT-B/16  8.9    16.6    27.2   57.5   93.7    78.3     ckpt
CAT-Seg (L)  Swin-B    ViT-L/14  11.4   20.4    31.5   62.0   96.6    81.8     ckpt
CAT-Seg (H)  Swin-B    ViT-H/14  13.1   20.1    34.4   61.2   96.7    80.2     ckpt
CAT-Seg (G)  Swin-B    ViT-G/14  14.1   21.4    36.2   61.5   97.1    81.4     ckpt
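When choosing a checkpoint, note that the largest CLIP variant does not score highest on every benchmark in the table. The short sketch below makes that comparison explicit. The scores are copied from the table above; `RESULTS`, `BENCHMARKS`, and `best_per_benchmark` are illustrative names, not part of the repository.

```python
# Scores copied from the table above, in the same column order.
BENCHMARKS = ["A-847", "PC-459", "A-150", "PC-59", "PAS-20", "PAS-20b"]
RESULTS = {
    "CAT-Seg (B)": [8.9, 16.6, 27.2, 57.5, 93.7, 78.3],
    "CAT-Seg (L)": [11.4, 20.4, 31.5, 62.0, 96.6, 81.8],
    "CAT-Seg (H)": [13.1, 20.1, 34.4, 61.2, 96.7, 80.2],
    "CAT-Seg (G)": [14.1, 21.4, 36.2, 61.5, 97.1, 81.4],
}


def best_per_benchmark(results, benchmarks):
    """Map each benchmark name to the (model, score) pair with the highest score."""
    best = {}
    for i, name in enumerate(benchmarks):
        model = max(results, key=lambda m: results[m][i])
        best[name] = (model, results[model][i])
    return best


for benchmark, (model, score) in best_per_benchmark(RESULTS, BENCHMARKS).items():
    print(f"{benchmark}: {model} ({score})")
```

With these numbers, CAT-Seg (G) leads on A-847, PC-459, A-150, and PAS-20, while CAT-Seg (L) leads on PC-59 and PAS-20b.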

Acknowledgement

We would like to acknowledge the contributions of public projects, such as Zegformer, whose code has been utilized in this repository. We also thank Benedikt for finding an error in our inference code and evaluating CAT-Seg over various datasets!

Citing CAT-Seg 🐱🙏

@misc{cho2023catseg,
      title={CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation}, 
      author={Seokju Cho and Heeseong Shin and Sunghwan Hong and Seungjun An and Seungjun Lee and Anurag Arnab and Paul Hongsuck Seo and Seungryong Kim},
      year={2023},
      eprint={2303.11797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
