  • Language: Jupyter Notebook
  • License: MIT License


Language-driven Semantic Segmentation (LSeg)

This repo contains the official PyTorch implementation of the paper Language-driven Semantic Segmentation.

ICLR 2022

Authors: Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl

Overview

We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class. The text embeddings provide a flexible label representation in which semantically similar labels map to similar regions in the embedding space (e.g., "cat" and "furry"). This allows LSeg to generalize to previously unseen categories at test time, without retraining or even requiring a single additional training sample. We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods, and even matches the accuracy of traditional segmentation algorithms when a fixed label set is provided.
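The alignment described above can be sketched in a few lines of plain Python. This is a toy illustration, not the repository's code: the embeddings are small hand-written lists rather than CLIP and ViT outputs, and the cosine similarity, the temperature of 0.07 and the helper names (`normalize`, `pixel_logits`, `segment`, `pixel_cross_entropy`) are our assumptions. Each pixel is labelled with the text embedding it is most similar to, and training would minimize a per-pixel softmax cross-entropy over those similarities.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def pixel_logits(pixel: Vector, label_embs: Sequence[Vector],
                 temperature: float = 0.07) -> List[float]:
    """Cosine similarity of one pixel embedding to every label embedding, over a temperature.

    Both the cosine similarity and the 0.07 temperature are assumptions of this sketch.
    """
    p = normalize(pixel)
    return [sum(a * b for a, b in zip(p, normalize(t))) / temperature for t in label_embs]


def segment(image_embs, label_embs) -> List[List[int]]:
    """Assign each pixel the index of its most similar label (argmax over the logits)."""
    result = []
    for row in image_embs:
        out_row = []
        for pixel in row:
            logits = pixel_logits(pixel, label_embs)
            out_row.append(max(range(len(logits)), key=logits.__getitem__))
        result.append(out_row)
    return result


def pixel_cross_entropy(image_embs, label_embs, targets, temperature: float = 0.07) -> float:
    """Mean per-pixel softmax cross-entropy pulling each pixel toward its target label."""
    total, count = 0.0, 0
    for emb_row, tgt_row in zip(image_embs, targets):
        for pixel, target in zip(emb_row, tgt_row):
            logits = pixel_logits(pixel, label_embs, temperature)
            m = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
            total += log_sum_exp - logits[target]
            count += 1
    return total / count
```

With two labels embedded as [1, 0] and [0, 1], the two-pixel image [[[0.9, 0.1], [0.2, 0.8]]] is segmented as [[0, 1]]. Supporting a new category in this sketch only means appending another row to `label_embs`; nothing else changes, which mirrors the zero-shot behavior described above.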

Please check our Video Demo (4K) for a further showcase of LSeg's capabilities.

Usage

Installation

Option 1:

pip install -r requirements.txt

Option 2:

conda install ipython
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
pip install git+https://github.com/zhanghang1989/PyTorch-Encoding/
pip install pytorch-lightning==1.3.5
pip install opencv-python
pip install imageio
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install altair
pip install streamlit
pip install --upgrade protobuf
pip install timm
pip install tensorboardX
pip install matplotlib
pip install test-tube
pip install wandb
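Because Option 2 pins specific versions of torch, torchvision, torchaudio and pytorch-lightning, a quick check of what is actually installed can save debugging time. The following is a minimal sketch using only the standard library (`importlib.metadata`); the helper names are hypothetical, and treating a CUDA build tag such as `+cu110` as matching its base version is our assumption.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Version pins copied from the Option 2 commands above.
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "torchaudio": "0.7.2",
    "pytorch-lightning": "1.3.5",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins: Dict[str, str],
                    lookup: Callable[[str], Optional[str]] = installed_version
                    ) -> Dict[str, Optional[str]]:
    """Map each pinned package that is missing or differs from its pin to the version found."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu110" that CUDA wheels append.
        if found is None or found.split("+", 1)[0] != wanted:
            mismatches[name] = found
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dict when the environment matches the pins.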

Data Preparation

By default, we use ADE20K for training, testing, and the demo.

python prepare_ade20k.py
unzip ../datasets/ADEChallengeData2016.zip
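After unzipping, a quick sanity check can confirm that every image has a label map. This sketch assumes the standard ADEChallengeData2016 layout (`images/{training,validation}/*.jpg` with matching `annotations/{training,validation}/*.png`). Where the folder ends up depends on where you run the commands above, so the root is passed in explicitly; `check_ade20k` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path
from typing import Dict, List


def check_ade20k(root: Path) -> Dict[str, List[str]]:
    """Return, per split, the image stems that have no matching annotation PNG."""
    missing: Dict[str, List[str]] = {}
    for split in ("training", "validation"):
        image_dir = root / "images" / split
        ann_dir = root / "annotations" / split
        if not image_dir.is_dir() or not ann_dir.is_dir():
            raise FileNotFoundError(f"expected {image_dir} and {ann_dir}")
        stems = sorted(p.stem for p in image_dir.glob("*.jpg"))
        # An image is reported if its same-named .png label map does not exist.
        missing[split] = [s for s in stems if not (ann_dir / f"{s}.png").exists()]
    return missing
```

For a complete extraction, `check_ade20k(Path("path/to/ADEChallengeData2016"))` should return empty lists for both splits.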

Note: for the demo, if you want to use random inputs, you can skip data loading and comment out the corresponding code at link.

🌻 Try the demo now

Download Demo Model

name            backbone   text encoder   url
Model for demo  ViT-L/16   CLIP ViT-B/32  download

👉 Option 1: Running the interactive app

Download the demo model and place it in the checkpoints folder as checkpoints/demo_e200.ckpt.

Then launch the app with: streamlit run lseg_app.py
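If you prefer to launch the app from Python, a small wrapper can fail early when the checkpoint is missing instead of inside the app. This is a hedged sketch: `demo_command` and `launch_demo` are hypothetical helpers, it assumes you run it from the repository root, and it does not change how `lseg_app.py` itself loads the model.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Checkpoint location taken from the instructions above.
DEMO_CHECKPOINT = Path("checkpoints/demo_e200.ckpt")


def demo_command(checkpoint: Path = DEMO_CHECKPOINT, app: str = "lseg_app.py") -> List[str]:
    """Return the streamlit command, failing early if the demo checkpoint is missing."""
    if not checkpoint.is_file():
        raise FileNotFoundError(f"demo checkpoint not found at {checkpoint}")
    return ["streamlit", "run", app]


def launch_demo() -> None:
    """Run the interactive app (blocks until streamlit exits)."""
    if shutil.which("streamlit") is None:
        raise RuntimeError("streamlit is not on PATH; install it with: pip install streamlit")
    subprocess.run(demo_command(), check=True)
```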

👉 Option 2: Jupyter Notebook

Download the demo model and place it in the checkpoints folder as checkpoints/demo_e200.ckpt.

Then follow lseg_demo.ipynb to play around with LSeg. Enjoy!

Training and Testing Example

Training: Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32

bash train.sh

Testing: Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32

bash test.sh

Zero-shot Experiments

Data Preparation

Please follow HSNet to prepare the datasets and put them all in data/Dataset_HSN.

Pascal-5i

for fold in 0 1 2 3; do
python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset pascal \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold ${fold} --nshot 0 \
--weights checkpoints/pascal_fold${fold}.ckpt 
done

COCO-20i

for fold in 0 1 2 3; do
python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset coco \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold ${fold} --nshot 0 \
--weights checkpoints/pascal_fold${fold}.ckpt 
done

FSS

python -u test_lseg_zs.py --backbone clip_vitl16_384 --module clipseg_DPT_test_v2 --dataset fss \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \
--weights checkpoints/fss_l16.ckpt 
python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset fss \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \
--weights checkpoints/fss_rn101.ckpt 
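The zero-shot commands above differ only in dataset, fold, backbone and checkpoint, so they can be generated programmatically. The sketch below reproduces the exact flags used above, in the same order; `zero_shot_command`, `pascal_commands`, `as_shell` and `run_all` are hypothetical helper names, and only the Pascal checkpoint naming shown above is assumed. `shlex.join` needs Python 3.8 or newer.

```python
import shlex
import subprocess
from typing import Iterable, List


def zero_shot_command(dataset: str, fold: int, weights: str,
                      backbone: str = "clip_resnet101") -> List[str]:
    """Argument list for one test_lseg_zs.py run, with flags in the same order as above."""
    return [
        "python", "-u", "test_lseg_zs.py",
        "--backbone", backbone,
        "--module", "clipseg_DPT_test_v2",
        "--dataset", dataset,
        "--widehead", "--no-scaleinv",
        "--arch_option", "0",
        "--ignore_index", "255",
        "--fold", str(fold),
        "--nshot", "0",
        "--weights", weights,
    ]


def pascal_commands(folds: Iterable[int] = (0, 1, 2, 3)) -> List[List[str]]:
    """The Pascal-5i loop above: one command per fold, each with its own checkpoint."""
    return [zero_shot_command("pascal", f, f"checkpoints/pascal_fold{f}.ckpt") for f in folds]


def as_shell(cmd: List[str]) -> str:
    """Render an argument list as a copy-pasteable shell line."""
    return shlex.join(cmd)


def run_all(commands: Iterable[List[str]]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(pascal_commands())` would execute the four Pascal-5i folds in sequence, and `zero_shot_command("fss", 0, "checkpoints/fss_l16.ckpt", backbone="clip_vitl16_384")` rebuilds the first FSS command.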

Model Zoo

dataset  fold  backbone   text encoder   performance  url
pascal   0     ResNet101  CLIP ViT-B/32  52.8         download
pascal   1     ResNet101  CLIP ViT-B/32  53.8         download
pascal   2     ResNet101  CLIP ViT-B/32  44.4         download
pascal   3     ResNet101  CLIP ViT-B/32  38.5         download
coco     0     ResNet101  CLIP ViT-B/32  22.1         download
coco     1     ResNet101  CLIP ViT-B/32  25.1         download
coco     2     ResNet101  CLIP ViT-B/32  24.9         download
coco     3     ResNet101  CLIP ViT-B/32  21.5         download
fss      -     ResNet101  CLIP ViT-B/32  84.7         download
fss      -     ViT-L/16   CLIP ViT-B/32  87.8         download
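The Model Zoo lists one score per fold; fold-based benchmarks like these are usually summarized by the mean over the four folds. The snippet below computes that mean from the values in the table, assuming the performance column is mIoU in percent (our reading; the table header does not say).

```python
from statistics import mean

# Per-fold scores copied from the Model Zoo table above (ResNet101 backbone).
MODEL_ZOO = {
    "pascal": [52.8, 53.8, 44.4, 38.5],
    "coco": [22.1, 25.1, 24.9, 21.5],
}

# Average the four folds of each benchmark.
fold_means = {dataset: mean(scores) for dataset, scores in MODEL_ZOO.items()}
for dataset, value in fold_means.items():
    print(f"{dataset}: {value:.1f}")
```

It prints `pascal: 47.4` and `coco: 23.4`.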

If you find this repo useful, please cite:

@inproceedings{
li2022languagedriven,
title={Language-driven Semantic Segmentation},
author={Boyi Li and Kilian Q Weinberger and Serge Belongie and Vladlen Koltun and Rene Ranftl},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=RriDjddCLN}
}

Acknowledgement

Thanks to the code bases of DPT, PyTorch Lightning, CLIP, PyTorch-Encoding, Streamlit, and Wandb.
