Mask Transfiner

Mask Transfiner for High-Quality Instance Segmentation [Mask Transfiner, CVPR 2022].

This is the official PyTorch implementation of Transfiner, built on the open-source detectron2. Our project website contains more information, including a visual slider comparison: vis.xyz/pub/transfiner.

Mask Transfiner for High-Quality Instance Segmentation
Lei Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
CVPR, 2022

Updates

🔥🔥 We released Video Mask Transfiner and the HQ-YTVIS benchmark at ECCV 2022.

Highlights

  • Transfiner: High-quality instance segmentation with state-of-the-art performance and extremely fine details.
  • Novelty: An efficient transformer targeting high-resolution instance mask prediction, based on a quadtree structure.
  • Efficacy: Large mask and boundary AP improvements on three instance segmentation benchmarks: COCO, Cityscapes and BDD100K.
  • Simple: Small additional computational cost compared to a standard transformer, and easy to use.

Mask Transfiner with Quadtree Transformer

Results on COCO test-dev

(See Table 9 of the paper for full results; all methods are trained on COCO train2017. This is a reimplementation, so the numbers may differ slightly from those reported in our original paper.)

| Backbone (configs) | Method | mAP (mask) | Download |
|---|---|---|---|
| R50-FPN | Mask R-CNN (ICCV'17) | 34.2 | — |
| R50-FPN | PANet (CVPR'18) | 36.6 | — |
| R50-FPN | MS R-CNN (CVPR'19) | 35.6 | — |
| R50-FPN | PointRend (1x, CVPR'20) | 36.3 | — |
| R50-FPN | Transfiner (1x, CVPR'22) | 37.0 | Pretrained Model |
| Res-R50-FPN | BCNet (CVPR'21) | 38.4 | — |
| R50-FPN | Transfiner (3x, CVPR'22) | 39.2 | Pretrained Model |
| R50-FPN-DCN | Transfiner (3x, CVPR'22) | 40.5 | Pretrained Model |

| Backbone (configs) | Method | mAP (mask) | Download |
|---|---|---|---|
| R101-FPN | Mask R-CNN (ICCV'17) | 36.1 | — |
| R101-FPN | MS R-CNN (CVPR'19) | 38.3 | — |
| R101-FPN | BMask R-CNN (ECCV'20) | 37.7 | — |
| R101-FPN | SOLOv2 (NeurIPS'20) | 39.7 | — |
| R101-FPN | BCNet (CVPR'21) | 39.8 | — |
| R101-FPN | Transfiner (3x, CVPR'22) | 40.5 | Pretrained Model |
| R101-FPN-DCN | Transfiner (3x, CVPR'22) | 42.2 | Pretrained Model |

| Backbone (configs) | Pretrain | Lr Schd | Size | Method | mAP (box) on val2017 | mAP (mask) on val2017 | Download |
|---|---|---|---|---|---|---|---|
| Swin-T (ImageNet init weights, d2 format) | ImageNet-1k | 3x | [480-800] | Transfiner | 46.9 | 43.5 | Pretrained Model |
| Swin-B (ImageNet init weights, d2 format) | ImageNet-22k | 3x | [480-800] | Transfiner | 49.8 | 45.5 | Pretrained Model |

Results on LVIS Dataset, v0.5

| Backbone (configs) | Lr Schd | Method | mAP (mask) | Download |
|---|---|---|---|---|
| X101-FPN | 1x | Mask R-CNN | 27.1 | — |
| X101-FPN | 1x | Transfiner | 29.2 | Pretrained Model |

Introduction

Two-stage and query-based instance segmentation methods have achieved remarkable results. However, their segmented masks are still very coarse. In this paper, we present Mask Transfiner for high-quality and efficient instance segmentation. Instead of operating on regular dense tensors, our Mask Transfiner decomposes and represents the image regions as a quadtree. Our transformer-based approach only processes detected error-prone tree nodes and self-corrects their errors in parallel. While these sparse pixels only constitute a small proportion of the total number, they are critical to the final mask quality. This allows Mask Transfiner to predict highly accurate instance masks, at a low computational cost. Extensive experiments demonstrate that Mask Transfiner outperforms current instance segmentation methods on three popular benchmarks, significantly improving both two-stage and query-based frameworks by a large margin of +3.0 mask AP on COCO and BDD100K, and +6.6 boundary AP on Cityscapes.
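
The quadtree decomposition at the heart of this approach can be illustrated with a small, self-contained sketch (this is not the repository's implementation; the function and toy mask below are purely illustrative): a coarse mask is recursively subdivided, coherent cells are kept as single nodes, and only mixed boundary cells, i.e. the error-prone pixels, are split down to the finest level for refinement.

import numpy as np

def quadtree_nodes(mask, x0=0, y0=0, size=None, min_size=4):
    """Collect the mixed (error-prone) leaf cells of a quadtree over a binary mask.

    Cells whose pixels are all foreground or all background are left as single
    coarse nodes; mixed cells are subdivided until min_size. Illustrative only.
    """
    if size is None:
        size = mask.shape[0]
    patch = mask[y0:y0 + size, x0:x0 + size]
    if patch.min() == patch.max():      # coherent cell: no refinement needed
        return []
    if size <= min_size:                # mixed leaf cell: mark for refinement
        return [(x0, y0, size)]
    half = size // 2
    nodes = []
    for dy in (0, half):
        for dx in (0, half):
            nodes += quadtree_nodes(mask, x0 + dx, y0 + dy, half, min_size)
    return nodes

# Toy example: a 32x32 mask containing a disc; only boundary cells survive.
yy, xx = np.mgrid[:32, :32]
mask = ((xx - 16) ** 2 + (yy - 16) ** 2 < 100).astype(np.uint8)
print(len(quadtree_nodes(mask)), "error-prone cells out of", (32 // 4) ** 2)

Only these sparse boundary cells are passed to the refinement transformer, which is what keeps the additional computation low.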

Step-by-step Installation

conda create -n transfiner python=3.7 -y
conda activate transfiner
 
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
 
# COCO API and visualization dependencies
pip install ninja yacs cython matplotlib tqdm
pip install opencv-python==4.4.0.40
# Boundary dependency
pip install scikit-image
pip install kornia==0.5.11
 
export INSTALL_DIR=$PWD
 
# install pycocotools. Please make sure you have installed cython.
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
 
# install transfiner
cd $INSTALL_DIR
git clone --recursive https://github.com/SysCV/transfiner.git
cd transfiner/
python3 setup.py build develop
 
unset INSTALL_DIR
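
After the build finishes, a quick sanity check can confirm the environment (a minimal sketch assuming the versions pinned above; the transfiner build installs a modified detectron2 package, so the import below should resolve from this checkout):

# Verify that the core dependencies import and that CUDA is visible.
import torch, torchvision, cv2, kornia, skimage
import detectron2                      # provided by the transfiner build
import pycocotools.mask                # from the cocoapi install above

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("detectron2:", detectron2.__version__)
print("opencv:", cv2.__version__, "| kornia:", kornia.__version__, "| scikit-image:", skimage.__version__)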

Dataset Preparation

Prepare the COCO 2017 and Cityscapes datasets following this instruction.

  mkdir -p datasets/coco
  ln -s /path_to_coco_dataset/annotations datasets/coco/annotations
  ln -s /path_to_coco_dataset/train2017 datasets/coco/train2017
  ln -s /path_to_coco_dataset/test2017 datasets/coco/test2017
  ln -s /path_to_coco_dataset/val2017 datasets/coco/val2017
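
As a quick check that the symlinks match what detectron2's builtin COCO registration expects (the dataset names below are detectron2's standard builtin names, not something specific to this repository):

import os
from detectron2.data import DatasetCatalog, MetadataCatalog

# Builtin "coco_2017_*" datasets resolve relative to ./datasets (or
# $DETECTRON2_DATASETS), matching the symlinks created above.
for name in ("coco_2017_train", "coco_2017_val"):
    meta = MetadataCatalog.get(name)
    print(name, "->", meta.image_root, os.path.isdir(meta.image_root))

# Loading the dicts forces the annotation json to be parsed.
print("val2017 images registered:", len(DatasetCatalog.get("coco_2017_val")))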

Multi-GPU Training and Evaluation on Validation set

Refer to our scripts folder for more training, testing and visualization commands:

bash scripts/train_transfiner_3x_101.sh

Or

bash scripts/train_transfiner_1x_50.sh
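
The shell scripts drive detectron2-style training; a rough programmatic equivalent using detectron2's stock DefaultTrainer is sketched below. The config path and GPU count are placeholders (substitute the config referenced inside the script you would otherwise run), and the repository's own scripts may add setup steps not shown here.

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer, launch

def main():
    cfg = get_cfg()
    # Placeholder config path: use the one referenced by scripts/train_transfiner_3x_101.sh.
    cfg.merge_from_file("configs/transfiner/mask_rcnn_R_101_FPN_3x.yaml")
    cfg.OUTPUT_DIR = "./output_transfiner_r101_3x"
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()

if __name__ == "__main__":
    # Multi-GPU training mirrors the shell scripts via detectron2's launcher.
    launch(main, num_gpus_per_machine=4, dist_url="auto")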

Pretrained Models

Download the pretrained models from the result tables above:

  mkdir pretrained_model
  # And put the downloaded pretrained models in this directory.
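
A quick way to confirm a downloaded checkpoint is intact before running evaluation (the filename is a placeholder for whichever model you fetched from the tables above):

import torch

# detectron2-style checkpoints store the weights under a "model" key.
ckpt = torch.load("pretrained_model/output_3x_transfiner_r101.pth", map_location="cpu")
state = ckpt.get("model", ckpt)
print(len(state), "tensors, e.g.", next(iter(state)))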

Testing on Test-dev

bash scripts/test_3x_transfiner_101.sh
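
Test-dev numbers come from submitting the generated predictions to the COCO evaluation server; locally, a model can be scored on val2017 with detectron2's standard evaluation utilities. The sketch below uses placeholder config and checkpoint paths, and the exact COCOEvaluator signature depends on the bundled detectron2 version (older versions take cfg as the second argument):

from detectron2.config import get_cfg
from detectron2.data import build_detection_test_loader
from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

cfg = get_cfg()
cfg.merge_from_file("configs/transfiner/mask_rcnn_R_101_FPN_3x.yaml")   # placeholder
cfg.MODEL.WEIGHTS = "pretrained_model/output_3x_transfiner_r101.pth"    # placeholder

predictor = DefaultPredictor(cfg)
evaluator = COCOEvaluator("coco_2017_val", output_dir="./eval_output")
loader = build_detection_test_loader(cfg, "coco_2017_val")
print(inference_on_dataset(predictor.model, loader, evaluator))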

Visualization

bash scripts/visual.sh

For the Swin-based model:

bash scripts/visual_swinb.sh
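
If you prefer to visualize predictions from Python rather than through the scripts, a minimal sketch with detectron2's Visualizer looks like the following (config, checkpoint and image paths are placeholders; the scripts drive the repository's own demo entry point and may behave differently):

import cv2
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer

cfg = get_cfg()
cfg.merge_from_file("configs/transfiner/mask_rcnn_R_50_FPN_3x.yaml")    # placeholder
cfg.MODEL.WEIGHTS = "pretrained_model/output_3x_transfiner_r50.pth"     # placeholder
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
image = cv2.imread("datasets/coco/val2017/000000000139.jpg")            # any test image
outputs = predictor(image)

# Visualizer expects RGB; cv2 loads BGR, hence the channel flips.
vis = Visualizer(image[:, :, ::-1], MetadataCatalog.get("coco_2017_val"))
result = vis.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("vis_result.jpg", result.get_image()[:, :, ::-1])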

Citation

If you find Mask Transfiner useful in your research or refer to the provided baseline results, please star ⭐ this repository and consider citing 📝:

@inproceedings{transfiner,
    author={Ke, Lei and Danelljan, Martin and Li, Xia and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
    title={Mask Transfiner for High-Quality Instance Segmentation},
    booktitle = {CVPR},
    year = {2022}
}  

If you are interested in Video Mask Transfiner and High-Quality Video Instance Segmentation data:

@inproceedings{vmt,
    title = {Video Mask Transfiner for High-Quality Video Instance Segmentation},
    author = {Ke, Lei and Ding, Henghui and Danelljan, Martin and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2022}
}

Related Links

Related NeurIPS 2021 Work on multiple object tracking & segmentation: PCAN

Related CVPR 2021 Work on occlusion-aware instance segmentation: BCNet

Related ECCV 2020 Work on partially supervised instance segmentation: CPMask
