Reviving Iterative Training with Mask Guidance for Interactive Segmentation

This repository provides the source code for training and testing state-of-the-art click-based interactive segmentation models with the official PyTorch implementation of the following paper:

Reviving Iterative Training with Mask Guidance for Interactive Segmentation
Konstantin Sofiiuk, Ilia Petrov, Anton Konushin
Samsung AI Center Moscow
https://arxiv.org/abs/2102.06583

Abstract: Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes. These methods are considerably more computationally expensive compared to feedforward approaches, as they require performing backward passes through a network during inference and are hard to deploy on mobile frameworks that usually support only forward passes. In this paper, we extensively evaluate various design choices for interactive segmentation and discover that new state-of-the-art results can be obtained without any additional optimization schemes. Thus, we propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps. It allows not only to segment an entirely new object, but also to start with an external mask and correct it. When analyzing the performance of models trained on different datasets, we observe that the choice of a training dataset greatly impacts the quality of interactive segmentation. We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
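
The method described in the abstract reduces to a compact inference loop: at every click, the network consumes the image, the encoded clicks, and its own previous mask, and produces a refined mask with a single forward pass. Below is a minimal sketch of that loop, assuming hypothetical model and click-encoding interfaces (not the repository's actual API):

import torch

# Illustrative sketch of the mask-guided feedforward loop described above.
# `model` and `encode_clicks` stand in for the repository's real components.
def interactive_inference(model, encode_clicks, image, clicks, prev_mask=None):
    if prev_mask is None:
        # Start from an empty mask; an external mask can be passed in instead.
        prev_mask = torch.zeros_like(image[:, :1])
    click_maps = encode_clicks(clicks, image.shape)        # positive/negative click maps
    net_input = torch.cat([image, click_maps, prev_mask], dim=1)
    logits = model(net_input)                              # one forward pass, no BRS-style optimization
    return torch.sigmoid(logits)                           # fed back as prev_mask after the next click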

Setting up an environment

This framework is built using Python 3.6 and relies on PyTorch 1.4.0+. The following command installs all necessary packages:

pip3 install -r requirements.txt

You can also use our Dockerfile to build a container with the configured environment.

If you want to run training or testing, you must configure the paths to the datasets in config.yml.
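
For reference, a config.yml might look like the sketch below. INTERACTIVE_MODELS_PATH and EXPS_PATH are referenced later in this README; the dataset keys and paths are illustrative and should be checked against the file shipped with the repository.

INTERACTIVE_MODELS_PATH: "./weights"
EXPS_PATH: "./experiments"

# Dataset locations (key names illustrative -- verify against the shipped config.yml)
GRABCUT_PATH: "./datasets/GrabCut"
BERKELEY_PATH: "./datasets/Berkeley"
DAVIS_PATH: "./datasets/DAVIS"
SBD_PATH: "./datasets/SBD"
LVIS_PATH: "./datasets/LVIS"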

Interactive Segmentation Demo

The GUI is based on the TkInter library and its Python bindings. You can try our interactive demo with any of the provided models. Our scripts automatically detect the architecture of the loaded model; just specify the path to the corresponding checkpoint.

Examples of the script usage:

# This command runs interactive demo with HRNet18 ITER-M model from cfg.INTERACTIVE_MODELS_PATH on GPU with id=0
# --checkpoint can be relative to cfg.INTERACTIVE_MODELS_PATH or absolute path to the checkpoint
python3 demo.py --checkpoint=hrnet18_cocolvis_itermask_3p --gpu=0

# This command runs interactive demo with HRNet18 ITER-M model from /home/demo/isegm/weights/
# If you do not have a lot of GPU memory, you can reduce --limit-longest-size (default=800)
python3 demo.py --checkpoint=/home/demo/isegm/weights/hrnet18_cocolvis_itermask_3p --limit-longest-size=400

# You can try the demo in CPU only mode
python3 demo.py --checkpoint=hrnet18_cocolvis_itermask_3p --cpu

Running the demo in Docker

# activate xhost
xhost +
docker run -v "$PWD":/tmp/ \
           -v /tmp/.X11-unix:/tmp/.X11-unix \
           -e DISPLAY=$DISPLAY <id-or-tag-docker-built-image> \
           python3 demo.py --checkpoint resnet34_dh128_sbd --cpu

Controls:

Key                               Description
Left Mouse Button                 Place a positive click
Right Mouse Button                Place a negative click
Scroll Wheel                      Zoom an image in and out
Right Mouse Button + Move Mouse   Move an image
Space                             Finish the current object mask

Initializing the ITER-M models with an external segmentation mask

According to our paper, ITER-M models take an image, encoded user input, and the mask from the previous step as their input. This means a user can initialize the model with an external mask before placing any clicks and then correct that mask using the same interface. As it turns out, our models handle this situation successfully and make it possible to edit an existing mask.

To initialize any ITER-M model with an external mask, use the "Load mask" button in the menu bar.
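
In code terms, this simply seeds the loop sketched earlier with a non-empty mask. A hedged illustration (the file name and the integration point are assumptions, not the demo's actual code):

import cv2
import torch

# Read an external segmentation mask and convert it to the (1, 1, H, W)
# tensor that the inference sketch above expects as prev_mask.
external = cv2.imread('object_mask.png', cv2.IMREAD_GRAYSCALE)   # hypothetical mask file
prev_mask = torch.from_numpy(external).float()[None, None] / 255.0
# With prev_mask set, subsequent clicks correct the loaded mask
# instead of segmenting the object from scratch.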

Interactive segmentation options
  • ZoomIn (can be turned on/off using the checkbox)
    • Skip clicks - the number of clicks to skip before using ZoomIn.
    • Target size - ZoomIn crop is resized so its longer side matches this value (increase for large objects).
    • Expand ratio - object bbox is rescaled with this ratio before crop.
    • Fixed crop - ZoomIn crop is resized to (Target size, Target size).
  • BRS parameters (BRS type can be changed using the dropdown menu)
    • Network clicks - the number of initial clicks passed to the network's input. Subsequent clicks are processed only with BRS (NoBRS ignores this option).
    • L-BFGS-B max iterations - the maximum number of function evaluations per optimization step in BRS (increasing it improves accuracy at the cost of longer computation per click).
  • Visualisation parameters (the threshold-and-blend logic is sketched below)
    • Prediction threshold slider adjusts the threshold for binarizing the probability map of the current object.
    • Alpha blending coefficient slider adjusts the intensity of all predicted masks.
    • Visualisation click radius slider adjusts the size of the red and green dots depicting clicks.
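
The prediction threshold and alpha blending sliders correspond to a standard threshold-then-blend pipeline. A minimal sketch of that logic (illustrative, not the demo's actual rendering code):

import numpy as np

def render_overlay(image, prob_map, threshold=0.5, alpha=0.5, color=(0, 255, 0)):
    """Binarize the probability map, then alpha-blend the mask over the image."""
    mask = prob_map > threshold                    # prediction threshold slider
    out = image.astype(np.float32)
    out[mask] = (1 - alpha) * out[mask] + alpha * np.asarray(color, dtype=np.float32)
    return out.astype(np.uint8)                    # alpha controls the mask intensity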

Datasets

We train all our models on SBD and COCO+LVIS and evaluate them on GrabCut, Berkeley, DAVIS, SBD, and Pascal VOC. We also provide links to two additional datasets, ADE20k and OpenImages, which are used in our ablation study.

Dataset        Description                                       Download Link
ADE20k         22k images with 434k instances (total)            official site
OpenImages     944k images with 2.6M instances (total)           official site
MS COCO        118k images with 1.2M instances (train)           official site
LVIS v1.0      100k images with 1.2M instances (total)           official site
COCO+LVIS*     99k images with 1.5M instances (train)            original LVIS images + our combined annotations
SBD            8498 images with 20172 instances (train),         official site
               2857 images with 6671 instances (test)
GrabCut        50 images with one object each (test)             GrabCut.zip (11 MB)
Berkeley       96 images with 100 instances (test)               Berkeley.zip (7 MB)
DAVIS          345 images with one object each (test)            DAVIS.zip (43 MB)
Pascal VOC     1449 images with 3417 instances (validation)      official site
COCO_MVal      800 images with 800 instances (test)              COCO_MVal.zip (127 MB)

Don't forget to change the paths to the datasets in config.yml after downloading and unpacking.

(*) To prepare COCO+LVIS, download the original LVIS v1.0, then download our pre-processed annotations (obtained by combining the COCO and LVIS datasets) and unpack them into the LVIS v1.0 folder.

Testing

Pretrained models

We provide pretrained models with different backbones for interactive segmentation.

You can find model weights and evaluation results in the table below:

Train Dataset   Model                     GrabCut          Berkeley   SBD              DAVIS            Pascal VOC   COCO MVal
                                          NoC 85%  NoC 90% NoC 90%    NoC 85%  NoC 90% NoC 85%  NoC 90% NoC 85%      NoC 90%
SBD             HRNet18 IT-M (38.8 MB)    1.76     2.04    3.22       3.39     5.43    4.94     6.71    2.51         4.39
COCO+LVIS       HRNet18 (38.8 MB)         1.54     1.70    2.48       4.26     6.86    4.79     6.00    2.59         3.58
COCO+LVIS       HRNet18s IT-M (16.5 MB)   1.54     1.68    2.60       4.04     6.48    4.70     5.98    2.57         3.33
COCO+LVIS       HRNet18 IT-M (38.8 MB)    1.42     1.54    2.26       3.80     6.06    4.36     5.74    2.28         2.98
COCO+LVIS       HRNet32 IT-M (119 MB)     1.46     1.56    2.10       3.59     5.71    4.11     5.34    2.57         2.97

Evaluation

We provide a script to test all the presented models in all possible configurations on GrabCut, Berkeley, DAVIS, Pascal VOC, and SBD. To test a model, download its weights and put them in the ./weights folder (you can change this path in config.yml; see the INTERACTIVE_MODELS_PATH variable). To test any of our models, just specify the path to the corresponding checkpoint; our scripts automatically detect the architecture of the loaded model.

The following command runs the NoC evaluation on all test datasets (other options are displayed using '-h'):

python3 scripts/evaluate_model.py <brs-mode> --checkpoint=<checkpoint-name>

Examples of the script usage:

# This command evaluates the HRNetV2-W18-C+OCR ITER-M model in NoBRS mode on all datasets.
python3 scripts/evaluate_model.py NoBRS --checkpoint=hrnet18_cocolvis_itermask_3p

# This command evaluates the HRNet-W18-C-Small-v2+OCR ITER-M model in f-BRS-B mode on all datasets.
python3 scripts/evaluate_model.py f-BRS-B --checkpoint=hrnet18s_cocolvis_itermask_3p

# This command evaluates the HRNetV2-W18-C+OCR ITER-M model in NoBRS mode on the GrabCut and Berkeley datasets.
python3 scripts/evaluate_model.py NoBRS --checkpoint=hrnet18_cocolvis_itermask_3p --datasets=GrabCut,Berkeley
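
The NoC (Number of Clicks) metric reported throughout this README is the mean number of simulated clicks needed to reach a target IoU (85% or 90%), with samples that never reach the target counted at the click limit. A hedged sketch of the metric (the 20-click cap and the function names are our assumptions, not the evaluation script's actual API):

def noc(ious_per_click, target_iou=0.90, max_clicks=20):
    """Clicks until the IoU first reaches the target, for one sample."""
    for k, iou in enumerate(ious_per_click, start=1):
        if iou >= target_iou:
            return k
    return max_clicks  # target never reached within the click budget

def mean_noc(per_sample_ious, target_iou=0.90, max_clicks=20):
    counts = [noc(ious, target_iou, max_clicks) for ious in per_sample_ious]
    return sum(counts) / len(counts)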

Jupyter notebook

You can also interactively experiment with our models using the test_any_model.ipynb Jupyter notebook.

Training

We provide scripts for training our models; the example configs below train on COCO+LVIS. You can start training with the following commands:

# ResNet-34 non-iterative baseline model
python3 train.py models/noniterative_baselines/r34_dh128_cocolvis.py --gpus=0 --workers=4 --exp-name=first-try

# HRNet-W18-C-Small-v2+OCR ITER-M model
python3 train.py models/iter_mask/hrnet18s_cocolvis_itermask_3p.py --gpus=0 --workers=4 --exp-name=first-try

# HRNetV2-W18-C+OCR ITER-M model
python3 train.py models/iter_mask/hrnet18_cocolvis_itermask_3p.py --gpus=0,1 --workers=6 --exp-name=first-try

# HRNetV2-W32-C+OCR ITER-M model
python3 train.py models/iter_mask/hrnet32_cocolvis_itermask_3p.py --gpus=0,1,2,3 --workers=12 --exp-name=first-try

For each experiment, a separate folder is created in ./experiments with Tensorboard logs, text logs, visualizations, and checkpoints. You can specify another path in config.yml (see the EXPS_PATH variable).

Please note that we trained ResNet-34 and HRNet-18s on 1 GPU, HRNet-18 on 2 GPUs, and HRNet-32 on 4 GPUs (we used Nvidia Tesla P40 GPUs for training). To train with a different GPU setup, adjust the batch size using the --batch-size command line argument or change the default value in the model script.
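
For example, to run the HRNet-18 configuration on a single GPU you could reduce the batch size (the value below is illustrative; pick one that fits your GPU memory):

# Hypothetical single-GPU run with a reduced batch size
python3 train.py models/iter_mask/hrnet18_cocolvis_itermask_3p.py --gpus=0 --workers=4 --batch-size=16 --exp-name=single-gpu-try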

We used the pre-trained HRNetV2 models from the official repository. If you want to train interactive segmentation with these models, you need to download the weights and specify the paths to them in config.yml.

License

The code is released under the MIT License. It is a short, permissive software license. Basically, you can do whatever you want as long as you include the original copyright and license notice in any copy of the software/source.

Citation

If you find this work useful for your research, please cite our papers:

@inproceedings{ritm2022,
  title={Reviving iterative training with mask guidance for interactive segmentation},
  author={Sofiiuk, Konstantin and Petrov, Ilya A and Konushin, Anton},
  booktitle={2022 IEEE International Conference on Image Processing (ICIP)},
  pages={3141--3145},
  year={2022},
  organization={IEEE}
}

@inproceedings{fbrs2020,
  title={f-BRS: Rethinking backpropagating refinement for interactive segmentation},
  author={Sofiiuk, Konstantin and Petrov, Ilia and Barinova, Olga and Konushin, Anton},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8623--8632},
  year={2020}
}
