[CVPR 2022] SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation


Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang Zhang, Qian Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu

Highlights

  • SparseInst presents a new object representation method, i.e., Instance Activation Maps (IAM), to adaptively highlight informative regions of objects for recognition.
  • SparseInst is a simple, efficient, and fully convolutional framework without non-maximum suppression (NMS) or sorting, and is easy to deploy!
  • SparseInst achieves a good trade-off between speed and accuracy, e.g., 37.9 AP at 40 FPS with 608 input.

Updates

This project is under active development; please stay tuned!

  • [2022-10-31]: We release the models & weights for the CSP-DarkNet53 backbone, which is a strong baseline with highly competitive inference speed and accuracy.

  • [2022-10-19]: We provide the implementation and inference code based on MindSpore, a nice and efficient deep learning framework. Thanks to Ruiqi Wang for this kind contribution!

  • [2022-8-9]: We provide the FLOPs counter get_flops.py to obtain the FLOPs/parameters of SparseInst. This update also includes some bug fixes.

  • [2022-7-17]: Faster🚀: SparseInst now supports training and inference with FP16. Inference with FP16 improves the speed by 30%. Robust: we replace the Sigmoid + Norm with Softmax for numerical stability, especially for ONNX. Easy-to-Use: we provide the script for exporting SparseInst to ONNX models.

  • [2022-4-29]: We fix the common issue with the visualization script demo.py, e.g., ValueError: GenericMask cannot handle ....

  • [2022-4-7]: We provide the demo code for visualization and inference on images. Besides, we have added more backbones for SparseInst, including ResNet-101, CSPDarkNet, and PVTv2. Support for more backbones is in progress.

  • [2022-3-25]: We have released the code and models for SparseInst!

Overview

SparseInst is a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. In contrast to region boxes or anchors (centers), SparseInst adopts a sparse set of instance activation maps as the object representation to highlight informative regions for each foreground object. It then obtains instance-level features by aggregating features according to the highlighted regions for recognition and segmentation. Bipartite matching compels the instance activation maps to predict objects in a one-to-one style, thus avoiding non-maximum suppression (NMS) in post-processing. Owing to the simple yet effective design with instance activation maps, SparseInst has extremely fast inference speed, achieving 40 FPS and 37.9 AP on COCO (NVIDIA 2080Ti) and significantly outperforming its counterparts in terms of speed and accuracy.
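
To make the aggregation step concrete, here is a minimal, self-contained sketch of the IAM idea (illustrative shapes and module names, not the repository's actual implementation): each instance slot predicts an activation map, the map is normalized over spatial locations with Softmax (matching the 2022-7-17 update), and instance features are obtained by weighted pooling.

```python
import torch
import torch.nn as nn

class TinyIAM(nn.Module):
    """Illustrative sketch of Instance Activation Maps (IAM), not the official module."""
    def __init__(self, channels: int, num_masks: int = 100):
        super().__init__()
        # one activation map per instance slot (3x3 conv, as in the basic IAM)
        self.iam_conv = nn.Conv2d(channels, num_masks, kernel_size=3, padding=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W)
        iam = self.iam_conv(features).flatten(2).softmax(dim=-1)  # (B, N, H*W)
        feats = features.flatten(2).transpose(1, 2)               # (B, H*W, C)
        return torch.bmm(iam, feats)                              # (B, N, C) instance features

# usage: pool 100 instance embeddings from a 64-channel feature map
x = torch.randn(2, 64, 80, 80)
print(TinyIAM(64)(x).shape)  # torch.Size([2, 100, 64])
```

Each of the N instance features is then fed to lightweight heads for classification and mask prediction; bipartite matching during training is what keeps the N slots one-to-one with ground-truth objects.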

Models

We provide two versions of SparseInst, i.e., the basic IAM (3x3 convolution) and the Group IAM (G-IAM for short), with different backbones. All models are trained on MS-COCO train2017.

Fast models

| model | backbone | input | aug | AP (val) | AP | FPS | weights |
|:------|:---------|:-----:|:---:|:--------:|:--:|:---:|:-------:|
| SparseInst | R-50 | 640 |  | 32.8 | 33.2 | 44.3 | model |
| SparseInst | R-50-vd | 640 |  | 34.1 | 34.5 | 42.6 | model |
| SparseInst (G-IAM) | R-50 | 608 |  | 33.4 | 34.0 | 44.6 | model |
| SparseInst (G-IAM, Softmax) | R-50 | 608 |  | 33.6 | - | 44.6 | model |
| SparseInst (G-IAM) | R-50 | 608 |  | 34.2 | 34.7 | 44.6 | model |
| SparseInst (G-IAM) | R-50-DCN | 608 |  | 36.4 | 36.8 | 41.6 | model |
| SparseInst (G-IAM) | R-50-vd | 608 |  | 35.6 | 36.1 | 42.8 | model |
| SparseInst (G-IAM) | R-50-vd-DCN | 608 |  | 37.4 | 37.9 | 40.0 | model |
| SparseInst (G-IAM) | R-50-vd-DCN | 640 |  | 37.7 | 38.1 | 39.3 | model |

SparseInst with other backbones

| model | backbone | input | AP (val) | AP | FPS | weights |
|:------|:---------|:-----:|:--------:|:--:|:---:|:-------:|
| SparseInst (G-IAM) | CSPDarkNet | 640 | 35.1 | - | - | model |

Larger models

| model | backbone | input | aug | AP (val) | AP | FPS | weights |
|:------|:---------|:-----:|:---:|:--------:|:--:|:---:|:-------:|
| SparseInst (G-IAM) | R-101 | 640 |  | 34.9 | 35.5 | - | model |
| SparseInst (G-IAM) | R-101-DCN | 640 |  | 36.4 | 36.9 | - | model |

SparseInst with Vision Transformers

| model | backbone | input | aug | AP (val) | AP | FPS | weights |
|:------|:---------|:-----:|:---:|:--------:|:--:|:---:|:-------:|
| SparseInst (G-IAM) | PVTv2-B1 | 640 |  | 35.3 | 36.0 | 33.5 (48.9) | model |
| SparseInst (G-IAM) | PVTv2-B2-li | 640 |  | 37.2 | 38.2 | 26.5 | model |

The FPS in parentheses is measured on an RTX 3090.

Note:

  • We will continue to add more models, including more efficient convolutional networks, vision transformers, and larger models for high performance and high speed, so please stay tuned 😁!
  • Inference speeds are measured on one NVIDIA 2080Ti unless specified otherwise.
  • We have not yet adopted TensorRT or other tools to accelerate the inference of SparseInst. However, we are working on it now and will provide support for ONNX, TensorRT, MindSpore, Blade, and other frameworks as soon as possible!
  • AP denotes the AP evaluated on MS-COCO test-dev2017.
  • input denotes the shorter side of the input, e.g., 512x864 and 608x864; we keep the aspect ratio of the input, and the longer side is no more than 864 (see the sketch after these notes).
  • The inference speed might change slightly on different machines (2080 Ti) and different versions of detectron2 (we mainly use v0.3). If the change is sharp, e.g., > 5ms, please feel free to contact us.
  • For aug (augmentation), we only adopt the simple random crop (crop size: [384, 600]) provided by detectron2.
  • We adopt weight decay = 5e-2 as the default setting, which is slightly different from the original paper.
  • [Weights on BaiduPan]: we also provide trained models on BaiduPan: ShareLink (password: lkdo).
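
As an illustration of the input-resizing rule in the notes above, here is a hypothetical helper (not code from this repository) that scales the shorter side to the target size and caps the longer side at 864, mirroring shortest-edge resizing:

```python
def resize_keep_ratio(h: int, w: int, short: int = 608, max_long: int = 864) -> tuple[int, int]:
    """Scale so the shorter side equals `short`, capping the longer side at `max_long`."""
    scale = short / min(h, w)
    if max(h, w) * scale > max_long:
        scale = max_long / max(h, w)
    return round(h * scale), round(w * scale)

print(resize_keep_ratio(480, 640))   # (608, 811): shorter side reaches 608
print(resize_keep_ratio(400, 1000))  # (346, 864): longer side capped at 864
```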

Installation and Prerequisites

This project is built upon the excellent detectron2 framework, and you should install detectron2 first. Please check the official installation guide for more details.

Updates: SparseInst works well on detectron2-v0.6.

Note: previously, we mainly used v0.3 of detectron2 for experiments and evaluations. We have also tested our code on the newest version, v0.6. If you find bugs or incompatibility problems with higher versions of detectron2, please feel free to raise an issue!

Install detectron2:

```bash
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
# if you want to switch to a specific version, e.g., v0.3 (recommended) or v0.6
git checkout tags/v0.6
# build detectron2
python setup.py build develop
```
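
A quick way to confirm the build succeeded (a small check, assuming a standard Python environment):

```python
# sanity check: the import works and CUDA is visible to PyTorch
import detectron2
import torch

print(detectron2.__version__, "CUDA available:", torch.cuda.is_available())
```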

Getting Started

🔥 SparseInst with FP16

SparseInst with FP16 achieves 30% faster inference and uses much less training memory; the table below compares memory, inference speed, and training speed.

| FP16 | train mem. (log) | train mem. (nvidia-smi) | train speed | infer. speed |
|:----:|:----------------:|:-----------------------:|:-----------:|:------------:|
| ✗ | 6.0G | 10.5G | 0.8690 s/iter | 52.17 FPS |
| ✓ | 3.9G | 6.8G | 0.6949 s/iter | 67.57 FPS |

Note: statistics are measured on an NVIDIA 3090. With FP16, training is faster, and the batch size can also be increased for better performance.

  • Training with FP16: enabling FP16 is simple; you only need to set SOLVER.AMP.ENABLED True on the command line, or add this configuration to the config file. A sketch of the underlying mechanism follows these commands.

```bash
python tools/train_net.py --config-file configs/sparse_inst_r50_giam_fp16.yaml --num-gpus 8 SOLVER.AMP.ENABLED True
```
  • Testing with FP16: enable FP16 for inference by adding --fp16.

```bash
python tools/test_net.py --config-file configs/sparse_inst_r50_giam_fp16.yaml --fp16 MODEL.WEIGHTS model_final.pth
```
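
For reference, SOLVER.AMP.ENABLED relies on PyTorch automatic mixed precision; here is a minimal sketch of the mechanism (a generic training step with a hypothetical model that returns its loss, not the repository's actual loop):

```python
import torch

scaler = torch.cuda.amp.GradScaler()      # scales the loss to avoid FP16 underflow

def amp_step(model, images, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # forward pass runs in mixed precision
        loss = model(images)              # hypothetical: model returns its loss
    scaler.scale(loss).backward()         # backward on the scaled loss
    scaler.step(optimizer)                # unscale gradients, then update weights
    scaler.update()                       # adjust the scale for the next step
```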

Testing SparseInst

Before testing, you should specify the config file <CONFIG> and the model weights <MODEL-PATH>. In addition, you can change the input size by setting INPUT.MIN_SIZE_TEST in either the config file or on the command line.
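
The same override can also be applied programmatically via detectron2's config API (a sketch using only the base config; SparseInst's own config keys would additionally require the project-specific additions):

```python
from detectron2.config import get_cfg

cfg = get_cfg()                                    # base detectron2 config
cfg.merge_from_list(["INPUT.MIN_SIZE_TEST", 512])  # same effect as the command-line override
print(cfg.INPUT.MIN_SIZE_TEST)                     # 512
```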

  • [Performance Evaluation] To obtain the evaluation results, e.g., mask AP on COCO, you can run:
```bash
python tools/train_net.py --config-file <CONFIG> --num-gpus <GPUS> --eval MODEL.WEIGHTS <MODEL-PATH>
# example:
python tools/train_net.py --config-file configs/sparse_inst_r50_giam.yaml --num-gpus 8 --eval MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth
```
  • [Inference Speed] To obtain the inference speed (FPS) on one GPU device, you can run:
```bash
python tools/test_net.py --config-file <CONFIG> MODEL.WEIGHTS <MODEL-PATH> INPUT.MIN_SIZE_TEST 512
# example:
python tools/test_net.py --config-file configs/sparse_inst_r50_giam.yaml MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512
```

Note:

  • tools/test_net.py only supports 1 GPU and 1 image per batch for measuring inference speed.
  • The inference time consists of the pure forward time and the post-processing time; evaluation, data loading, and pre-processing for wrappers (e.g., ImageList) are not included. A timing sketch follows these notes.
  • COCOMaskEvaluator is modified from COCOEvaluator for evaluating mask-only results.
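
For context, measuring GPU inference speed reliably requires warm-up and explicit synchronization; here is a minimal sketch (hypothetical model and inputs, not tools/test_net.py itself):

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, images, warmup: int = 10, iters: int = 100) -> float:
    for _ in range(warmup):            # warm up CUDA kernels and allocator
        model(images)
    torch.cuda.synchronize()           # wait for pending GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        model(images)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```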

FLOPs and Parameters

get_flops.py is built on detectron2 and fvcore.

```bash
python tools/get_flops.py --config-file <CONFIG> --tasks parameter flop
```
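
The underlying fvcore calls look roughly like the following (a sketch with a stand-in module and dummy input, rather than the script's detectron2 model and data wrappers):

```python
import torch
from fvcore.nn import FlopCountAnalysis, parameter_count_table

model = torch.nn.Conv2d(3, 64, kernel_size=3)   # stand-in for the detectron2 model
inputs = (torch.randn(1, 3, 608, 608),)
print(FlopCountAnalysis(model, inputs).total() / 1e9, "GFLOPs")
print(parameter_count_table(model))
```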

Visualizing Images with SparseInst

To run inference or visualize the segmentation results on your images, you can run:

```bash
python demo.py --config-file <CONFIG> --input <IMAGE-PATH> --output results --opts MODEL.WEIGHTS <MODEL-PATH>
# example
python demo.py --config-file configs/sparse_inst_r50_giam.yaml --input datasets/coco/val2017/* --output results --opts MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512
```
  • Besides, demo.py also supports inference on video (--video-input) and webcam (--webcam). For inference on video, you might refer to issue #9 to avoid some errors.
  • --opts supports modifications to the config-file, e.g., INPUT.MIN_SIZE_TEST 512.
  • --input can be a single image or a folder of images, e.g., xxx/*.
  • If --output is not specified, a popup window will show the visualization results for each image.
  • Lowering the confidence threshold will show more instances but may introduce more false positives.

Visualization results (SparseInst-R50-GIAM)

Training SparseInst

Training SparseInst on the COCO dataset requires 8 GPUs. If you only have 4 GPUs or GPU memory is limited, that is fine: you can reduce the batch size through SOLVER.IMS_PER_BATCH or reduce the input size. If you adjust the batch size, the learning schedule should be adjusted according to the linear scaling rule (a small helper after the commands below illustrates it).

```bash
python tools/train_net.py --config-file <CONFIG> --num-gpus 8
# example
python tools/train_net.py --config-file configs/sparse_inst_r50vd_dcn_giam_aug.yaml --num-gpus 8
```
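
The linear scaling rule is a one-line computation; the reference batch size and base learning rate below are placeholders, so take the actual values from your config:

```python
def scale_lr(base_lr: float, ref_batch: int, new_batch: int) -> float:
    """Linear scaling rule: the learning rate scales proportionally with batch size."""
    return base_lr * new_batch / ref_batch

# e.g., halving the batch size halves the learning rate (placeholder values)
print(scale_lr(5e-5, 64, 32))  # 2.5e-05
```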

Custom Training of SparseInst

  1. We suggest converting your custom dataset into the COCO format, which enables the use of the default dataset mappers and loaders. You may find more details in the official detectron2 guide; a registration sketch follows this list.
  2. You need to check whether NUM_CLASSES and NUM_MASKS should be changed according to your scenarios or tasks.
  3. Change the configurations accordingly.
  4. After finishing the above procedures, you can easily train SparseInst with train_net.py.
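
For step 1, detectron2 provides a helper for registering COCO-format datasets; here is a minimal sketch with hypothetical names and paths:

```python
from detectron2.data.datasets import register_coco_instances

# hypothetical dataset names and paths; point these at your own COCO-format data
register_coco_instances("my_train", {}, "datasets/my/annotations/train.json", "datasets/my/images/train")
register_coco_instances("my_val", {}, "datasets/my/annotations/val.json", "datasets/my/images/val")
# then set DATASETS.TRAIN/DATASETS.TEST to these names in the config and
# adjust the NUM_CLASSES / NUM_MASKS settings mentioned in step 2
```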

Acknowledgements

SparseInst is based on detectron2, OneNet, DETR, and timm, and we sincerely thank their authors for the code and contributions to the community!

Citing SparseInst

If you find SparseInst useful in your research or applications, please consider giving us a star 🌟 and citing SparseInst with the following BibTeX entry.

@inproceedings{Cheng2022SparseInst,
  title     =   {Sparse Instance Activation for Real-Time Instance Segmentation},
  author    =   {Cheng, Tianheng and Wang, Xinggang and Chen, Shaoyu and Zhang, Wenqiang and Zhang, Qian and Huang, Chang and Zhang, Zhaoxiang and Liu, Wenyu},
  booktitle =   {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year      =   {2022}
}

License

SparseInst is released under the MIT License.
