
Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks, IEEE TNNLS 2021.

🔥 NEWS 🔥

  • [2022/06/09] 💥 Updated the related works.
  • [2020/08/02] Released the training code.


Figure 1: Illustration of the proposed D3Net. In the training stage (left), the input RGB and depth images are processed by three parallel sub-networks, i.e., RgbNet, RgbdNet, and DepthNet. The three sub-networks share the same modified Feature Pyramid Network (FPN) structure (see § IV-A for details). They produce three saliency maps (i.e., Srgb, Srgbd, and Sdepth) that capture both coarse and fine details of the input. In the test phase (right), a novel depth depurator unit (DDU; § IV-B) is utilized for the first time in this work to explicitly discard (i.e., use Srgb) or keep (i.e., use Srgbd) the saliency contribution of the depth map. Across the training and test phases, these components form a nested structure and are elaborately designed (e.g., the gate connection in the DDU) to learn the salient object jointly from the RGB image and the depth image.
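The branch layout in the caption can be summarized with a minimal sketch, assuming PyTorch; TinyFPN below is a hypothetical stand-in for the modified FPN sub-network, not the repository's actual implementation:

import torch
import torch.nn as nn

class TinyFPN(nn.Module):
    """Hypothetical placeholder for the modified FPN sub-network."""
    def __init__(self, in_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.body(x)  # one-channel saliency map in [0, 1]

class D3NetSketch(nn.Module):
    """Three parallel branches, as in the training stage of Fig. 1."""
    def __init__(self):
        super().__init__()
        self.rgb_net = TinyFPN(3)    # RgbNet: RGB input only
        self.rgbd_net = TinyFPN(4)   # RgbdNet: RGB + depth (4 channels)
        self.depth_net = TinyFPN(1)  # DepthNet: depth input only

    def forward(self, rgb, depth):
        s_rgb = self.rgb_net(rgb)
        s_rgbd = self.rgbd_net(torch.cat([rgb, depth], dim=1))
        s_depth = self.depth_net(depth)
        return s_rgb, s_rgbd, s_depth  # Srgb, Srgbd, Sdepth

At test time, the DDU then selects Srgbd when the depth map looks reliable and falls back to Srgb otherwise.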


Abstract

The use of RGB-D information for salient object detection has been explored in recent years. However, relatively little effort has been spent on modeling salient object detection in real-world human activity scenes with RGB-D. In this work, we fill the gap by making the following contributions to RGB-D salient object detection. First, we carefully collect a new Salient Person (SIP) dataset, which consists of 1K high-resolution images covering diverse real-world scenes with various viewpoints, poses, occlusions, illuminations, and backgrounds. Second, we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the area and can serve as a baseline for future research. We systematically summarize 31 popular models and evaluate 17 state-of-the-art methods over seven datasets with a total of about 91K images. Third, we propose a simple baseline architecture called the Deep Depth-Depurator Network (D3Net). It consists of a depth depurator unit and a feature learning module, which perform initial low-quality depth map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of all prior contenders across the five metrics considered, and thus serves as a strong baseline to advance the research frontier. We also demonstrate that D3Net can be used to efficiently extract salient person masks from real scenes, enabling an effective background-changing book cover application at 20 fps on a single GPU. All the saliency maps, our new SIP dataset, the baseline model, and the evaluation tools are made publicly available at https://github.com/DengPingFan/D3NetBenchmark.

Notion of Depth Depurator Unit

The statistics of the depth maps in existing datasets (e.g., NJU2K, NLPR, RGBD135, STERE, and LFSD) suggest that “high-quality depth maps usually contain clear objects, but the elements in low-quality depth maps are cluttered (2nd row in Fig. 2)”.


Figure 2: The smoothed histograms (c) of a high-quality (1st row) and a low-quality (2nd row) depth map, respectively.
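As an illustration of this observation, the sketch below computes a smoothed-histogram entropy as a rough clutter proxy for a depth map. It assumes NumPy/SciPy and a depth map normalized to [0, 1]; this is not the exact DDU criterion from the paper, and the threshold is illustrative:

import numpy as np
from scipy.ndimage import gaussian_filter1d

def depth_histogram_entropy(depth, bins=256, sigma=3.0):
    """Entropy of the smoothed depth histogram.

    A clean depth map tends to have a few sharp peaks (object vs.
    background), giving low entropy; a cluttered map spreads its
    mass over many bins, giving high entropy.
    """
    hist, _ = np.histogram(depth.ravel(), bins=bins, range=(0.0, 1.0))
    hist = gaussian_filter1d(hist.astype(np.float64), sigma=sigma)
    p = hist / (hist.sum() + 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def looks_high_quality(depth, threshold=5.0):
    # Threshold is a made-up example value; it would have to be
    # tuned on a validation set in practice.
    return depth_histogram_entropy(depth) < threshold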

Related Works

Please refer to our recent survey paper: https://github.com/taozh2017/RGBD-SODsurvey

Papers with Code: https://paperswithcode.com/task/rgb-d-salient-object-detection

SIP dataset


Figure 3: Representative subsets in our SIP. The images in SIP are grouped into eight subsets according to background objects (i.e., grass, car, barrier, road, sign, tree, flower, and other), different lighting conditions (i.e., low light and sunny with clear object boundaries), and various numbers of objects (i.e., 1, 2, ≥3).


Figure 4: Examples of images, depth maps, and annotations (i.e., object level and instance level) in our SIP dataset with different numbers of salient objects, object sizes, object positions, scene complexities, and lighting conditions. Note that the “RGB” and “Gray” images are captured by two different monocular cameras from short distances. Thus, the “Gray” images differ slightly from the grayscale images obtained from the color (RGB) images. Our SIP dataset suggests new directions, such as depth estimation from “RGB” and “Gray” images, and instance-level RGB-D SOD.

RGB-D SOD Datasets:

| No. | Dataset  | Year | Pub.   | Size | #Obj.    | Types                 | Resolution              | Download                      |
|-----|----------|------|--------|------|----------|-----------------------|-------------------------|-------------------------------|
| 1   | STERE    | 2012 | CVPR   | 1000 | ~One     | Internet              | [251-1200] * [222-900]  | Baidu: rcql / Google (1.29G)  |
| 2   | GIT      | 2013 | BMVC   | 80   | Multiple | Home environment      | 640 * 480               | Baidu / Google (35.6M)        |
| 3   | DES      | 2014 | ICIMCS | 135  | One      | Indoor                | 640 * 480               | Baidu: qhen / Google (60.4M)  |
| 4   | NLPR     | 2014 | ECCV   | 1000 | Multiple | Indoor/outdoor        | 640 * 480, 480 * 640    | Baidu: n701 / Google (546M)   |
| 5   | LFSD     | 2014 | CVPR   | 100  | One      | Indoor/outdoor        | 360 * 360               | Baidu / Google (32M)          |
| 6   | NJUD     | 2014 | ICIP   | 1985 | ~One     | Movie/internet/photo  | [231-1213] * [274-828]  | Baidu: zjmf / Google (1.54G)  |
| 7   | SSD      | 2017 | ICCVW  | 80   | Multiple | Movies                | 960 * 1080              | Baidu: e4qz / Google (119M)   |
| 8   | DUT-RGBD | 2019 | ICCV   | 1200 | Multiple | Indoor/outdoor        | 400 * 600               | Baidu: 6rt0 / Google (100M)   |
| 9   | SIP      | 2020 | TNNLS  | 929  | Multiple | Person in the wild    | 992 * 774               | Baidu: 46w8 / Google (2.16G)  |
| 10  | Overall  |      |        |      |          |                       |                         | Baidu: 39un / Google (5.33G)  |
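A minimal loader for one (RGB, depth, ground truth) triplet from these datasets might look like the sketch below. The RGB/, depth/, and GT/ sub-folder names and the file extensions are assumptions about the archive layout, so adjust them to match the actual downloads:

import os
from PIL import Image

def load_sod_sample(root, name):
    """Load one (RGB, depth, GT) triplet by sample name.

    Assumes RGB/, depth/, and GT/ sub-folders under `root`,
    with .jpg color images and .png depth/mask images.
    """
    rgb = Image.open(os.path.join(root, "RGB", name + ".jpg")).convert("RGB")
    depth = Image.open(os.path.join(root, "depth", name + ".png")).convert("L")
    gt = Image.open(os.path.join(root, "GT", name + ".png")).convert("L")
    return rgb, depth, gt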

Train

Put the three datasets 'NJU2K_TRAIN', 'NLPR_TRAIN', and 'NJU2K_TEST' into the created folder "dataset".

Put the VGG-pretrained model 'vgg16_feat.pth' (GoogleDrive | BaiduYun code: zsxh) into the created folder "model", then train each branch:

python train.py --net RgbNet
python train.py --net RgbdNet
python train.py --net DepthNet
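For orientation only, here is a hypothetical sketch of how a --net flag like the one above could dispatch to one of the three branches; the repository's actual train.py differs in structure and options:

import argparse

def main():
    # Hypothetical dispatcher, not the repository's real train.py.
    parser = argparse.ArgumentParser(description="Train one D3Net branch")
    parser.add_argument("--net", choices=["RgbNet", "RgbdNet", "DepthNet"],
                        required=True, help="which sub-network to train")
    args = parser.parse_args()
    # Build the chosen model, data loaders, and optimizer here.
    print(f"training {args.net} ...")

if __name__ == "__main__":
    main()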

Requirement

  • PyTorch >= 0.4.1
  • OpenCV

Pretrained models

Pretrained RgbdNet, RgbNet, and DepthNet models can be downloaded from (GoogleDrive | BaiduYun code: xf1h).

Training and Testing Sets

Our training dataset is:

https://drive.google.com/open?id=1osdm_PRnupIkM82hFbz9u0EKJC_arlQI

Our testing dataset is:

https://drive.google.com/open?id=1ABYxq0mL4lPq2F0paNJ7-5T9ST6XVHl1

Evaluation

Put the three pretrained models into the created folder "eval/pretrained_model".

python eval.py
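For reference, one of the standard metrics behind such an evaluation is the mean absolute error (MAE) between the predicted saliency map and the ground truth. The following is a minimal sketch in NumPy, not the repository's eval.py (which, together with the toolbox below, computes the full metric suite):

import numpy as np

def mae(pred, gt):
    """MAE between a predicted saliency map (uint8, 0-255) and a
    binary ground-truth mask (uint8, 0 or 255), both HxW arrays."""
    pred = pred.astype(np.float64) / 255.0   # scale prediction to [0, 1]
    gt = (gt > 127).astype(np.float64)       # binarize the ground truth
    return float(np.abs(pred - gt).mean())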

Toolbox (updated 2022/06/09):

Baidu: https://pan.baidu.com/s/1ArnPZ4OwP67NR71OWYjitg (code: i09j)

Google: https://drive.google.com/file/d/1I4Z7rA3wefN7KeEQvkGA92u99uXS_aI_/view?usp=sharing


Table 1: Running time comparison.

Results


Results of our model on the seven benchmark datasets can be found at:

Baidu Pan: https://pan.baidu.com/s/13z0ZEptUfEU6hZ6yEEISuw (extraction code: r295)

Google Drive: https://drive.google.com/drive/folders/1T46FyPzi3XjsB18i3HnLEqkYQWXVbCnK?usp=sharing

Citation

If you find this work or code helpful in your research, please cite:

@article{fan2019rethinking,
  title={{Rethinking RGB-D salient object detection: Models, datasets, and large-scale benchmarks}},
  author={Fan, Deng-Ping and Lin, Zheng and Zhang, Zhao and Zhu, Menglong and Cheng, Ming-Ming},
  journal={IEEE TNNLS},
  year={2021}
}
@article{zhou2021rgbd,
  title={RGB-D Salient Object Detection: A Survey},
  author={Zhou, Tao and Fan, Deng-Ping and Cheng, Ming-Ming and Shen, Jianbing and Shao, Ling},
  journal={CVMJ},
  year={2021}
}
