  • Stars: 126
  • Rank: 284,543 (top 6%)
  • Language: C++
  • License: Other
  • Created: about 8 years ago
  • Updated: over 1 year ago


Repository Details

This repository is the official code release for the paper "FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture", published at the 13th Asian Conference on Computer Vision (ACCV 2016).

FuseNet

[PyTorch]

Please refer to the PyTorch implementation for an up-to-date version.

FuseNet is a general deep convolutional neural network (CNN) architecture for training on RGB-D images. It can be used for semantic segmentation, scene classification, and other applications. This repository is the official release of the code for the paper above, implemented on top of the BVLC/caffe framework.

Usage

Installation

The code is compatible with an early Caffe version from June 2016. It was developed under Ubuntu 16.04 with CUDA 7.5 and cuDNN v5.0. If you use the program under other Ubuntu distributions, you may need to comment out lines 72-73 in the root CMakeLists.txt file. If you compile under another OS, please use Google as your friend. We mostly tested the program with an Nvidia Titan X GPU. Please note that multi-GPU training is supported.

git clone https://github.com/tum-vision/fusenet.git
cd fusenet
mkdir build && cd build
cmake ..
make -j10
make runtest -j10

Training and Testing

We provide all the Python scripts and prototxt files needed to reproduce our published results under ./fusenet/. A short guideline is given below. For further detailed instructions, check here.

Initialization

Our network architecture is based on the 16-layer VGGNet. However, since we have an extra input channel for depth, we provide a script to compute the initialization for the depth channel from the pretrained model; a sketch of one common approach is given below.
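As an illustration, here is a minimal pycaffe sketch of one common initialization: deriving one-channel depth filters from the pretrained three-channel RGB filters by averaging across the color channels. The file paths are placeholders, and this is not the repo's actual script.

# Hypothetical sketch: derive first-layer depth filters from pretrained
# VGG-16 RGB filters by averaging over the three color channels.
import numpy as np
import caffe

net = caffe.Net('vgg16_deploy.prototxt', 'vgg16.caffemodel', caffe.TEST)  # placeholder paths
w_rgb = net.params['conv1_1'][0].data          # shape (64, 3, 3, 3)
w_depth = w_rgb.mean(axis=1, keepdims=True)    # shape (64, 1, 3, 3)
np.save('conv1_1_depth_init.npy', w_depth)     # load into the depth branch later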

Data preparation

To store the dataset, we save paired RGB-D images into LMDB. We also scale the original depth images to the range [0, 255]. Optionally, the scaled depth values can be further cast to unsigned char (grayscale) to save memory. If you do not want to lose precision, store the scaled depth as float. To prepare the LMDB, we provide the following Python script for your reference. However, you can also write your own image input layer to grab paired RGB-D images.

demo   ./fusenet/scripts/save_lmdb.py
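For illustration, here is a minimal sketch of writing paired RGB-D frames into a single LMDB, with depth scaled to [0, 255] and cast to uint8. The function and variable names are hypothetical; see save_lmdb.py above for the actual script.

# Hypothetical sketch (not the repo's save_lmdb.py): store paired RGB-D
# frames in one LMDB, with depth scaled to [0, 255] and cast to uint8.
import lmdb
import numpy as np
import caffe

def write_rgbd_lmdb(db_path, rgb_images, depth_images):
    # rgb_images: HxWx3 uint8 arrays; depth_images: HxW float arrays
    env = lmdb.open(db_path, map_size=1 << 40)
    with env.begin(write=True) as txn:
        for i, (rgb, depth) in enumerate(zip(rgb_images, depth_images)):
            # per-image min/max scaling to [0, 255] (one possible scheme)
            d = 255.0 * (depth - depth.min()) / (depth.max() - depth.min())
            rgbd = np.dstack([rgb, d.astype(np.uint8)])               # HxWx4
            datum = caffe.io.array_to_datum(rgbd.transpose(2, 0, 1))  # CxHxW
            txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())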

LMDB shuffling

We support LMDB shuffling and recommend shuffling after each epoch during training. To enable this option, set the shuffle flag to true for the DataLayer in the prototxt. Note that we do not support shuffling with LevelDB.

demo   ./fusenet/segmentation/nyuv2_sf1/train.prototxt

Weighted cross-entropy loss

One common technique to handle class imbalance is to weight the loss of each class differently, typically with higher weights for less frequent classes and lower weights for more frequent ones. For semantic segmentation, we support this loss weighting with the SoftmaxWithLossLayer by allowing the user to specify a weight for each label. One way to set the weights is according to the inverse class frequency (see our paper for details); a sketch is given below. We provide the weights used in our paper in ./fusenet/data/.
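As an illustration, a minimal sketch of computing plain inverse-frequency weights from the training label maps. The function name is hypothetical, and the exact weighting scheme used in the paper may differ.

# Hypothetical sketch: per-class loss weights from inverse class frequency.
import numpy as np

def inverse_frequency_weights(label_maps, num_classes, ignore_label=255):
    # label_maps: iterable of HxW integer ground-truth label arrays
    counts = np.zeros(num_classes, dtype=np.float64)
    for labels in label_maps:
        valid = labels[labels != ignore_label]
        counts += np.bincount(valid.ravel(), minlength=num_classes)
    freq = counts / counts.sum()
    weights = 1.0 / np.maximum(freq, 1e-12)  # rarer classes get larger weights
    return weights / weights.mean()          # normalize to mean 1.0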

Batch normalization

We use batch normalization after each convolution, which is supported by the Caffe BatchNormLayer. Note that we add a ScaleLayer after each BatchNormLayer, since BatchNormLayer only normalizes its input and the learned scale and shift come from the ScaleLayer.

Testing

To test semantic segmentation performance, we provide Python scripts to calculate the global accuracy, average class accuracy, and average intersection-over-union (IoU) score. The implementation is based on the confusion matrix.

demo   ./fusenet/scripts/test_segmentation.py
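For reference, a minimal sketch of computing these three metrics from a confusion matrix; the function name is hypothetical, see test_segmentation.py above for the actual script.

# Hypothetical sketch: segmentation metrics from a confusion matrix, where
# conf[i, j] counts pixels of ground-truth class i predicted as class j.
import numpy as np

def segmentation_metrics(conf):
    conf = conf.astype(np.float64)
    tp = np.diag(conf)                        # correctly classified pixels
    gt = conf.sum(axis=1)                     # pixels per ground-truth class
    pred = conf.sum(axis=0)                   # pixels per predicted class
    global_acc = tp.sum() / conf.sum()
    class_acc = np.mean(tp / np.maximum(gt, 1.0))
    iou = tp / np.maximum(gt + pred - tp, 1.0)
    return global_acc, class_acc, iou.mean()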

Released Caffemodel

Semantic Image Segmentation

Items marked with a tick are already available for download; the others will be released soon. Unless otherwise stated, all models are finetuned from the pretrained 16-layer VGGNet model. Stay tuned 🔥

NYUv2 40-class semantic segmentation

For more information about the dataset, check here.

  • FuseNet-SF1:

    This model is trained with the FuseNet Sparse-Fusion1 (SF1) architecture at 320x240 resolution. To obtain the full 640x480 resolution, you can bilinearly upsample the segmentation (see the sketch after this list), or better, refine it with a CRF.

  • FuseNet-SF5:

    This model is trained with the FuseNet Sparse-Fusion5 (SF5) architecture at 320x240 resolution. It gives 66.0% global pixelwise accuracy, 43.4% average classwise accuracy, and 32.7% average classwise IoU.
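A minimal sketch of the bilinear upsampling option mentioned above, assuming the per-class score maps (rather than hard labels) are upsampled before the argmax; the function name is illustrative, and CRF refinement is not shown.

# Hypothetical sketch: bilinearly upsample 320x240 class scores to
# 640x480, then take the per-pixel argmax.
from scipy.ndimage import zoom

def upsample_prediction(scores):
    # scores: (num_classes, 240, 320) array of per-class scores
    up = zoom(scores, (1, 2, 2), order=1)  # order=1: bilinear, spatial dims only
    return up.argmax(axis=0)               # (480, 640) label map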

SUN-RGBD 37-class semantic segmentation

For more information about the dataset, check here.

  • FuseNet-SF5:

    This model is trained at 224x224 resolution. It gives 76.3% global pixelwise accuracy, 48.3% average classwise accuracy, and 37.3% average classwise IoU.

Publication

If you use this code or our trained models in your work, please consider citing the following paper.

Caner Hazirbas, Lingni Ma, Csaba Domokos and Daniel Cremers, "FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture", in Proceedings of the 13th Asian Conference on Computer Vision (ACCV), 2016. (pdf)

@inproceedings{fusenet2016accv,
 author    = "C. Hazirbas and L. Ma and C. Domokos and D. Cremers",
 title     = "FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture",
 booktitle = "Asian Conference on Computer Vision",
 year      = "2016",
 month     = "November",
}

License and Contact

BVLC/caffe is released under the BSD 2-Clause license. The modifications to the original code are released under the GNU General Public License Version 3 (GPLv3).

Contact Lingni Ma ✉️ with questions, comments, and bug reports.

More Repositories

1. lsd_slam - LSD-SLAM (C++, 2,486 stars)
2. tandem - [CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo (C++, 911 stars)
3. LDSO - DSO with SIM(3) pose graph optimization and loop closure (C++, 653 stars)
4. dvo_slam - Dense Visual Odometry and SLAM (C++, 607 stars)
5. fastfusion - Volumetric 3D Mapping in Real-Time on a CPU (C++, 543 stars)
6. online_photometric_calibration - Implementation of online photometric calibration (https://vision.in.tum.de/research/vslam/photometric-calibration) (C++, 306 stars)
7. mono_dataset_code - Code for the Monocular Visual Odometry Dataset (https://vision.cs.tum.edu/data/datasets/mono-dataset) (C++, 261 stars)
8. tum_ardrone - ROS package implementing autonomous flight with PTAM-based visual navigation for the Parrot AR.Drone (C++, 221 stars)
9. dvo - Dense Visual Odometry (C++, 148 stars)
10. pnec - [CVPR 2022] The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions (C++, 117 stars)
11. captcha_recognition - (Python, 71 stars)
12. intrinsic-neural-fields - [ECCV '22] Intrinsic Neural Fields: Learning Functions on Manifolds (Jupyter Notebook, 66 stars)
13. dbatk - Distributed Bundle Adjustment Toolkit (59 stars)
14. fastms - Real-Time Minimization of the Piecewise Smooth Mumford-Shah Functional (C++, 57 stars)
15. ardrone_autonomy - A slightly modified version of the official ardrone_autonomy package (https://github.com/AutonomyLab/ardrone_autonomy) (C, 53 stars)
16. learn_prox_ops - Implementation of "Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems" (Python, 43 stars)
17. tum_simulator - (C++, 40 stars)
18. prost - A fast and flexible convex optimization framework based on proximal splitting (C++, 35 stars)
19. afs - Automatic Feature Selection (C++, 31 stars)
20. rgbd_scribble_benchmark - RGB-D Scribble-based Segmentation Benchmark (Python, 26 stars)
21. autonavx_ardrone - Code for AR.Drone Exercises (C++, 24 stars)
22. autonavx_web - Interactive exercises for the AUTONAVx course (JavaScript, 24 stars)
23. sublabel_relax - Code for sublabel-accurate multi-labeling papers (published at CVPR '16 and ECCV '16) (C++, 20 stars)
24. csd_lmnn - Combined Spectral Descriptors and LMNN for non-rigid 3D shape retrieval (MATLAB, 19 stars)
25. rgbd_demo - Simple ROS demo for processing RGB-D data (C++, 17 stars)
26. mem - Masked Event Modeling: Self-Supervised Pretraining for Event Cameras (WACV '24) (Python, 15 stars)
27. kfusion_ros - ROS integration for kfusion (C++, 11 stars)
28. openni2_camera - OpenNI2 camera node for ROS (C++, 9 stars)
29. articulation - Articulation models (C++, 6 stars)
30. nnascg - Source code for experiments in the paper "Deriving Neural Network Design and Learning from the Probabilistic Framework of Chain Graphs" by Yuesong Shen and Daniel Cremers (Python, 4 stars)
31. lgm - Implementation of the Layered Graphical Model with demo code (Python, 4 stars)
32. dca - Source code for the NeurIPS 2022 paper "Deep Combinatorial Aggregation" (Python, 4 stars)
33. flbo - (2 stars)
34. hierahyp - (1 star)