  • Language
    Python
  • License
    MIT License
  • Created almost 5 years ago
  • Updated almost 2 years ago

Repository Details

4D Spatio-Temporal Semantic Segmentation on a 3D video (a sequence of 3D scans)

Spatio-Temporal Segmentation

This repository contains the accompanying code for 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR'19.

Change Log

  • 2020-05-19: Since commit be5c3, the latest Minkowski Engine does not require an explicit cache clear and uses memory more efficiently.
  • 2020-05-04: As Thomas Chaton pointed out in Issue #30, the training script contains bugs that prevent models from reaching the target performance listed in the Model Zoo when run with the latest MinkowskiEngine. I am still debugging and have not yet found the cause. In the meantime, I created another git repo, SpatioTemporalSegmentation-ScanNet, from a private repo of mine that does reach the target performance. Please use SpatioTemporalSegmentation-ScanNet for ScanNet training. I will update this repo once I find the bugs and merge SpatioTemporalSegmentation-ScanNet into it. Sorry for the trouble.

Requirements

  • Ubuntu 14.04 or higher
  • CUDA 10.1 or higher
  • pytorch 1.3 or higher
  • python 3.6 or higher
  • GCC 6 or higher
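
The version minimums above can be checked from Python before installing anything heavier. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of this repository, and the sample version strings are illustrative.

```python
import sys


def parse_version(text):
    """Turn a string such as '1.3.0+cu101' into (1, 3, 0).

    Parsing stops at the first component that is not purely numeric.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


# Python 3.6 or higher is listed in the requirements.
print("Python OK:", sys.version_info >= (3, 6))

# Illustrative strings; in practice pass torch.__version__ once PyTorch is installed.
print("PyTorch 1.3.0+cu101 OK:", meets_minimum("1.3.0+cu101", "1.3"))
print("PyTorch 1.2.9 OK:", meets_minimum("1.2.9", "1.3"))
```

Comparing integer tuples rather than raw strings avoids the trap where "1.10" sorts before "1.3" lexicographically.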

Installation

You need to install pytorch and Minkowski Engine either with pip or with anaconda.

Pip

MinkowskiEngine is distributed via PyPI and can be installed with pip. First, install PyTorch following its installation instructions. Next, install OpenBLAS.

sudo apt install libopenblas-dev

pip install torch torchvision

pip install -U git+https://github.com/StanfordVL/MinkowskiEngine

Next, clone the repository and install the rest of the requirements

git clone https://github.com/chrischoy/SpatioTemporalSegmentation/

cd SpatioTemporalSegmentation

pip install -r requirements.txt
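
After the commands above finish, a quick way to confirm the two main dependencies are visible to the interpreter is to look them up without importing them. This is a sketch, not part of the repository; `missing_packages` is a hypothetical helper, and it assumes the import names are `torch` and `MinkowskiEngine`.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["torch", "MinkowskiEngine"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that the modules can be located; a build problem in a compiled extension would still surface at import time.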

Troubleshooting

Please visit the MinkowskiEngine issue pages if you have difficulties installing Minkowski Engine.

ScanNet Training

  1. Download the ScanNet dataset from the official website. You need to sign the terms of use.

  2. Next, set the paths correctly, then preprocess all raw ScanNet point clouds with the following command.

python -m lib.datasets.preprocessing.scannet
  3. Train the network with
export BATCH_SIZE=N;
./scripts/train_scannet.sh 0 \
	-default \
	"--scannet_path /path/to/preprocessed/scannet"

Replace N with the batch size you want to use.

The first argument is the GPU id, the second is the path postfix, and the last is a single quoted string holding the miscellaneous arguments.
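
The calling convention described above can be sketched from Python as follows. `build_train_command` is a hypothetical helper, not part of the repository, and the batch size of 8 is an assumed value for illustration only.

```python
import os
import shlex


def build_train_command(script, gpu_id, postfix, misc_args, batch_size):
    """Assemble the three positional arguments and the BATCH_SIZE variable.

    misc_args stays a single string, matching the quoted last argument
    in the shell examples.
    """
    env = dict(os.environ, BATCH_SIZE=str(batch_size))
    command = [script, str(gpu_id), postfix, misc_args]
    return command, env


command, env = build_train_command(
    "./scripts/train_scannet.sh",
    gpu_id=0,
    postfix="-default",
    misc_args="--scannet_path /path/to/preprocessed/scannet",
    batch_size=8,  # assumed value; pick one that suits your setup
)

# Show the equivalent shell invocation without running it.
print("BATCH_SIZE=" + env["BATCH_SIZE"], " ".join(shlex.quote(part) for part in command))

# To actually launch training: subprocess.run(command, env=env, check=True)
```

Passing the command as a list keeps the miscellaneous arguments together as one argument, which is what the surrounding quotes achieve in the shell.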

mIoU vs. Overall Accuracy

The official evaluation metric for ScanNet is mIoU. Overall Accuracy (OA) is not the official metric because it is not discriminative. This follows the convention in 2D semantic segmentation, where pixel-wise overall accuracy does not capture the fidelity of a semantic segmentation. For example, on the ScanNet v2 validation set, an OA of 89.087 corresponds to an mIoU of 71.496, an mAP of 76.127, and an mAcc of 79.660.

Why is overall accuracy the least discriminative metric? Most scenes consist of large structures such as walls, floors, or background, and scores on these dominate the statistics when you use overall accuracy.
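
The effect can be seen on a toy confusion matrix. The numbers below are made up for illustration and are not results from this repository: 95 wall points and 5 chair points, with a model that labels every point as wall. `segmentation_metrics` is a hypothetical helper written for this sketch.

```python
def segmentation_metrics(confusion):
    """Compute (overall accuracy, mean IoU) from a confusion matrix.

    confusion[g][p] is the number of points with ground-truth class g
    that were predicted as class p.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))

    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        gt_count = sum(confusion[c])                  # points whose label is c
        pred_count = sum(row[c] for row in confusion)  # points predicted as c
        union = gt_count + pred_count - true_positive
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(true_positive / union)

    return correct / total, sum(ious) / len(ious)


# Rows: ground truth (wall, chair). Columns: prediction (wall, chair).
toy = [[95, 0],
       [5, 0]]
oa, miou = segmentation_metrics(toy)
print(f"OA = {oa:.3f}, mIoU = {miou:.3f}")  # OA = 0.950, mIoU = 0.475
```

The model never finds a single chair, yet overall accuracy is still 95% because walls dominate the point count. mIoU weights each class equally, so the missed class halves the score.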

Synthia 4D Experiment

  1. Download the dataset from the download link

  2. Extract

cd /path/to/extract/synthia4d
wget http://cvgl.stanford.edu/data2/Synthia4D.tar
tar -xf Synthia4D.tar
for f in *.tar.bz2; do tar -xvjf "$f"; done
  3. Training
export BATCH_SIZE=N; \
./scripts/train_synthia4d.sh 0 \
	"-default" \
	"--synthia_path /path/to/extract/synthia4d"

The above script trains a network. Change the arguments to match your setup. The first argument to the script is the GPU id. The second argument is the log directory postfix; change it to mark your experimental setup. The final argument is a series of miscellaneous arguments, which must include the Synthia directory. Wrap all of the miscellaneous arguments together in " ".
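
If you prefer to unpack the inner archives from the extract step in Python rather than with tar, the sketch below does so with the standard `tarfile` module. It assumes the `.tar.bz2` files sit directly in the dataset directory; `extract_archives` is a hypothetical helper, not part of the repository.

```python
import glob
import os
import tarfile


def extract_archives(directory):
    """Extract every *.tar.bz2 file found directly inside `directory`.

    Returns the list of archive paths that were extracted.
    """
    extracted = []
    for path in sorted(glob.glob(os.path.join(directory, "*.tar.bz2"))):
        with tarfile.open(path, "r:bz2") as archive:
            # Newer Python versions support extraction filters; use the
            # safe "data" filter when it is available.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(directory, **kwargs)
        extracted.append(path)
    return extracted


# Example (not executed here):
# extract_archives("/path/to/extract/synthia4d")
```

Each archive is opened separately, so the helper works whether the directory holds one archive or many.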

Stanford 3D Dataset

  1. Download the Stanford 3D dataset from the website

  2. Preprocess

Modify the input and output directory accordingly in

lib/datasets/preprocessing/stanford.py

And run

python -m lib.datasets.preprocessing.stanford
  3. Train
./scripts/train_stanford.sh 0 \
	"-default" \
	"--stanford3d_path /PATH/TO/PREPROCESSED/STANFORD"

Model Zoo

| Model | Dataset | Voxel Size | Conv1 Kernel Size | Performance | Link |
| --- | --- | --- | --- | --- | --- |
| Mink16UNet34C | ScanNet train + val | 2cm | 3 | Test set 73.6% mIoU, no sliding window | download |
| Mink16UNet34C | ScanNet train | 2cm | 5 | Val 72.219% mIoU, no rotation average, no sliding window (per class performance) | download |
| Mink16UNet18 | Stanford Area5 train | 5cm | 5 | Area 5 test 65.828% mIoU, no rotation average, no sliding window (per class performance) | download |
| Mink16UNet34 | Stanford Area5 train | 5cm | 5 | Area 5 test 66.348% mIoU, no rotation average, no sliding window (per class performance) | download |
| 3D Mink16UNet14A | Synthia CVPR19 train | 15cm | 3 | CVPR19 test 81.903% mIoU, no rotation average, no sliding window (per class performance) | download |
| 3D Mink16UNet18 | Synthia CVPR19 train | 15cm | 3 | CVPR19 test 82.762% mIoU, no rotation average, no sliding window (per class performance) | download |

Note that sliding-window-style evaluation (cropping and stitching results), used in many related works, effectively acts as an ensemble (rotation averaging), which boosts performance.
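
To illustrate why averaging acts as an ensemble, the sketch below averages per-point class probabilities over several predictions of the same scene (for example, under different rotations) before taking the argmax. This is one common ensembling scheme, shown with made-up probabilities; it is an assumption for illustration and not necessarily how this repository or the related works implement it. `average_predictions` is a hypothetical helper.

```python
def average_predictions(prob_sets):
    """Average class probabilities over several predictions, then pick a label.

    prob_sets[r][p][c] is the probability of class c for point p
    in prediction r (for example, one prediction per rotation).
    """
    num_runs = len(prob_sets)
    num_points = len(prob_sets[0])
    labels = []
    for p in range(num_points):
        num_classes = len(prob_sets[0][p])
        mean = [
            sum(prob_sets[r][p][c] for r in range(num_runs)) / num_runs
            for c in range(num_classes)
        ]
        # Index of the class with the highest averaged probability.
        labels.append(max(range(num_classes), key=mean.__getitem__))
    return labels


# Two predictions for two points and two classes (toy numbers).
rotation_0 = [[0.6, 0.4], [0.45, 0.55]]
rotation_1 = [[0.7, 0.3], [0.70, 0.30]]

print(average_predictions([rotation_0]))              # [0, 1]
print(average_predictions([rotation_0, rotation_1]))  # [0, 0]
```

The second point flips from class 1 to class 0 once the more confident second prediction is averaged in, which is the kind of correction that makes ensembled numbers hard to compare with single-pass results.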

Demo

The demo code will download weights and an example scene first and then visualize prediction results.

| Dataset | ScanNet | Stanford |
| --- | --- | --- |
| Command | python -m demo.scannet | python -m demo.stanford |

Citing this work

If you use the Minkowski Engine, please cite:

@inproceedings{choy20194d,
  title={4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks},
  author={Choy, Christopher and Gwak, JunYoung and Savarese, Silvio},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3075--3084},
  year={2019}
}
