Language: C++
License: Apache License 2.0


Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018

Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds)

Fabian Groh, Patrick Wieschollek, Hendrik P.A. Lensch

TensorFlow v1.10

Abstract

Traditional convolution layers are specifically designed to exploit the natural data representation of images: a fixed and regular grid. However, unstructured data like 3D point clouds, which contain irregular neighborhoods, constantly break the grid-based data assumption. Therefore, applying best practices and design choices from 2D-image learning methods to point-cloud processing is not readily possible. In this work, we introduce flex-convolution, a natural generalization of the conventional convolution layer, along with an efficient GPU implementation. We demonstrate competitive performance on rather small benchmark sets using fewer parameters and lower memory consumption, and obtain significant improvements on a million-scale real-world dataset. Ours is the first method that can efficiently process 7 million points concurrently.
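To make the idea of a grid-free convolution concrete, here is a deliberately slow pure-Python sketch of a flex-convolution-style layer. It assumes one particular parameterization, chosen for illustration: the weight connecting input channel c of neighbor j to output channel o of center point i is an affine function of their offset, dot(theta[c][o], p_j - p_i) + bias[c][o]. Because the weights depend only on relative positions, no grid is required. The function name, the point-major [N][D] layout without a batch dimension, and the parameter shapes are hypothetical choices for this sketch, not the repository's API or its CUDA kernel; consult the paper for the exact definition.

```python
from typing import List, Sequence


def flex_conv_reference(
    features: Sequence[Sequence[float]],         # [N][Din] input features per point
    positions: Sequence[Sequence[float]],        # [N][Dp] coordinates per point
    neighbors: Sequence[Sequence[int]],          # [N][K] neighbor indices per point
    theta: Sequence[Sequence[Sequence[float]]],  # [Din][Dout][Dp] position weights (assumed)
    bias: Sequence[Sequence[float]],             # [Din][Dout] offset-independent weights (assumed)
) -> List[List[float]]:
    """Hypothetical reference: sums, over all neighbors j of point i and all input
    channels c, the feature f[j][c] times (dot(theta[c][o], p_j - p_i) + bias[c][o])."""
    din = len(bias)
    dout = len(bias[0])
    out = []
    for i, nbrs in enumerate(neighbors):
        acc = [0.0] * dout
        for j in nbrs:
            # Offset of the neighbor relative to the center point.
            delta = [pj - pi for pj, pi in zip(positions[j], positions[i])]
            for c in range(din):
                f = features[j][c]
                for o in range(dout):
                    # The filter weight is computed from the offset instead of looked up on a grid.
                    w = sum(t * d for t, d in zip(theta[c][o], delta)) + bias[c][o]
                    acc[o] += w * f
        out.append(acc)
    return out


if __name__ == "__main__":
    # Two 1D points that are each other's neighbors, one input and one output channel.
    out = flex_conv_reference(
        features=[[1.0], [2.0]],
        positions=[[0.0], [1.0]],
        neighbors=[[0, 1], [1, 0]],
        theta=[[[0.5]]],
        bias=[[1.0]],
    )
    print(out)  # [[4.0], [2.5]]
```

With theta set to zero, every neighbor receives the same weight regardless of where it lies, so the layer degenerates into a position-blind aggregation; the position-dependent term is what lets the filter respond to local geometry.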

The following figure shows the raw semantic segmentation prediction of the network on a real-world example:

This repository contains the source code of our FlexConv Layer from our 2018 ACCV paper "Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds)".

Provided novel operations

In the following, we summarize the operations described in our paper, along with a highly tuned (online, exhaustive) nearest-neighbor search layer for 3D point clouds. All layers follow the tf.layers interface and can be used directly in your TensorFlow model. We also provide unit tests to verify the correctness of our implementation.

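import numpy as np
# Assumption: the compiled ops are importable from the user_ops package (see Build Instructions).
from user_ops import knn_bruteforce, flex_convolution, flex_convolution_transpose, flex_pooling

# Example sizes: batch, input channels, position dimensions, number of points
B, Din, Dp, N = 2, 16, 3, 1024

# some point-cloud data (channels-first: [batch, channels, points])
features = np.random.randn(B, Din, N).astype(np.float32)
positions = np.random.randn(B, Dp, N).astype(np.float32)

# To find neighborhoods of K neighbors (not used in our paper, just for your convenience):
neighborhoods = knn_bruteforce(positions, K=8)

# To apply our Flex-Convolution operation to a given set of points with input features, positions and neighborhood information:
new_features = flex_convolution(features, positions, neighborhoods, out_channels=32)
# The transposed operation:
new_features = flex_convolution_transpose(features, positions, neighborhoods, out_channels=32)

# To apply max-pooling over each neighborhood:
new_features = flex_pooling(features, neighborhoods)

# With TensorFlow 1.x, these calls build graph ops; evaluate them in a tf.Session.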
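The operations above run as CUDA kernels. The following slow pure-Python sketch only illustrates what exhaustive K-nearest-neighbor search and neighborhood max-pooling compute. It keeps the channels-first [B, D, N] layout of the example, but it assumes that neighborhoods are index arrays laid out as [B, K, N] and that every point is its own nearest neighbor. Both are assumptions for illustration, and the names knn_reference and flex_pooling_reference are hypothetical; the unit tests in user_ops (test_all.py) remain the authoritative reference for the real operations.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]
Index3 = List[List[List[int]]]


def knn_reference(positions: Tensor3, K: int) -> Index3:
    """Exhaustive (brute-force) K-nearest-neighbor search.

    positions: [B][Dp][N], channels-first like the example above.
    Returns indices laid out as [B][K][N] (assumed layout). Each point is its
    own nearest neighbor (distance 0); ties are broken by the smaller index.
    """
    result = []
    for pts in positions:
        n = len(pts[0])
        coords = [[dim[i] for dim in pts] for i in range(n)]  # transpose to [N][Dp]
        batch = [[0] * n for _ in range(K)]
        for i in range(n):
            # Sort all points by Euclidean distance to point i.
            order = sorted(range(n), key=lambda j: (math.dist(coords[i], coords[j]), j))
            for k in range(K):
                batch[k][i] = order[k]
        result.append(batch)
    return result


def flex_pooling_reference(features: Tensor3, neighborhoods: Index3) -> Tensor3:
    """Per-channel maximum over each point's neighborhood.

    features: [B][D][N]; neighborhoods: [B][K][N] as returned by knn_reference.
    """
    out = []
    for feat, nbh in zip(features, neighborhoods):
        pooled = [
            [max(channel[nbh[k][i]] for k in range(len(nbh))) for i in range(len(channel))]
            for channel in feat
        ]
        out.append(pooled)
    return out


if __name__ == "__main__":
    positions = [[[0.0, 1.0, 3.0, 7.0]]]  # B=1, Dp=1, N=4 points on a line
    features = [[[5.0, 1.0, 4.0, 2.0]]]   # B=1, D=1
    nbh = knn_reference(positions, K=2)
    print(nbh)                                    # [[[0, 1, 2, 3], [1, 0, 1, 2]]]
    print(flex_pooling_reference(features, nbh))  # [[[5.0, 5.0, 4.0, 4.0]]]
```

The exhaustive search costs O(N^2) distance evaluations per batch element, which is why a tuned GPU kernel matters at million-point scale.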

Build Instructions

Our CUDA kernels use CUB primitives. You can install this header-only library as follows:

user@host $ # apt-get install unzip
user@host $ cd /tmp
user@host $ wget https://github.com/NVlabs/cub/archive/v1.8.0.zip
user@host $ unzip v1.8.0.zip -d $HOME/libs
user@host $ export CUB_INC=$HOME/libs/cub-1.8.0/
user@host $ rm /tmp/v1.8.0.zip

We provide GPU-tailored CUDA implementations of our FlexConv, FlexPool, FlexDeconv and NearestNeighbor operations for TensorFlow, which require a compilation and linking step. To build the operations, run:

user@host $ pip install tensorflow-gpu --user                # optional if not yet installed
user@host $ cd user_ops
user@host $ cmake . -DPYTHON_EXECUTABLE=python2 && make -j   # switch the python version when necessary
user@host $ python test_all.py                               # run all unit-tests to verify the operations
user@host $ cd ..
user@host $ python example.py                                # fully functional toy-example

Deep learning on point clouds is a complex matter and an active research area. Our internal codebase reflects this complexity, but we have tried our best to provide a usable implementation here. As a simple training example, we train on a very basic 3D-MNIST dataset that deliberately omits advanced components, to show how a model can actually be trained with our operations:

user@host $ python basic_mnist_3d.py --gpu 0

Benchmark

We benchmarked the inference time of the entire network on the 2D-3D-S dataset. In a recent test on an NVIDIA V100 GPU, we were able to process ~18 million points (the paper reports 7 million points on a GTX 1080).
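For a sense of scale, the following back-of-envelope sketch estimates the size of one layer's input buffers. It assumes float32 features and positions and int32 neighbor indices, and it reuses K = 8 and 32 channels (the values in the usage example above) purely as illustrative numbers. The function name is hypothetical, and weights, intermediate activations and framework overhead are ignored, so real memory use is higher.

```python
def input_buffer_bytes(n_points: int, d_in: int, d_pos: int = 3, k: int = 8,
                       bytes_per_value: int = 4) -> int:
    """Rough size of one layer's inputs in bytes.

    Assumes float32 features [d_in x N] and positions [d_pos x N] plus int32
    neighbor indices [k x N]; weights, activations and overhead are ignored.
    """
    features = n_points * d_in * bytes_per_value
    positions = n_points * d_pos * bytes_per_value
    neighborhoods = n_points * k * bytes_per_value
    return features + positions + neighborhoods


if __name__ == "__main__":
    for n in (7_000_000, 18_000_000):
        gb = input_buffer_bytes(n, d_in=32) / 1e9
        print(f"{n:>11,} points: {gb:.1f} GB")  # 1.2 GB, then 3.1 GB
```

Under these assumptions the inputs alone grow from about 1.2 GB (10^9 bytes) at 7 million points to about 3.1 GB at 18 million points, growing linearly in the number of points.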

ShapeNet Segmentation

ShapeNet part segmentation results per category, mIoU (%), and inference speed (shapes/sec, on an NVIDIA GeForce GTX 1080 Ti) for different methods.

Method          mIoU  shapes/sec  Airplane  Bag   Cap   Car   Chair  Earphones  Guitar  Knife  Lamp  Laptop  Motorbike  Mug   Pistol  Rocket  Skateboard  Table
Kd-Network [4]  77.4  n.a.        80.1      74.6  74.3  70.3  88.6   73.5       90.2    87.2   81.0  94.9    57.4       86.7  78.1    51.8    69.9        80.3
PointNet [1]    80.4  n.a.        83.4      78.7  82.5  74.9  89.6   73.0       91.5    85.9   80.8  95.3    65.2       93.0  81.2    57.9    72.8        80.6
PointNet++ [2]  81.9  2.7         82.4      79.0  87.7  77.3  90.8   71.8       91.0    85.9   83.7  95.3    71.6       94.1  81.3    58.7    76.4        82.6
SPLATNet3D [3]  82.0  9.4         81.9      83.9  88.6  79.5  90.1   73.5       91.3    84.7   84.5  96.3    69.7       95.0  81.7    59.2    70.4        81.3
SGPN [5]        82.8  n.a.        80.4      78.6  78.8  71.5  88.6   78.0       90.9    83.0   78.8  95.8    77.8       93.8  87.4    60.1    92.3        89.4
Ours            85.0  489.3       83.6      91.2  96.7  79.5  84.7   71.7       92.0    86.5   83.2  96.6    71.7       95.7  86.1    74.8    81.4        84.5
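For the Ours row, the reported mIoU matches the unweighted mean of the 16 per-category values (84.99, i.e. 85.0). The sketch below shows that check together with a per-shape part IoU. The part IoU helper assumes a convention that is common for ShapeNet part benchmarks (a part absent from both prediction and ground truth counts as IoU 1); it is an illustration, not this repository's evaluation code, and all names in it are hypothetical.

```python
from typing import Dict, Sequence


def shape_part_miou(pred: Sequence[int], gt: Sequence[int], parts: Sequence[int]) -> float:
    """Mean IoU over the part labels of one shape's category.

    Assumed convention: a part absent from both prediction and ground truth counts as IoU 1.
    """
    ious = []
    for part in parts:
        inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
        union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
        ious.append(1.0 if union == 0 else inter / union)
    return sum(ious) / len(ious)


# Per-category IoUs of the "Ours" row, re-typed from the table above.
OURS: Dict[str, float] = {
    "Airplane": 83.6, "Bag": 91.2, "Cap": 96.7, "Car": 79.5,
    "Chair": 84.7, "Earphones": 71.7, "Guitar": 92.0, "Knife": 86.5,
    "Lamp": 83.2, "Laptop": 96.6, "Motorbike": 71.7, "Mug": 95.7,
    "Pistol": 86.1, "Rocket": 74.8, "Skateboard": 81.4, "Table": 84.5,
}


def category_mean(per_category: Dict[str, float]) -> float:
    """Unweighted mean over categories (each category counts once)."""
    return sum(per_category.values()) / len(per_category)


if __name__ == "__main__":
    # Four points, two parts: IoU 1/2 for part 0 and 2/3 for part 1.
    print(shape_part_miou([0, 0, 1, 1], [0, 1, 1, 1], parts=[0, 1]))
    print(round(category_mean(OURS), 1))  # 85.0
```

A category-averaged mIoU weights small categories such as Earphones or Rocket as heavily as large ones such as Table, which is worth keeping in mind when comparing it with instance-averaged numbers.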

[Figure: example segmentation]

2D-3D-S dataset

Class-specific average precision (AP) on the 2D-3D-S dataset.

Method        mAP    Table  Chair  Sofa   Bookcase  Board  Ceiling  Floor  Wall   Beam   Column  Window  Door
Armeni [6]    49.93  46.02  16.15  6.78   54.71     3.91   71.61    88.70  72.86  66.67  91.77   25.92   54.11
Armeni [6]    44.19  39.87  11.43  4.91   57.76     3.73   50.74    80.48  65.59  68.53  85.08   21.17   45.39
PointNet [1]  n.a.   46.67  33.80  4.76   n.a.      11.72  n.a.     n.a.   n.a.   n.a.   n.a.    n.a.    n.a.
SGPN [5]      54.35  46.90  40.77  6.38   47.61     11.05  79.44    66.29  88.77  77.98  60.71   66.62   56.75
Ours          55.27  66.03  51.75  15.59  39.03     43.50  87.20    96.00  65.53  54.76  52.74   55.34   35.81
Ours**        56.55  67.02  52.75  16.61  39.26     47.68  87.33    96.10  65.52  56.83  55.10   57.66   36.76
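For the two Ours rows, the mAP column equals the unweighted mean of the 12 per-class APs. The short check below re-types those values from the table; it is a worked illustration of the averaging, not repository code, and the names are hypothetical.

```python
from typing import Dict, List

CLASSES = ["Table", "Chair", "Sofa", "Bookcase", "Board", "Ceiling",
           "Floor", "Wall", "Beam", "Column", "Window", "Door"]

# Per-class APs re-typed from the table above (same column order as CLASSES).
ROWS: Dict[str, List[float]] = {
    "Ours": [66.03, 51.75, 15.59, 39.03, 43.50, 87.20,
             96.00, 65.53, 54.76, 52.74, 55.34, 35.81],
    "Ours**": [67.02, 52.75, 16.61, 39.26, 47.68, 87.33,
               96.10, 65.52, 56.83, 55.10, 57.66, 36.76],
}


def mean_ap(per_class_ap: List[float]) -> float:
    """Unweighted mean of per-class average precision (every class counts once)."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    for name, aps in ROWS.items():
        assert len(aps) == len(CLASSES)
        print(name, f"{mean_ap(aps):.2f}")  # Ours 55.27, then Ours** 56.55
```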

[Figure: example segmentation]


Citation

If you use the code in this repository, please cite our paper:

@inproceedings{accv2018/Groh,
  author    = {Fabian Groh and
               Patrick Wieschollek and
               Hendrik P. A. Lensch
               },
  title     = {Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds)},
  booktitle = {Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2018}
}

References

[1] C. Qi, H. Su, K. Mo, L. Guibas, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017.
[2] C. Qi, L. Yi, H. Su, L. Guibas, "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space", Advances in Neural Information Processing Systems (NIPS) 2017.
[3] H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, J. Kautz, "SPLATNet: Sparse Lattice Networks for Point Cloud Processing", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
[4] R. Klokov, V. Lempitsky, "Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models", Proceedings of the IEEE International Conference on Computer Vision (ICCV) 2017.
[5] W. Wang, R. Yu, Q. Huang, U. Neumann, "SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
[6] I. Armeni, A. Sax, A.-R. Zamir, S. Savarese, "Joint 2D-3D-Semantic Data for Indoor Scene Understanding", ArXiv e-prints 2017.
