• Stars
    star
    318
  • Rank 131,872 (Top 3 %)
  • Language
    Python
  • License
    MIT License
  • Created about 6 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Official Code: Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

Augmented Autoencoders

Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel
Best Paper Award, ECCV 2018.

paper, supplement, oral

Citation

If you find Augmented Autoencoders useful for your research, please consider citing:

@InProceedings{Sundermeyer_2018_ECCV,
author = {Sundermeyer, Martin and Marton, Zoltan-Csaba and Durner, Maximilian and Brucker, Manuel and Triebel, Rudolph},
title = {Implicit 3D Orientation Learning for 6D Object Detection from RGB Images},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}

Multi-path Learning for Object Pose Estimation Across Domains

Martin Sundermeyer, Maximilian Durner, En Yen Puang, Zoltan-Csaba Marton, Narunas Vaskevicius, Kai O. Arras, Rudolph Triebel
CVPR 2020
The code of this work can be found here

Overview

We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries.

1.) Train the Augmented Autoencoder(s) using only a 3D model to predict 3D Object Orientations from RGB image crops
2.) For full RGB-based 6D pose estimation, also train a 2D Object Detector (e.g. https://github.com/fizyr/keras-retinanet)
3.) Optionally, use our standard depth-based ICP to refine the 6D Pose

Requirements: Hardware

For Training

Nvidia GPU with >4GB memory (or adjust the batch size)
RAM >8GB
Duration depending on Configuration and Hardware: ~3h per Object

Requirements: Software

Linux, Python 2.7 / Python 3

GLFW for OpenGL:

sudo apt-get install libglfw3-dev libglfw3  

Assimp:

sudo apt-get install libassimp-dev  

Tensorflow >= 1.6
OpenCV >= 3.1

pip install --pre --upgrade PyOpenGL PyOpenGL_accelerate
pip install cython
pip install cyglfw3
pip install pyassimp==3.3
pip install imgaug
pip install progressbar

Headless Rendering

Please note that we use the GLFW context as default which does not support headless rendering. To allow for both, onscreen rendering & headless rendering on a remote server, set the context to EGL:

export PYOPENGL_PLATFORM='egl'

In order to make the EGL context work, you might need to change PyOpenGL like here

Support for Tensorflow 2.6 / Python 3

The code now also supports TF 2.6 with python 3. Instead of the pip installs above, you can also use the provided conda environment.

conda env create -f aae_py37_tf26.yml

In the activated environment proceed with the preparatory steps.

Preparatory Steps

1. Pip installation

pip install .

2. Set Workspace path, consider to put this into your bash profile, will always be required

export AE_WORKSPACE_PATH=/path/to/autoencoder_ws  

3. Create Workspace, Init Workspace (if installed locally, make sure .local/bin/ is in your PATH)

mkdir $AE_WORKSPACE_PATH
cd $AE_WORKSPACE_PATH
ae_init_workspace

Train an Augmented Autoencoder

1. Create the training config file. Insert the paths to your 3D model and background images.

mkdir $AE_WORKSPACE_PATH/cfg/exp_group
cp $AE_WORKSPACE_PATH/cfg/train_template.cfg $AE_WORKSPACE_PATH/cfg/exp_group/my_autoencoder.cfg
gedit $AE_WORKSPACE_PATH/cfg/exp_group/my_autoencoder.cfg

2. Generate and check training data. The object views should be strongly augmented but identifiable.

(Press ESC to close the window.)

ae_train exp_group/my_autoencoder -d

This command does not start training and should be run on a PC with a display connected.

Output:

3. Train the model (See the Headless Rendering section if you want to train directly on a server without display)

ae_train exp_group/my_autoencoder
$AE_WORKSPACE_PATH/experiments/exp_group/my_autoencoder/train_figures/training_images_29999.png  

Middle part should show reconstructions of the input object (if all black, set higher bootstrap_ratio / auxilliary_mask in training config)

4. Create the embedding

ae_embed exp_group/my_autoencoder

Testing

Augmented Autoencoder only

have a look at /auto_pose/test/

Feed one or more object crops from disk into AAE and predict 3D Orientation

python aae_image.py exp_group/my_autoencoder -f /path/to/image/file/or/folder

The same with a webcam input stream

python aae_webcam.py exp_group/my_autoencoder

Multi-object RGB-based 6D Object Detection from a Webcam stream

Option 1: Train a RetinaNet Model from https://github.com/fizyr/keras-retinanet

adapt $AE_WORKSPACE_PATH/eval_cfg/aae_retina_webcam.cfg

python auto_pose/test/aae_retina_webcam_pose.py -test_config aae_retina_webcam.cfg -vis

Option 2: Using the Google Detection API with Fixes

Train a 2D detector following https://github.com/naisy/train_ssd_mobilenet
adapt /auto_pose/test/googledet_utils/googledet_config.yml

python auto_pose/test/aae_googledet_webcam_multi.py exp_group/my_autoencoder exp_group/my_autoencoder2 exp_group/my_autoencoder3

Evaluate a model

Reproducing and visualizing BOP challenge results

Here are AAE models trained the BOP datasets with codebooks of all 108 objects:

Download

Extract it to $AE_WORKSPACE_PATH/experiments

Also get precomputed MaskRCNN predictions for all BOP datasets:

Download

Open the bop20 evaluation configs, e.g. auto_pose/ae/cfg_m3vision/m3_config_lmo.cfg, and point the path_to_masks parameter to the downloaded maskrcnn predictions.

You can visualize (-vis option) and reproduce BOP results by running:

python auto_pose/m3_interface/compute_bop_results_m3.py auto_pose/ae/cfg_m3vision/m3_config_lmo.cfg 
                                                     --eval_name test 
                                                     --dataset_name=lmo 
                                                     --datasets_path=/path/to/bop/datasets 
                                                     --result_folder /folder/to/results 
                                                     -vis

Note: You will need the bop_toolkit. I created a package bop_toolkit_lib from it, but you can also just add the required files to sys.path()

Original paper evaluation with T-LESS v1

For the evaluation you will also need https://github.com/thodan/sixd_toolkit + our extensions, see sixd_toolkit_extension/help.txt

Create the evaluation config file

mkdir $AE_WORKSPACE_PATH/cfg_eval/eval_group
cp $AE_WORKSPACE_PATH/cfg_eval/eval_template.cfg $AE_WORKSPACE_PATH/cfg_eval/eval_group/eval_my_autoencoder.cfg
gedit $AE_WORKSPACE_PATH/cfg_eval/eval_group/eval_my_autoencoder.cfg

Evaluate and visualize 6D pose estimation of AAE with ground truth bounding boxes

Set estimate_bbs=False in the evaluation config

ae_eval exp_group/my_autoencoder name_of_evaluation --eval_cfg eval_group/eval_my_autoencoder.cfg
e.g.
ae_eval tless_nobn/obj5 eval_name --eval_cfg tless/5.cfg

Evaluate 6D Object Detection with a 2D Object Detector

Set estimate_bbs=True in the evaluation config

Generate a training dataset for T-Less using detection_utils/generate_sixd_train.py

python detection_utils/generate_sixd_train.py

Train https://github.com/fizyr/keras-retinanet or https://github.com/balancap/SSD-Tensorflow

ae_eval exp_group/my_autoencoder name_of_evaluation --eval_cfg eval_group/eval_my_autoencoder.cfg
e.g.
ae_eval tless_nobn/obj5 eval_name --eval_cfg tless/5.cfg

Config file parameters

[Paths]
# Path to the model file. All formats supported by assimp should work. Tested with ply files.
MODEL_PATH: /path/to/my_3d_model.ply
# Path to some background image folder. Should contain a * as a placeholder for the image name.
BACKGROUND_IMAGES_GLOB: /path/to/VOCdevkit/VOC2012/JPEGImages/*.jpg

[Dataset]
#cad or reconst (with texture)
MODEL: reconst
# Height of the AE input layer
H: 128
# Width of the AE input layer
W: 128
# Channels of the AE input layer (default BGR)
C: 3
# Distance from Camera to the object in mm for synthetic training images
RADIUS: 700
# Dimensions of the renderered image, it will be cropped and rescaled to H, W later.
RENDER_DIMS: (720, 540)
# Camera matrix used for rendering and optionally for estimating depth from RGB
K: [1075.65, 0, 720/2, 0, 1073.90, 540/2, 0, 0, 1]
# Vertex scale. Vertices need to be scaled to mm
VERTEX_SCALE: 1
# Antialiasing factor used for rendering
ANTIALIASING: 8
# Padding rendered object images and potentially bounding box detections 
PAD_FACTOR: 1.2
# Near plane
CLIP_NEAR: 10
# Far plane
CLIP_FAR: 10000
# Number of training images rendered uniformly at random from SO(3)
NOOF_TRAINING_IMGS: 10000
# Number of background images that simulate clutter
NOOF_BG_IMGS: 10000

[Augmentation]
# Using real object masks for occlusion (not really necessary)
REALISTIC_OCCLUSION: False
# Maximum relative translational offset of input views, sampled uniformly  
MAX_REL_OFFSET: 0.20
# Random augmentations at random strengths from imgaug library
CODE: Sequential([
    #Sometimes(0.5, PerspectiveTransform(0.05)),
    #Sometimes(0.5, CropAndPad(percent=(-0.05, 0.1))),
    Sometimes(0.5, Affine(scale=(1.0, 1.2))),
    Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),
    Sometimes(0.5, GaussianBlur(1.2*np.random.rand())),
    Sometimes(0.5, Add((-25, 25), per_channel=0.3)),
    Sometimes(0.3, Invert(0.2, per_channel=True)),
    Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),
    Sometimes(0.5, Multiply((0.6, 1.4))),
    Sometimes(0.5, ContrastNormalization((0.5, 2.2), per_channel=0.3))
    ], random_order=False)

[Embedding]
# for every rotation save rendered bounding box diagonal for projective distance estimation
EMBED_BB: True
# minimum number of equidistant views rendered from a view-sphere
MIN_N_VIEWS: 2562
# for each view generate a number of in-plane rotations to cover full SO(3)
NUM_CYCLO: 36

[Network]
# additionally reconstruct segmentation mask, helps when AAE decodes pure blackness
AUXILIARY_MASK: False
# Variational Autoencoder, factor in front of KL-Divergence loss
VARIATIONAL: 0
# Reconstruction error metric
LOSS: L2
# Only evaluate 1/BOOTSTRAP_RATIO of the pixels with highest errors, produces sharper edges
BOOTSTRAP_RATIO: 4
# regularize norm of latent variables
NORM_REGULARIZE: 0
# size of the latent space
LATENT_SPACE_SIZE: 128
# number of filters in every Conv layer (decoder mirrored)
NUM_FILTER: [128, 256, 512, 512]
# stride for encoder layers, nearest neighbor upsampling for decoder layers
STRIDES: [2, 2, 2, 2]
# filter size encoder
KERNEL_SIZE_ENCODER: 5
# filter size decoder
KERNEL_SIZE_DECODER: 5


[Training]
OPTIMIZER: Adam
NUM_ITER: 30000
BATCH_SIZE: 64
LEARNING_RATE: 1e-4
SAVE_INTERVAL: 5000

[Queue]
# number of threads for producing augmented training data (online)
NUM_THREADS: 10
# preprocessing queue size in number of batches
QUEUE_SIZE: 50

More Repositories

1

stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Python
8,960
star
2

BlenderProc

A procedural Blender pipeline for photorealistic training image generation
Python
2,083
star
3

rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
Python
2,034
star
4

3DObjectTracking

Algorithms and Publications on 3D Object Tracking
C++
710
star
5

SingleViewReconstruction

Official Code: 3D Scene Reconstruction from a Single Viewport
Python
258
star
6

RAFCON

RAFCON (RMC advanced flow control) uses hierarchical state machines, featuring concurrent state execution, to represent robot programs. It ships with a graphical user interface supporting the creation of state machines and contains IDE like debugging mechanisms. Alternatively, state machines can programmatically be generated using RAFCON's API.
Python
180
star
7

rl-trained-agents

A collection of pre-trained RL agents using Stable Baselines3
Python
102
star
8

oaisys

Python
52
star
9

granite

C++
49
star
10

instr

code of paper โ€žUnknown Object Segmentation from Stereo Imagesโ€œ, IROS 2021
Python
43
star
11

curvature

Official Code: Estimating Model Uncertainty of Neural Networks in Sparse Information Form, ICML2020.
Python
23
star
12

UMF

Python
17
star
13

amp

Point-to-point motion planning library for articulated robots.
C++
13
star
14

DistinctNet

"What's This?" - Learning to Segment Unknown Objects from Manipulation Sequences
Python
11
star
15

GRACE

Graph Assembly processing networks for robotic assembly sequence planning and feasibility learning
Python
10
star
16

rosmc

ROS Mission Control (ROSMC) -- A high-level mission designining and monitoring tool with intuitive graphical interfaces
Python
9
star
17

python-jsonconversion

Convert arbitrary Python objects into JSON strings and back.
Python
8
star
18

moegplib

Official Code: Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes
Python
8
star
19

ExReNet

Learning to Localize in New Environments from Synthetic Training Data
Python
7
star
20

RAFCON-ros-state-machines

RAFCON state machine examples using the ROS middleware
Python
5
star
21

python-yaml-configuration

Python
4
star
22

BayesSim2Real

Source code for IROS 2022 paper: Bayesian Active Learning for Sim-to-Real Robotic Perception.
Python
4
star
23

TendonDrivenContinuum

4
star
24

multicam_dataset_reader

C++
2
star
25

stios-utils

utility functions for the [Stereo Instance on Surfaces (STOIS) dataset
Python
2
star
26

SemanticSingleViewReconstruction

C++
1
star
27

rafcon-task-planner-plugin

A Plugin for RAFCON to interface arbitrary PDDL Planner.
Python
1
star
28

RECALL

Code and image database for IROS2022 paper "RECALL: Rehearsal-free Continual Learning for Object Classification". A algorithm to learn new object categories on the fly without forgetting the old ones and without the need to save previous images.
Python
1
star