
[CVPR 2020] Estimation of the visible and hidden traversable space from a single color image

Footprints and Free Space from a Single Color Image

Jamie Watson, Michael Firman, Aron Monszpart and Gabriel J. Brostow - CVPR 2020 (Oral presentation)

[Link to Paper]

We introduce Footprints, a method for estimating the visible and hidden traversable space from a single RGB image.

5-minute CVPR presentation video

Understanding the shape of a scene from a single color image is a formidable computer vision task. Most methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Models which predict beyond the line of sight often parameterize the scene with voxels or meshes, which can be expensive to use in machine learning frameworks.

Our method predicts the hidden ground geometry and extent from a single image:

Web version of figure 1

Our predictions enable virtual characters to more realistically explore their environment.

Baseline: the virtual character can only explore the ground visible to the camera.
Ours: the penguin can explore both the visible and hidden ground.

โš™๏ธ Setup

Our code and models were developed with PyTorch 1.3.1. The environment.yml and requirements.txt list our dependencies.

We recommend creating and activating a new conda environment from environment.yml with:

conda env create -f environment.yml -n footprints
conda activate footprints

๐Ÿ–ผ๏ธ Prediction

We provide three pretrained models (resolutions are given as height x width):

  • kitti, a model trained on the KITTI driving dataset with a resolution of 192x640,
  • matterport, a model trained on the indoor Matterport dataset with a resolution of 512x640, and
  • handheld, a model trained on our own handheld stereo footage with a resolution of 256x448.

We provide code to make predictions for a single image, or a whole folder of images, using any of these pretrained models. Models will be automatically downloaded when required, and input images will be automatically resized to the correct input resolution for each model.
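The prediction script handles resizing for you. As a rough illustration of what "the correct input resolution" means, the sketch below maps each model name to the (height, width) values listed above and performs a simple nearest-neighbour resize with NumPy. The function name and the interpolation choice are assumptions for illustration only; the repository's own preprocessing may use a different interpolation.

```python
import numpy as np

# Input resolutions (height, width) listed above for each pretrained model.
MODEL_RESOLUTIONS = {
    "kitti": (192, 640),
    "matterport": (512, 640),
    "handheld": (256, 448),
}


def resize_for_model(image: np.ndarray, model: str) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) or (H, W, C) image to a model's input size.

    Hypothetical helper: illustrative only, not the repository's resize code.
    """
    target_h, target_w = MODEL_RESOLUTIONS[model]
    src_h, src_w = image.shape[:2]
    # For each output row/column, pick the nearest source row/column.
    rows = (np.arange(target_h) * src_h / target_h).astype(int)
    cols = (np.arange(target_w) * src_w / target_w).astype(int)
    return image[rows[:, None], cols[None, :]]


if __name__ == "__main__":
    frame = np.zeros((375, 1242, 3), dtype=np.uint8)  # a KITTI-sized frame
    print(resize_for_model(frame, "kitti").shape)  # (192, 640, 3)
```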

Single image prediction:

python -m footprints.predict_simple --image test_data/cyclist.jpg --model kitti

Multi image prediction:

python -m footprints.predict_simple --image test_data --model handheld

By default, .npy predictions and .jpg visualisations will be saved to the predictions folder; this can be changed with the --save_dir flag.

🚋 Training

To train a model you will need to download raw KITTI and Matterport data. Edit the dataset field in paths.yaml to point to the downloaded raw data paths.

For details on downloading KITTI, see Monodepth2.

You will also need per-image training data generated from the video sequences:

  • visible ground segmentations
  • hidden ground depths
  • depth maps
  • etc.

Our versions of these can be found HERE. Download these and edit the training_data field of paths.yaml to point to them.

After this your paths.yaml should look like:

# Contents of paths.yaml
  kitti:
    dataset: <your_raw_KITTI_path>
    training_data: <downloaded_KITTI_training_data>

  matterport:
    dataset: <your_raw_matterport_path>
    training_data: <downloaded_matterport_training_data>

  ...

Now you have everything you need to train!

Train a KITTI model using:

CUDA_VISIBLE_DEVICES=X python -m footprints.main \
    --training_dataset kitti \
    --log_path <your_log_path> \
    --model_name <your_model_name>

and a Matterport model using:

CUDA_VISIBLE_DEVICES=X python -m footprints.main \
    --training_dataset matterport \
    --height 512  --width 640 \
    --log_path <your_log_path> \
    --batch_size 8 \
    --model_name <your_model_name>

Training data generation

If you want to generate your own training data instead of using ours (e.g. you want to try a better ground segmentation algorithm, or more accurate camera poses) then you can!

There are several key elements of our training data - each can be swapped out for your own.

Visible depths

For KITTI we used PSMNet to generate disparity maps for stereo pairs. These are stored inside stereo_matching_disps and are converted to depth maps using the known focal length and baseline; the depth maps are then used to generate training labels. Matterport provides depth maps directly.
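The disparity-to-depth conversion follows the standard rectified-stereo relation depth = f * B / d, where f is the focal length in pixels, B is the stereo baseline in metres and d is the disparity in pixels. A minimal NumPy sketch, assuming the disparities are in pixels at the same resolution as the focal length (how the files in stereo_matching_disps are scaled is not specified here, so check before reuse); the function name and the invalid-pixel threshold are hypothetical:

```python
import numpy as np


def disparity_to_depth(disparity, focal_length_px, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (in pixels) to metric depth using depth = f * B / d.

    Pixels with disparity <= min_disparity are treated as invalid and set to 0.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > min_disparity
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth


# Illustrative numbers only, not KITTI calibration values.
print(disparity_to_depth([[36.0, 72.0, 0.0]], focal_length_px=720.0, baseline_m=0.5))
# -> depths of 10 m, 5 m and 0 (invalid)
```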

Camera poses

For KITTI we used ORB-SLAM2 to generate camera poses, which are stored as .npy files inside the poses folder. These are used to reproject between cameras. Matterport provides camera poses.
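Reprojecting between cameras amounts to back-projecting each pixel with its depth, applying the relative pose, and projecting with the intrinsics. A minimal pinhole-camera sketch, assuming both views share the intrinsics K and that the relative pose is a 4x4 source-to-target matrix; the layout and convention of the stored pose .npy files are not documented here, so the function and argument names are hypothetical:

```python
import numpy as np


def reproject(depth, K, T_src_to_tgt):
    """Reproject every pixel of a source-view depth map into a target view.

    depth: (H, W) depths in the source camera (assumed > 0 where valid).
    K: (3, 3) pinhole intrinsics, assumed shared by both cameras.
    T_src_to_tgt: (4, 4) rigid transform from source to target camera coordinates.
    Returns target pixel coordinates (H, W, 2) and target-frame depths (H, W).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # homogeneous pixels, 3 x N
    # Back-project to 3D points in the source camera frame.
    points = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Rigidly move the points into the target camera frame.
    points_tgt = T_src_to_tgt[:3, :3] @ points + T_src_to_tgt[:3, 3:4]
    z = points_tgt[2]
    proj = K @ points_tgt
    # Perspective divide; results are only meaningful where z > 0, so mask the
    # returned coordinates with the returned depths before using them.
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = proj[:2] / z
    return uv.T.reshape(h, w, 2), z.reshape(h, w)
```

If the stored poses turn out to be camera-to-world matrices, the relative transform would be np.linalg.inv(T_world_tgt) @ T_world_src; verify the convention before relying on it.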

Ground segmentations

For both Matterport and KITTI we trained a segmentation network to classify ground pixels in an image. We provide training code for this inside footprints/preprocessing/segmentation. The resulting segmentations are stored as .npy files inside the ground_seg folder and are unthresholded (i.e. raw sigmoid outputs).
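Because the stored segmentations are raw sigmoid outputs, they need to be thresholded before they can serve as a binary ground mask. A minimal sketch; the 0.5 threshold is an assumption (not necessarily the value used when generating training labels), and the function name is hypothetical:

```python
import numpy as np


def ground_mask(sigmoid_output, threshold=0.5):
    """Turn a raw sigmoid ground-probability map into a boolean ground mask."""
    probs = np.asarray(sigmoid_output, dtype=np.float32)
    if probs.size and (probs.min() < 0.0 or probs.max() > 1.0):
        raise ValueError("expected raw sigmoid outputs in [0, 1]")
    # Pixels strictly above the (assumed) threshold are labelled as ground.
    return probs > threshold


# e.g. mask = ground_mask(np.load("<path/to/ground_seg/file>.npy"))
```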

Optical flow

For KITTI, we identify moving objects by comparing induced flow (the flow expected from camera motion and depth alone) to optical flow. Our provided optical flow estimates come from LiteFlowNet and are inside the optical_flow folder.
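Induced flow is the 2D motion each pixel would have if the scene were static, computed from its depth, the camera intrinsics and the relative camera pose; pixels where the estimated optical flow departs strongly from it are candidate moving objects. A minimal sketch of that comparison, assuming shared pinhole intrinsics, a 4x4 source-to-target pose and valid (positive) depths everywhere. The 3-pixel threshold and the function names are hypothetical, and the repository's actual criterion may differ:

```python
import numpy as np


def induced_flow(depth, K, T_src_to_tgt):
    """Flow (H, W, 2) each pixel would have from camera motion alone (static scene)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    # Back-project with depth, move into the target camera, and re-project.
    points = (np.linalg.inv(K) @ pix) * depth.ravel()
    points_tgt = T_src_to_tgt[:3, :3] @ points + T_src_to_tgt[:3, 3:4]
    proj = K @ points_tgt
    uv_tgt = proj[:2] / proj[2]
    return (uv_tgt - pix[:2]).T.reshape(h, w, 2)


def moving_object_mask(optical_flow, depth, K, T_src_to_tgt, threshold_px=3.0):
    """Flag pixels whose optical flow disagrees with the induced flow by > threshold_px."""
    diff = optical_flow - induced_flow(depth, K, T_src_to_tgt)
    return np.linalg.norm(diff, axis=-1) > threshold_px
```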

Hidden ground depths

To compute hidden depths (i.e. the depth to each visible and occluded ground pixel) we use camera poses, depth maps and ground segmentations. Hidden depths can be generated with the following command, which expects a GPU to be available:

CUDA_VISIBLE_DEVICES=X  python -m \
    footprints.preprocessing.ground_truth_generation.ground_truth_generator \
    --type hidden_depths  --data_type kitti --textfile splits/kitti/train.txt

Make sure to run this on both train.txt and val.txt. Warning: this takes a while. To speed it up, run multiple processes in parallel, each with the flags --start_idx X and --end_idx Y, so that every process handles a smaller chunk of the textfile.
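Splitting the work can be scripted. The sketch below builds one command per chunk of a split file using the --start_idx and --end_idx flags. It assumes --end_idx is exclusive and assigns GPUs round-robin through CUDA_VISIBLE_DEVICES; both are assumptions, and the helper name is hypothetical. If the end index turns out to be inclusive, adjacent chunks would overlap by one line.

```python
def chunked_commands(num_lines, num_processes, textfile="splits/kitti/train.txt",
                     data_type="kitti", num_gpus=1):
    """Build one ground-truth-generation command per chunk of a split file.

    Assumes --end_idx is exclusive (Python-slice style); check the generator script.
    num_gpus must be at least 1; chunk i runs on GPU i % num_gpus.
    """
    if num_lines <= 0 or num_processes <= 0:
        return []
    chunk = -(-num_lines // num_processes)  # ceiling division
    commands = []
    for i, start in enumerate(range(0, num_lines, chunk)):
        end = min(start + chunk, num_lines)
        commands.append(
            f"CUDA_VISIBLE_DEVICES={i % num_gpus} python -m "
            "footprints.preprocessing.ground_truth_generation.ground_truth_generator "
            f"--type hidden_depths --data_type {data_type} --textfile {textfile} "
            f"--start_idx {start} --end_idx {end}"
        )
    return commands
```

For example, count the lines of splits/kitti/train.txt, call chunked_commands(n, 4, num_gpus=2), and launch each returned command in its own shell; then repeat with val.txt.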

Note that if you have already downloaded our training data, running the ground truth generator will overwrite it unless you set --save_folder_name <my_save_folder>. To train on the newly generated data, either set the path manually inside footprints/datasets/<kitti or matterport dataset.py>, or rename your new data folder to the required name, e.g. hidden_depths.
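One way to picture what a hidden-depth map contains: ground pixels observed in other frames of the sequence can be lifted to 3D with their depth, moved into the current camera with the relative pose, and projected, so that ground hidden behind objects in the current view can still receive a depth. The sketch below is a simplified illustration of that idea under stated assumptions (shared pinhole intrinsics, 4x4 source-to-target poses, boolean ground masks, nearest-point z-buffering). It is not the repository's ground_truth_generator, and the function name is hypothetical.

```python
import numpy as np


def splat_ground_depths(frames, K, target_shape):
    """Accumulate ground points from several frames into one target-view depth map.

    frames: iterable of (depth, ground_mask, T_src_to_tgt) tuples, with depth (H, W),
    ground_mask a boolean (H, W) array and T_src_to_tgt a 4x4 rigid transform.
    Target pixels that receive no ground point are left at 0 (unknown).
    """
    h, w = target_shape
    hidden = np.full((h, w), np.inf)
    K_inv = np.linalg.inv(K)
    for depth, ground, T in frames:
        v, u = np.nonzero(ground & (depth > 0))
        pix = np.stack([u, v, np.ones_like(u)]).astype(np.float64)
        # Lift ground pixels to 3D and move them into the target camera frame.
        points = (K_inv @ pix) * depth[v, u]
        points = T[:3, :3] @ points + T[:3, 3:4]
        points = points[:, points[2] > 0]  # keep points in front of the camera
        proj = K @ points
        z = points[2]
        uu = np.round(proj[0] / z).astype(int)
        vv = np.round(proj[1] / z).astype(int)
        inside = (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
        # z-buffer: keep the nearest ground point landing on each target pixel.
        np.minimum.at(hidden, (vv[inside], uu[inside]), z[inside])
    hidden[np.isinf(hidden)] = 0.0
    return hidden
```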

Moving object masks

To compute moving object masks we use optical flow, depth, ground segmentations and camera poses. These can be generated by changing the above command to use --type moving_objects. This is only valid for KITTI.

Depth masks

Depth masks are estimates of the untraversable pixels in the image, and are computed using depth maps and ground segmentations. To generate these, change the above command to use --type depth_masks.

โณ Evaluation

To generate predictions for evaluation using a trained model, run:

CUDA_VISIBLE_DEVICES=X python -m footprints.main \
    --mode inference \
    --load_path <your_model_path, e.g. logs/model/models/weights_9> \
    --inference_data_type <kitti or matterport> \
    --height <192 for kitti, 512 for matterport> \
    --width 640

By default, predictions are saved to <load_path>/<data_type>_predictions; a different location can be set with --inference_save_path.

To evaluate a folder of predictions, run:

python -m footprints.evaluation.evaluate_model \
    --datatype kitti \
    --metric iou \
    --predictions <path/to/predictions/folder>

The following options are provided:

  • --datatype can be either kitti or matterport.
  • --metric can be iou (for kitti and matterport) or depth (for matterport only).
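IoU here refers to the usual intersection-over-union between a predicted binary mask and the ground-truth mask. A minimal sketch of the metric for a single image, assuming both masks are already binary and the same size; how evaluate_model thresholds predictions, handles ignored pixels or averages over images is not described here, so treat this as illustrative and the function name as hypothetical:

```python
import numpy as np


def iou(pred_mask, gt_mask):
    """Intersection-over-union of two same-sized binary masks (nan if both are empty)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return float("nan")
    return float(np.logical_and(pred, gt).sum() / union)


print(iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # 1 shared pixel / 3 in the union
```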

If necessary, the ground truth files will be automatically downloaded and placed in the ground_truth_files folder.

You can also download the KITTI annotations directly from here. For each image, there are 3 .png files:

  • XXXXX_ground.png contains the mask of the boundary of visible and hidden ground, ignoring all objects
  • XXXXX_objects.png contains the mask of the ground space taken up by objects (the footprints)
  • XXXXX_combined.png contains the full evaluation mask: the visible and hidden ground, taking into account object footprints

Method and further results

We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data, which is used to supervise an image-to-image network.

Video version of figure 3

Results on mobile phone footage:

Rig results

More results on the KITTI dataset:

KITTI results

โœ๏ธ ๐Ÿ“„ Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{watson-2020-footprints,
 title   = {Footprints and Free Space from a Single Color Image},
 author  = {Jamie Watson and
            Michael Firman and
            Aron Monszpart and
            Gabriel J. Brostow},
 booktitle = {Computer Vision and Pattern Recognition ({CVPR})},
 year = {2020}
}

๐Ÿ‘ฉโ€โš–๏ธ License

Copyright © Niantic, Inc. 2020. Patent Pending. All rights reserved. Please see the license file for terms.
