NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
Zihan Zhu* · Songyou Peng* · Viktor Larsson · Weiwei Xu · Hujun Bao · Zhaopeng Cui · Martin R. Oswald · Marc Pollefeys
(* Equal Contribution)
CVPR 2022
Paper | Video | Project Page
NICE-SLAM produces accurate dense geometry and camera tracking on large-scale indoor scenes.
(The black / red lines are the ground truth / predicted camera trajectory)
Installation
First, make sure that all dependencies are in place. The simplest way to do so is to use anaconda.
You can create an anaconda environment called nice-slam. On Linux, you need to install libopenexr-dev before creating the environment.
sudo apt-get install libopenexr-dev
conda env create -f environment.yaml
conda activate nice-slam
Visualizing NICE-SLAM Results
We provide the results of NICE-SLAM ready for download. You can run our interactive visualizer as follows.
Self-captured Apartment
To visualize our results on the self-captured apartment, as shown in the teaser:
bash scripts/download_vis_apartment.sh
python visualizer.py configs/Apartment/apartment.yaml --output output/vis/Apartment
Note for users from China: If downloads are slow, check the scripts/download_*.sh scripts, where we also provide 和彩云 links so that you can download the files manually.
ScanNet
bash scripts/download_vis_scene0000.sh
python visualizer.py configs/ScanNet/scene0000.yaml --output output/vis/scannet/scans/scene0000_00
You can find the results of NICE-SLAM on other scenes in ScanNet here.
Replica
bash scripts/download_vis_room1.sh
python visualizer.py configs/Replica/room1.yaml --output output/vis/Replica/room1
Interactive Visualizer Usage
The black trajectory is the ground truth trajectory, and the red one is the trajectory of NICE-SLAM.
- Press Ctrl+0 for grey mesh rendering.
- Press Ctrl+1 for textured mesh rendering.
- Press Ctrl+9 for normal rendering.
- Press L to turn off/on lighting.
Command line arguments
- --output $OUTPUT_FOLDER: output folder (overrides the output folder in the config file)
- --input_folder $INPUT_FOLDER: input folder (overrides the input folder in the config file)
- --save_rendering: save the rendering video to vis.mp4 in the output folder
- --no_gt_traj: do not show the ground truth trajectory
- --imap: visualize results of iMAP*
- --vis_input_frame: open a viewer to show input frames. Note: you need to download the dataset first. See the Run section below.
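The flags listed above could be declared roughly as in the sketch below. This is a hypothetical illustration using Python's argparse, not the actual visualizer.py: the function names build_parser and resolve_output, the default values, and the help strings are assumptions. It also shows one way a command-line value can take precedence over the config file value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real visualizer.py)."""
    parser = argparse.ArgumentParser(description="Sketch of the visualizer options.")
    parser.add_argument("config", type=str, help="path to the YAML config file")
    parser.add_argument("--output", type=str, default=None,
                        help="output folder; overrides the one in the config file")
    parser.add_argument("--input_folder", type=str, default=None,
                        help="input folder; overrides the one in the config file")
    parser.add_argument("--save_rendering", action="store_true",
                        help="save the rendering video to vis.mp4 in the output folder")
    parser.add_argument("--no_gt_traj", action="store_true",
                        help="do not show the ground truth trajectory")
    parser.add_argument("--imap", action="store_true",
                        help="visualize results of iMAP*")
    parser.add_argument("--vis_input_frame", action="store_true",
                        help="open a viewer that shows the input frames")
    return parser


def resolve_output(cli_output, config_output):
    """Return the command-line value when given, otherwise the config value."""
    return cli_output if cli_output is not None else config_output


args = build_parser().parse_args(
    ["configs/Replica/room1.yaml", "--output", "output/vis/Replica/room1", "--no_gt_traj"]
)
# "output/Replica/room1" stands in for a value read from the config file.
print(resolve_output(args.output, "output/Replica/room1"))  # output/vis/Replica/room1
```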
Demo
Here you can run NICE-SLAM yourself on a short ScanNet sequence with 500 frames.
First, download the demo data as shown below. The data is saved into the ./Datasets/Demo folder.
bash scripts/download_demo.sh
Next, run NICE-SLAM. It takes a few minutes and about 5 GB of GPU memory.
python -W ignore run.py configs/Demo/demo.yaml
Finally, run the following command to visualize.
python visualizer.py configs/Demo/demo.yaml
NOTE: This is for demonstration only; its configuration and performance may differ from those in our paper.
Run
Self-captured Apartment
Download the data as shown below. The data is saved into the ./Datasets/Apartment folder.
bash scripts/download_apartment.sh
Next, run NICE-SLAM:
python -W ignore run.py configs/Apartment/apartment.yaml
ScanNet
Please follow the data downloading procedure on the ScanNet website, and extract color/depth frames from the .sens file using this code.
Directory structure of ScanNet
DATAROOT is ./Datasets by default. If a sequence (sceneXXXX_XX) is stored elsewhere, change the input_folder path in the config file or on the command line.
DATAROOT
└── scannet
    └── scans
        └── scene0000_00
            └── frames
                ├── color
                │   ├── 0.jpg
                │   ├── 1.jpg
                │   ├── ...
                │   └── ...
                ├── depth
                │   ├── 0.png
                │   ├── 1.png
                │   ├── ...
                │   └── ...
                ├── intrinsic
                └── pose
                    ├── 0.txt
                    ├── 1.txt
                    ├── ...
                    └── ...
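Before starting a long run, it can help to check that the extracted frames match the layout above. The following is a minimal sketch, not part of the repository: the function name check_scannet_layout is hypothetical, and it assumes that frame i is stored as color/i.jpg, depth/i.png and pose/i.txt, as the tree suggests.

```python
from pathlib import Path


def check_scannet_layout(dataroot, scene):
    """Return a list of problems found under DATAROOT/scannet/scans/<scene>/frames."""
    frames = Path(dataroot) / "scannet" / "scans" / scene / "frames"
    problems = []
    # Assumption: all four entries shown in the tree must exist.
    for sub in ("color", "depth", "intrinsic", "pose"):
        if not (frames / sub).exists():
            problems.append(f"missing: {frames / sub}")
    if problems:
        return problems

    # Assumption: frame i is stored as color/i.jpg, depth/i.png and pose/i.txt.
    color_ids = {p.stem for p in (frames / "color").glob("*.jpg")}
    if not color_ids:
        problems.append("no color frames found")
    for sub, ext in (("depth", "png"), ("pose", "txt")):
        ids = {p.stem for p in (frames / sub).glob(f"*.{ext}")}
        missing = sorted(color_ids - ids)
        if missing:
            problems.append(f"{sub}: no .{ext} file for frames {missing}")
    return problems


if __name__ == "__main__":
    # An empty result means the layout looks as expected.
    for problem in check_scannet_layout("./Datasets", "scene0000_00"):
        print(problem)
```

An empty list means every color frame has a matching depth image and pose file.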
Once the data is downloaded and set up properly, you can run NICE-SLAM:
python -W ignore run.py configs/ScanNet/scene0000.yaml
Replica
Download the data as shown below. The data is saved into the ./Datasets/Replica folder. Note that the Replica data is generated by the authors of iMAP, so please cite iMAP if you use the data.
bash scripts/download_replica.sh
Then you can run NICE-SLAM:
python -W ignore run.py configs/Replica/room0.yaml
The mesh for evaluation is saved as $OUTPUT_FOLDER/mesh/final_mesh_eval_rec.ply, where the unseen regions are culled using all frames.
TUM RGB-D
Download the data as shown below. The data is saved into the ./Datasets/TUM-RGBD folder.
bash scripts/download_tum.sh
Now run NICE-SLAM:
python -W ignore run.py configs/TUM_RGBD/freiburg1_desk.yaml
Co-Fusion
First, download the dataset. This script should download and unpack the data automatically into the ./Datasets/CoFusion folder.
bash scripts/download_cofusion.sh
Run NICE-SLAM:
python -W ignore run.py configs/CoFusion/room4.yaml
Use your own RGB-D sequence from Kinect Azure
- Please first follow this guide to record a sequence and extract aligned color and depth images. (Remember to use --align_depth_to_color for azure_kinect_recorder.py.)
  DATAROOT is ./Datasets by default. If a sequence (sceneXX) is stored elsewhere, change the input_folder path in the config file or on the command line.
  DATAROOT
  └── Own
      └── scene0
          ├── color
          │   ├── 00000.jpg
          │   ├── 00001.jpg
          │   ├── 00002.jpg
          │   ├── ...
          │   └── ...
          ├── config.json
          ├── depth
          │   ├── 00000.png
          │   ├── 00001.png
          │   ├── 00002.png
          │   ├── ...
          │   └── ...
          └── intrinsic.json
- Prepare a .yaml file based on configs/Own/sample.yaml. Change the camera intrinsics in the config file based on intrinsic.json. You can also get the intrinsics of the depth camera via other tools such as MATLAB.
- Specify the bound of the scene. If no ground truth camera pose is given, we construct world coordinates on the first frame. The X-axis points from left to right, the Y-axis from down to up, and the Z-axis from front to back.
- Change the input_folder path and/or the output path in the config file or on the command line.
- Run NICE-SLAM.
python -W ignore run.py configs/Own/sample.yaml
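Copying the intrinsics from intrinsic.json into the config file can be scripted. The sketch below is an illustration under stated assumptions, not the repository's tooling: it assumes the JSON stores width, height, and a 9-element intrinsic_matrix in column-major order ([fx, 0, 0, 0, fy, 0, cx, cy, 1]), and the output keys H, W, fx, fy, cx, cy are assumed names. Check your own intrinsic.json and configs/Own/sample.yaml for the actual layout and keys. The function name intrinsics_from_open3d_json is hypothetical.

```python
import json


def intrinsics_from_open3d_json(text):
    """Extract H, W, fx, fy, cx, cy from an intrinsic JSON string.

    Assumption: "intrinsic_matrix" holds the 3x3 matrix as 9 values in
    column-major order, i.e. [fx, 0, 0, 0, fy, 0, cx, cy, 1].
    """
    data = json.loads(text)
    m = data["intrinsic_matrix"]
    if len(m) != 9:
        raise ValueError("expected 9 entries in intrinsic_matrix")
    return {
        "H": int(data["height"]),
        "W": int(data["width"]),
        "fx": float(m[0]),
        "fy": float(m[4]),
        "cx": float(m[6]),
        "cy": float(m[7]),
    }


# Made-up values for illustration only, not a real calibration.
example = (
    '{"width": 1280, "height": 720, '
    '"intrinsic_matrix": [600.0, 0, 0, 0, 601.0, 0, 640.0, 360.0, 1]}'
)
for key, value in intrinsics_from_open3d_json(example).items():
    print(f"{key}: {value}")
```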
(Optional but highly recommended) If you don't want to specify the bound of the scene or manually change the config file, you can first run the Redwood tool in Open3D and then run NICE-SLAM. Here we provide steps for the whole pipeline, beginning with recording Azure Kinect videos. (Ubuntu 18.04 or above is recommended.)
- Download the Open3D repository.
bash scripts/download_open3d.sh
- Record and extract frames.
# specify scene ID
sceneid=0
cd 3rdparty/Open3D-0.13.0/examples/python/reconstruction_system/
# record and save to .mkv file
python sensors/azure_kinect_recorder.py --align_depth_to_color --output scene$sceneid.mkv
# extract frames
python sensors/azure_kinect_mkv_reader.py --input scene$sceneid.mkv --output dataset/scene$sceneid
- Run reconstruction.
python run_system.py dataset/scene$sceneid/config.json --make --register --refine --integrate
# back to main folder
cd ../../../../../
- Prepare the config file.
python src/tools/prep_own_data.py --scene_folder 3rdparty/Open3D-0.13.0/examples/python/reconstruction_system/dataset/scene$sceneid --ouput_config configs/Own/scene$sceneid.yaml
- Run NICE-SLAM.
python -W ignore run.py configs/Own/scene$sceneid.yaml
iMAP*
We also provide our re-implementation of iMAP (iMAP*) for use. If you use the code, please cite both the original iMAP paper and NICE-SLAM.
Usage
iMAP* shares most of its code with NICE-SLAM. To run iMAP*, simply use a *_imap.yaml config file and add the argument --imap on the command line. For example, to run iMAP* on Replica room0:
python -W ignore run.py configs/Replica/room0_imap.yaml --imap
To use our interactive visualizer:
python visualizer.py configs/Replica/room0_imap.yaml --imap
To evaluate ATE:
python src/tools/eval_ate.py configs/Replica/room0_imap.yaml --imap
Differences between iMAP* and the original iMAP
Keyframe pose optimization during mapping
We do not optimize the selected keyframes' poses for iMAP*, because optimizing them usually leads to worse performance. One possible reason is that iMAP selects its keyframes globally, and many of them do not have overlapping regions, especially when the scene gets larger. Overlap is a prerequisite for bundle adjustment (BA). For NICE-SLAM, we only select overlapping keyframes within a small window (local BA), which works well in all scenes. You can still turn on keyframe pose optimization during mapping for iMAP* by enabling BA in the config file.
Active sampling
We disable active sampling in iMAP*, because in our experiments we observed that it does not improve performance while adding computational overhead.
For the image active sampling, in each iteration the original iMAP uniformly samples 200 pixels in the entire image. Next, they divide this image into an 8x8 grid and calculate the probability distribution from the rendering losses. This means that if the resolution of an image is 1200x680 (Replica), only around 3 pixels (200 / 64) are sampled to calculate the distribution for each 150x85 grid patch. This is not much different from simple uniform sampling. Therefore, during mapping we use the same pixel sampling strategy as NICE-SLAM for iMAP*: uniform sampling, but with 4x more pixels than reported in the iMAP paper.
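The arithmetic behind the "around 3 pixels per patch" figure can be reproduced in a few lines. The helper names cell_size and samples_per_cell are made up for this illustration; the numbers (200 samples, 8x8 grid, 1200x680 image) come from the paragraph above.

```python
def cell_size(width, height, grid):
    """Size in pixels of one cell when the image is split into grid x grid cells."""
    return width // grid, height // grid


def samples_per_cell(num_samples, grid):
    """Average number of uniformly drawn samples that land in one grid cell."""
    return num_samples / (grid * grid)


cell_w, cell_h = cell_size(1200, 680, 8)
per_cell = samples_per_cell(200, 8)
print(cell_w, cell_h, per_cell)  # 150 85 3.125

# Fraction of a cell's pixels that contribute to its loss estimate.
print(per_cell / (cell_w * cell_h))
```

With roughly 3 samples for 12750 pixels, the per-cell loss estimate is very coarse, which is the reason given above for falling back to uniform sampling.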
For the keyframe active sampling, the original iMAP requires rendering depth and color images for all keyframes to get the loss distribution, which is expensive, and we again did not find it very helpful. Instead, as done in NICE-SLAM, iMAP* randomly samples keyframes from the keyframe list. We also let iMAP* optimize for 4x more iterations than NICE-SLAM, but its performance is still inferior.
Keyframe selection
For fair comparison, we use the same keyframe selection method in iMAP* as in NICE-SLAM: add one keyframe to the keyframe list every 50 frames.
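The keyframe handling described above for iMAP* (one keyframe every 50 frames, then random sampling from the keyframe list) can be sketched as follows. This is an illustration, not the repository code: the function names are hypothetical, and treating frame 0 as the first keyframe is an assumption.

```python
import random


def keyframe_indices(num_frames, every=50):
    """Frame indices added to the keyframe list at a fixed interval.

    Assumption: frame 0 is the first keyframe.
    """
    return list(range(0, num_frames, every))


def sample_keyframes(keyframes, k, rng):
    """Uniformly sample up to k distinct keyframes from the keyframe list."""
    return rng.sample(keyframes, min(k, len(keyframes)))


keyframes = keyframe_indices(500)
print(keyframes)  # [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(sample_keyframes(keyframes, 4, rng))
```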
Evaluation
Absolute Trajectory Error
To evaluate the absolute trajectory error (ATE), run the command below with the corresponding config file:
python src/tools/eval_ate.py configs/Replica/room0.yaml
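For intuition, ATE is commonly reported as the root-mean-square error of the per-frame camera position differences. The sketch below is not src/tools/eval_ate.py, and the function name ate_rmse is hypothetical. It assumes the estimated and ground truth positions are already matched frame by frame and expressed in the same coordinate frame; the trajectory alignment step that usually precedes this computation is omitted.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational differences between two matched position lists.

    Assumption: both trajectories are already associated and aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((a - b) ** 2 for a, b in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy positions (x, y, z), for illustration only.
est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(ate_rmse(est, gt))  # sqrt(0.5), about 0.7071
```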
Reconstruction Error
To evaluate the reconstruction error, first download the ground truth Replica meshes, in which unseen regions have been culled.
bash scripts/download_cull_replica_mesh.sh
Then run the command below (same for NICE-SLAM and iMAP*). The 2D metric requires rendering 1000 depth images, which will take some time (~9 minutes). Use -2d to enable the 2D metric and -3d to enable the 3D metric.
# assign any output_folder and gt mesh you like, here is just an example
OUTPUT_FOLDER=output/Replica/room0
GT_MESH=cull_replica_mesh/room0.ply
python src/tools/eval_recon.py --rec_mesh $OUTPUT_FOLDER/mesh/final_mesh_eval_rec.ply --gt_mesh $GT_MESH -2d -3d
We also provide code to cull the mesh given camera poses. Here we take culling of ground truth mesh of Replica room0 as an example.
python src/tools/cull_mesh.py --input_mesh Datasets/Replica/room0_mesh.ply --traj Datasets/Replica/room0/traj.txt --output_mesh cull_replica_mesh/room0.ply
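The idea of culling by camera poses can be illustrated with a simple frustum test: a point counts as seen if it projects inside the image of at least one frame. The sketch below is only an illustration and may differ from what src/tools/cull_mesh.py does. It assumes a pinhole camera looking along +z, 4x4 world-to-camera matrices given as nested lists, and it ignores occlusion. All function names are hypothetical.

```python
def project(point_w, w2c, fx, fy, cx, cy):
    """Project a world point into a camera; return (u, v, depth) or None if behind it.

    Assumptions: w2c is a 4x4 world-to-camera matrix and the camera looks along +z.
    """
    x, y, z = point_w
    xc, yc, zc = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in w2c[:3])
    if zc <= 0:
        return None
    return fx * xc / zc + cx, fy * yc / zc + cy, zc


def is_seen(point_w, w2c_list, fx, fy, cx, cy, width, height):
    """True if the point falls inside the image of at least one frame (no occlusion test)."""
    for w2c in w2c_list:
        hit = project(point_w, w2c, fx, fy, cx, cy)
        if hit is None:
            continue
        u, v, _ = hit
        if 0 <= u < width and 0 <= v < height:
            return True
    return False


# One camera at the origin with made-up intrinsics, for illustration only.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
cam = dict(fx=100.0, fy=100.0, cx=50.0, cy=40.0, width=100, height=80)
print(is_seen((0.0, 0.0, 1.0), [identity], **cam))   # True: projects to the image center
print(is_seen((0.0, 0.0, -1.0), [identity], **cam))  # False: behind the camera
print(is_seen((1.0, 0.0, 1.0), [identity], **cam))   # False: u = 150 is outside the image
```

Vertices that are not seen in any frame would be the candidates for removal.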
For iMAP* evaluation
As discussed in many recent papers, e.g. UNISURF/VolSDF/NeuS, manually thresholding the volume density during marching cubes might be needed. Moreover, we found that there are scaling differences, possibly for the reason discussed in NeuS. Therefore, ICP with scale is needed. You can use the ICP tool in CloudCompare with the default configuration and scaling enabled.
Acknowledgement
We adapted some code from several awesome repositories, including convolutional_occupancy_networks, nerf-pytorch, lietorch, and DIST-Renderer. Thanks for making the code publicly available. We also thank Edgar Sucar for allowing us to make the Replica Dataset available.
Citation
If you find our code or paper useful, please cite:
@inproceedings{Zhu2022CVPR,
author = {Zhu, Zihan and Peng, Songyou and Larsson, Viktor and Xu, Weiwei and Bao, Hujun and Cui, Zhaopeng and Oswald, Martin R. and Pollefeys, Marc},
title = {NICE-SLAM: Neural Implicit Scalable Encoding for SLAM},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022}
}
Contact
Contact Zihan Zhu and Songyou Peng for questions, comments, and bug reports.