SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields
Project | Paper | YouTube | Dataset
Pytorch implementation of SPIn-NeRF. SPIn-NeRF leverages 2D priors from image inpainters, and enables view-consistent inpainting of NeRFs.
Quick Start
Dependencies
After installing Pytorch according to your CUDA version, install the rest of the dependencies:
pip install -r requirements.txt
Also, install LaMa dependencies with the following:
pip install -r lama/requirements.txt
You will also need COLMAP installed to compute poses if you want to run on your data.
Dataset preparation
Download the zip files of the dataset from here. Extract them under /data
.
Here, we provide information for running the statue scene. For other scenes, a similar approach with potentially different factor
can be done.
Extract statue.zip
under /data
. This can be done with unzip statue.zip -d data
. You might need to install unzip with sudo apt-get install unzip
.
If you want to use your own data, make sure that you put your data in a folder under data
with the following format (Note that labels under statue/images_2/label
are 1
where we need inpainting, and 0
otherwise):
statue
├── images
│ ├── IMG_2707.jpg
│ ├── IMG_2708.jpg
│ ├── ...
│ └── IMG_2736.jpg
└── images_2
├── IMG_2707.png
├── IMG_2708.png
├── ...
├── IMG_2736.png
└── label
├── IMG_2707.png
├── IMG_2708.png
├── ...
└── IMG_2736.png
where in this example, we want to use --factor 2
for the images to use 2x downsized images for the fitting, thus we have put 2x downsized images under images_2
. If your original images are larger, put the original images under images
, and the Nx downsized images under images_N
, where N is chosen based on your GPU availabitlity. Also, make sure to obtain camera parameters using COLMAP. This can be done with the following command:
python imgs2poses.py <your_datadir>
For example, for the sample statue
dataset, the camera parameters can be obtained as python imgs2poses.py data/statue
. Note that for this specific dataset, we have already provided the camera parameters and you can skip running COLMAP.
Running an initial NeRF for getting the depths
First, use the following command to render disparities from the training views. This can be done with the following:
rm -r LaMa_test_images/*
rm -r output/label/*
python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --render_factor 1 --prepare --i_weight 1000000000 --i_video 1000000000 --i_feat 4000 --N_iters 4001 --expname statue --datadir ./data/statue --factor 2 --N_gt 0
After this, rendered disparities (inverse depths) are ready at lama/LaMa_test_images
, with their corresponding labels at lama/LaMa_test_images/label
.
Running LaMa to generate geometry and appearance guidance
First, let's run LaMa to generate depth priors:
cd lama
Now, make sure to follow the LaMa instructions for downloading the big-lama model.
export TORCH_HOME=$(pwd) && export PYTHONPATH=$(pwd)
python bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output
Now, the inpainted disparities are ready at lama/output/label
. Copy the images and put the under data/statue/images_2/depth
. It can be done with the following:
dataset=statue
factor=2
rm -r ../data/$dataset/images_$factor/depth
mkdir ../data/$dataset/images_$factor/depth
cp ./output/label/*.png ../data/$dataset/images_$factor/depth
Now, let's generate the inpainted RGB images:
dataset=statue
factor=2
rm -r LaMa_test_images/*
rm -r output/label/*
cp ../data/$dataset/images_$factor/*.png LaMa_test_images
mkdir LaMa_test_images/label
cp ../data/$dataset/images_$factor/label/*.png LaMa_test_images/label
python bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output
rm -r ../data/$dataset/images_$factor/lama_images
mkdir ../data/$dataset/images_$factor/lama_images
cp ../data/$dataset/images_$factor/*.png ../data/$dataset/images_$factor/lama_images
cp ./output/label/*.png ../data/$dataset/images_$factor/lama_images
The inpainted RGB images are now ready under lama/output/label
, and have been copied to data/statue/images_2/lama_images
.
statue
├── colmap_depth.npy
├── images
│ ├── IMG_2707.jpg
│ ├── ...
│ └── IMG_2736.jpg
├── images_2
│ ├── depth
│ │ ├── img000.png
│ │ ├── ...
│ │ └── img028.png
│ ├── IMG_2707.png
│ ├── IMG_2708.png
│ ├── ...
│ ├── IMG_2736.png
│ ├── label
│ │ ├── IMG_2707.png
│ │ ├── ...
│ │ ├── IMG_2736.png
│ └── lama_images
│ ├── IMG_2707.png
│ ├── ...
│ └── IMG_2736.png
└── sparse
Let's move back to the main directory by cd ..
.
Running multiview inpainter
Now, using the following command, the optimization of the final inpainted NeRF will be started. A video of the inpainted NeRF will be saved every i_video
iterations. The fitting will be done for N_iters
iterations. A sample rendering from a random view point is saved to /test_renders
every i_feat
iterations, which can be used for early sanity checks and hyper-parameter tunings.
python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --i_feat 200 --lpips --i_weight 1000000000000 --i_video 1000 --N_iters 10001 --expname statue --datadir ./data/statue --N_gt 0 --factor $factor
Note that our experiments were done on Nvidia A6000 GPUs. In case of running on GPUs with lower memory, you might get out-of-memory errors. To prevent that, please try increasing the arguments --lpips_render_factor
and --patch_len_factor
, or reducing --lpips_batch_size
.
Notes on mask dilation
Please note that as mentioned in the paper, the masks are dilated by default with a 5x5 kernel for 5 iterations to ensure that all of the object is masked, and that the effects of the shadow of the unwanted objects on the scene is reduced. If you wish to alter the dilation, first, you need to change the dilations applied by the LaMa model to generate the inpaintings under lama/saicinpainting/evaluation/refinement.py
at the following line:
tmp = cv2.dilate(tmp.cpu().numpy().astype('uint8'), np.ones((5, 5), np.uint8), iterations=5)
Then, you also need to change the LLFF loader to load the masks with proper dilations applied to them under DS_NeRF/load_llff.py
. In this file, the following line is responsible for the dilations:
msk = cv2.dilate(msk, np.ones((5, 5), np.uint8), iterations=5)
BibTeX
If you find SPIn-NeRF useful in your work, please consider citing it:
@inproceedings{spinnerf,
title={{SPIn-NeRF}: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields},
author={Ashkan Mirzaei and Tristan Aumentado-Armstrong and Konstantinos G. Derpanis and Jonathan Kelly and Marcus A. Brubaker and Igor Gilitschenski and Alex Levinshtein},
year={2023},
booktitle={CVPR},
}