Mega-NeRF
This repository contains the code needed to train Mega-NeRF models and generate the sparse voxel octrees used by the Mega-NeRF-Dynamic viewer.
The codebase for the Mega-NeRF-Dynamic viewer can be found here.
Note: This is a preliminary release and there may still be outstanding bugs.
Citation
@InProceedings{Turki_2022_CVPR,
author = {Turki, Haithem and Ramanan, Deva and Satyanarayanan, Mahadev},
title = {Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {12922-12931}
}
Demo
Setup
conda env create -f environment.yml
conda activate mega-nerf
The codebase has mainly been tested against CUDA >= 11.1 on V100, 2080 Ti, and 3090 Ti GPUs. 1080 Ti GPUs should also work, although training will be much slower.
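As a quick sanity check after activating the environment, the short snippet below (a minimal sketch that only assumes the PyTorch build installed by environment.yml) confirms that CUDA and a GPU are visible:
import torch

# Report the installed PyTorch/CUDA versions and the first visible GPU, if any.
print(f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device 0: {torch.cuda.get_device_name(0)}")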
Pretrained Models
Trained with 8 submodules (for comparison against the main paper)
- Rubble: model / cluster masks
- Building: model / cluster masks
- Quad: model / cluster masks
- Residence: model / cluster masks
- Sci-Art: model / cluster masks
- Campus: model / cluster masks
Larger models (trained with 25 submodules with 512 channels each)
- Rubble: model / cluster masks
- Building: model / cluster masks
- Residence: model / cluster masks
- Sci-Art: model / cluster masks
Data
Mill 19
UrbanScene 3D
- Download the raw photo collections from the UrbanScene3D dataset
- Download the refined camera poses for one of the scenes below:
- Run
python scripts/copy_images.py --image_path $RAW_PHOTO_PATH --dataset_path $CAMERA_POSE_PATH
Quad 6k Dataset
- Download the raw photo collections from here.
- Download the refined camera poses
- Run
python scripts/copy_images.py --image_path $RAW_PHOTO_PATH --dataset_path $CAMERA_POSE_PATH
Custom Data
We strongly recommend using PixSFM to refine camera poses for your own datasets. Mega-NeRF also assumes that the dataset is geo-referenced/aligned such that the second value of its ray_altitude_range
parameter corresponds to ground level. If using PixSFM/COLMAP, the model_aligner utility might be helpful, with Manhattan-world alignment as a fallback option if GPS alignment is not possible. We provide a script to convert from PixSFM/COLMAP output to the format Mega-NeRF expects.
If creating a custom dataset manually, the expected directory structure is as follows (a minimal sketch of writing these files appears after the list):
- /coordinates.pt: Torch file that should contain the following keys:
- 'origin_drb': Origin of scene in real-world units
- 'pose_scale_factor': Scale factor mapping from real-world unit (ie: meters) to [-1, 1] range
- '/{val|train}/rgbs/': JPEG or PNG images
- '/{val|train}/metadata/': Per-image metadata saved as torch files. Each image should have a corresponding metadata file named {rgb_stem}.pt. Each metadata file should contain the following keys:
- 'W': Image width
- 'H': Image height
- 'intrinsics': Image intrinsics in the following form: [fx, fy, cx, cy]
- 'c2w': Camera pose. 3x4 camera matrix with the convention used in the original NeRF repo, ie: x: down, y: right, z: backwards, followed by the following transformation:
torch.cat([camera_in_drb[:, 1:2], -camera_in_drb[:, :1], camera_in_drb[:, 2:4]], -1)
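For illustration, the sketch below writes a coordinates.pt file and one per-image metadata file with torch.save. All numeric values (origin, scale factor, resolution, intrinsics, pose) are placeholders and mydataset/ is a hypothetical root; only the key names and shapes follow the structure described above.
import torch
from pathlib import Path

dataset = Path('mydataset')  # hypothetical dataset root
(dataset / 'train' / 'rgbs').mkdir(parents=True, exist_ok=True)
(dataset / 'train' / 'metadata').mkdir(parents=True, exist_ok=True)

# Scene-level coordinates: origin in real-world units and the scale factor
# mapping real-world units into the [-1, 1] range.
torch.save({
    'origin_drb': torch.tensor([0.0, 0.0, 0.0]),
    'pose_scale_factor': 100.0
}, dataset / 'coordinates.pt')

# Per-image metadata named {rgb_stem}.pt, e.g. for train/rgbs/000000.jpg.
torch.save({
    'W': 1920,
    'H': 1080,
    'intrinsics': torch.tensor([1000.0, 1000.0, 960.0, 540.0]),  # [fx, fy, cx, cy]
    'c2w': torch.eye(3, 4)  # 3x4 camera-to-world matrix in the convention above
}, dataset / 'train' / 'metadata' / '000000.pt')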
Training
- Generate the training partitions for each submodule:
python scripts/create_cluster_masks.py --config configs/mega-nerf/${DATASET_NAME}.yaml --dataset_path $DATASET_PATH --output $MASK_PATH --grid_dim $GRID_X $GRID_Y
- Note: this can be run across multiple GPUs by instead running
python -m torch.distributed.run --standalone --nnodes=1 --nproc_per_node $NUM_GPUS --max_restarts 0 scripts/create_cluster_masks.py <args>
- Train each submodule:
python mega_nerf/train.py --config_file configs/mega-nerf/${DATASET_NAME}.yaml --exp_name $EXP_PATH --dataset_path $DATASET_PATH --chunk_paths $SCRATCH_PATH --cluster_mask_path ${MASK_PATH}/${SUBMODULE_INDEX}
- Note: training against full-scale data will write hundreds of GBs / several TBs of shuffled data to disk. You can downsample the training data using the train_scale_factor option.
- Note: we provide a utility script based on parscript to start multiple training jobs in parallel. It can be run through the following command:
CONFIG_FILE=configs/mega-nerf/${DATASET_NAME}.yaml EXP_PREFIX=$EXP_PATH DATASET_PATH=$DATASET_PATH CHUNK_PREFIX=$SCRATCH_PATH MASK_PATH=$MASK_PATH python -m parscript.dispatcher parscripts/run_8.txt -g $NUM_GPUS
- Merge the trained submodules into a unified Mega-NeRF model:
python scripts/merge_submodules.py --config_file configs/mega-nerf/${DATASET_NAME}.yaml --ckpt_prefix ${EXP_PREFIX}- --centroid_path ${MASK_PATH}/params.pt --output $MERGED_OUTPUT
Evaluation
Single-GPU evaluation: python mega_nerf/eval.py --config_file configs/nerf/${DATASET_NAME}.yaml --exp_name $EXP_NAME --dataset_path $DATASET_PATH --container_path $MERGED_OUTPUT
Multi-GPU evaluation: python -m torch.distributed.run --standalone --nnodes=1 --nproc_per_node $NUM_GPUS mega_nerf/eval.py --config_file configs/nerf/${DATASET_NAME}.yaml --exp_name $EXP_NAME --dataset_path $DATASET_PATH --container_path $MERGED_OUTPUT
Octree Extraction (for use by Mega-NeRF-Dynamic viewer)
python scripts/create_octree.py --config configs/mega-nerf/${DATASET_NAME}.yaml --dataset_path $DATASET_PATH --container_path $MERGED_OUTPUT --output $OCTREE_PATH
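The octrees are serialized with svox (see Acknowledgements), so a quick way to sanity-check the extracted structure is to load it back, as in this minimal sketch (it assumes the --output file is in svox's N3Tree npz format; octree.npz stands in for $OCTREE_PATH):
import svox

# Load the serialized sparse voxel octree produced by scripts/create_octree.py
# and print a summary of its structure.
tree = svox.N3Tree.load('octree.npz')  # placeholder for $OCTREE_PATH
print(tree)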
Acknowledgements
Large parts of this codebase are based on existing work in the nerf_pl, NeRF++, and PlenOctree repositories. We use svox to serialize our sparse voxel octrees, and the generated structures should be largely compatible with that codebase.