
4K4D: Real-Time 4D View Synthesis at 4K Resolution

Paper | Project Page | arXiv | Pretrained Models | Minimal Datasets | EasyVolcap

Repository for our CVPR 2024 paper 4K4D: Real-Time 4D View Synthesis at 4K Resolution.


News:

  • 23.12.18: The backbone of 4K4D, our volumetric video framework EasyVolcap has been open-sourced!
  • 23.12.18: The inference code for 4K4D has also been open-sourced along with documentation.
Demo videos: 4k4d_dance3_demo.mp4, 4k4d_0008_01_demo.mp4, 4k4d_0013_01_demo.mp4

For more high-resolution results and more real-time demos, please visit our project page.

Installation

Please refer to the installation guide of EasyVolcap for basic environment setup.

After setting up the environment, you should execute the installation command in this repository's root directory to register the modules:

# Run this inside the 4K4D repo
pip install -e . --no-build-isolation --no-deps

Note that not all requirements listed in environment.yml and requirements.txt need to be installed on your system, as they also contain dependencies for other parts of EasyVolcap. Thanks to the modular design of EasyVolcap, these missing packages will not hinder the rendering and training of 4K4D. After installation, PyTorch, PyTorch3D and tiny-cuda-nn should be present on your system for 4K4D's rendering to work properly. For training 4K4D, you should also make sure that Open3D is properly installed. Other packages can easily be installed with pip if import errors are encountered. Verify the setup with:

python -c "from easyvolcap.utils.console_utils import *" # Check for easyvolcap installation. 4K4D is a fork of EasyVolcap
python -c "import torch; print(torch.rand(3,3,device='cuda'))" # Check for pytorch installation
python -c "from pytorch3d.io import load_ply" # Check for pytorch3d installation
python -c "import tinycudann" # Check for tinycudann installation
python -c "import open3d" # Check for open3d installation. open3d is only required for training (extracting visual hulls)

Datasets

In this section, we provide instructions for downloading the full DNA-Rendering, ZJU-MoCap, NHR, ENeRF-Outdoor and Mobile-Stage datasets. If you only want to preview the pretrained models in the interactive GUI without training, we recommend checking out the minimal dataset section below, since the full datasets are quite large. Note that for full-quality rendering, you still need to download the full datasets as per the instructions below.

4K4D follows the typical dataset setup of EasyVolcap, where similar sequences are grouped into sub-directories of a particular dataset. Inside each sequence, the directory structure generally remains the same. For example, after downloading and preparing the 0013_01 sequence of the DNA-Rendering dataset, the directory structure should look like this:

# data/renbody/0013_01:
# Required:
images # raw images, cameras inside: images/00, images/01 ...
masks # foreground masks, cameras inside: masks/00, masks/01 ...
extri.yml # extrinsic camera parameters, not required if the optimized folder is present
intri.yml # intrinsic camera parameters, not required if the optimized folder is present

# Optional:
optimized # OPTIONAL: optimized camera parameters: optimized/extri.yml, optimized/intri.yml
vhulls # OPTIONAL: extracted visual hulls: vhulls/000000.ply, vhulls/000001.ply ... not required if the optimized and surfs folders are present
surfs # OPTIONAL: processed visual hull: surfs/000000.ply, surfs/000001.ply ...
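
To quickly sanity-check that a sequence follows this layout, you can use a small script like the sketch below. This is not part of the official tooling; the data root path is just an example:

# check_layout.py: verify the expected sequence layout (unofficial sketch)
from pathlib import Path

def check_sequence(data_root):
    root = Path(data_root)
    for name in ('images', 'masks'):  # required folders
        print(f'{name}: {"OK" if (root / name).is_dir() else "MISSING"}')
    # Camera parameters may live at the data root or inside the optimized folder
    has_cams = any((root / sub / 'extri.yml').exists() and (root / sub / 'intri.yml').exists()
                   for sub in ('.', 'optimized'))
    print(f'cameras: {"OK" if has_cams else "MISSING"}')
    for name in ('optimized', 'vhulls', 'surfs'):  # optional folders
        print(f'{name} (optional): {"present" if (root / name).is_dir() else "absent"}')

check_sequence('data/renbody/0013_01')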

DNA-Rendering, NHR and ZJU-MoCap Datasets

Please refer to Im4D's guide to download the ZJU-MoCap, NHR and DNA-Rendering datasets. After downloading, place the extracted files into data/my_zjumocap, data/NHR and data/renbody respectively. If you are interested in the processed data, please email me at [email protected] and CC [email protected] and [email protected] to request the processing guide. For ZJU-MoCap, you can fill in this Google form to request the download link. Note that you should cite the corresponding papers if you use these datasets.

ENeRF-Outdoor Dataset

If you are interested in downloading the ENeRF-Outdoor dataset, please fill in this Google form to request the download link. Note that this dataset is for non-commercial use only. After downloading, place the extracted files in data/enerf_outdoor.

Mobile-Stage Dataset

If you are interested in downloading the Mobile-Stage dataset, please fill in this Google form to request the download link. Note that this dataset is for non-commercial use only. After downloading, place the extracted files in data/mobile_stage.

Rendering

First, download the pretrained models.

After downloading, place them into data/trained_model (e.g. data/trained_model/4k4d_0013_01/1599.npz, data/trained_model/4k4d_0013_01_r4/latest.pt and data/trained_model/4k4d_0013_01_mb/-1.npz).

Note: The pretrained models were created with the release codebase. The codebase has since been cleaned up and includes bug fixes, so the metrics you get from evaluating them will differ from those reported in the paper. If you're interested in reproducing the error metrics reported in the paper, please consider downloading the reference images.

Here we provide their naming conventions, which correspond to their respective config files:

  1. 4k4d_0013_01 (without any postfixes) is the real-time 4K4D model, corresponding to configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml. This model can only be used for rendering. When combined with the full dataset mentioned above, this is the full official 4K4D implementation.
  2. 4k4d_0013_01_r4 (with the _r4 postfix) is the full pretrained model used during training, corresponding to configs/projects/realtime4dv/training/4k4d_0013_01_r4.yaml. This model can only be used for training. r4 is short for realtime4dv.
  3. 4k4d_0013_01_mb (with the _mb postfix) is an extension to 4K4D (Note: to be open-sourced) where we distill the IBR + SH appearance model into a set of low-degree SH parameters. This model can only be used for rendering and does not require precomputation. mb is short for mobile.
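
Under this convention, the experiment name determines which config to pass to evc. Below is a hypothetical helper illustrating the mapping; the _mb case assumes the mobile configs also live under the rendering directory, which may change once that extension is open-sourced:

# config_for: resolve the config file for an experiment name (illustrative sketch only)
def config_for(exp_name):
    if exp_name.endswith('_r4'):  # training model
        return f'configs/projects/realtime4dv/training/{exp_name}.yaml'
    # rendering models, including the distilled _mb variant (assumed location)
    return f'configs/projects/realtime4dv/rendering/{exp_name}.yaml'

print(config_for('4k4d_0013_01'))     # .../rendering/4k4d_0013_01.yaml
print(config_for('4k4d_0013_01_r4'))  # .../training/4k4d_0013_01_r4.yaml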

Rendering of Trained Model

After placing the models and datasets in their respective places, you can run EasyVolcap with configs located in configs/projects/realtime4dv/rendering to perform rendering operations with 4K4D.

For example, to render the 0013_01 sequence of the DNA-Rendering dataset, you can run:

# GUI Rendering
evc -t gui -c configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml,configs/specs/vf0.yaml # Only load, precompute and render the first frame
evc -t gui -c configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml # Precompute and render all 150 frames, this could take a minute or two

# Testing with input views
evc -t test -c configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml,configs/specs/eval.yaml,configs/specs/vf0.yaml # Only render some of the views of the first frame
evc -t test -c configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml,configs/specs/eval.yaml # Only render selected testing views and frames

# Rendering rotating novel views
evc -t test -c configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml,configs/specs/eval.yaml,configs/specs/spiral.yaml,configs/specs/ibr.yaml,configs/specs/vf0.yaml # Render a static rotating novel view
evc -t test -c configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml,configs/specs/eval.yaml,configs/specs/spiral.yaml,configs/specs/ibr.yaml # Render a dynamic rotating novel view

Rendering With Minimal Dataset (Only Encoded Videos)

We provide a minimal dataset that lets 4K4D render with its full pipeline by encoding the input images and masks into videos (typically less than 100MiB each).

This leads to almost no visual quality loss, but if you have access to the full dataset, it's recommended to run the model on the full dataset instead (Sec. Rendering).

Here we provide instructions on setting up the minimal dataset and rendering with it:

  1. Download the pretrained models. If you've already done so for the Rendering section, this step can be skipped.
    1. Pretrained models should be placed directly into the data/trained_model directory (e.g. data/trained_model/4k4d_0013_01/1599.npz).
  2. Download the minimal datasets.
    1. Place the compressed files inside their respective data_root (e.g. 0013_01_libx265.tar.gz should be placed into data/renbody/0013_01) and uncompress them.
    2. Note that if you've already downloaded the full dataset with raw images as per the Rendering section, there's no need to re-download the minimal dataset with encoded videos.
    3. However, if you do, no files will be overwritten and you can safely run both kinds of rendering.
    4. After uncompressing, you should see two folders: videos_libx265 and optimized. The former contains the encoded videos, and the latter contains the optionally optimized camera parameters. For some datasets, you'll see intri.yml and extri.yml instead of the optimized folder, and for some others you'll see a videos_masks_libx265 folder storing the masks separately.
  3. Process the minimal datasets using these two scripts:
    1. scripts/realtime4dv/extract_images.py: Extract images from the encoded videos. Use --data_root to control which dataset to extract.
    2. scripts/realtime4dv/extract_masks.py: Extract masks from the encoded videos. Use --data_root to control which dataset to extract.
    3. After the extraction (preprocessing), you should see an images_libx265 and a masks_libx265 folder inside your data_root. Example processing scripts:
# For foreground datasets with masks and masked images (DNA-Rendering, NHR, ZJU-Mocap)
python scripts/realtime4dv/extract_images.py --data_root data/renbody/0013_01
python scripts/realtime4dv/extract_masks.py --data_root data/renbody/0013_01

# For datasets with masks and full images (ENeRF-Outdoor and dance3 of MobileStage)
python scripts/realtime4dv/extract_images.py --data_root data/mobile_stage/dance3
python scripts/realtime4dv/extract_images.py --data_root data/mobile_stage/dance3 --videos_dir videos_masks_libx265 --images_dir masks_libx265 --single_channel
  4. Now the minimal dataset has been prepared and you can render a model with it. The only change is to append a new config onto the command: configs/specs/video.yaml. Example rendering scripts:
# See configs/projects/realtime4dv/rendering for more
evc -t gui -c configs/projects/realtime4dv/rendering/4k4d_0013_01.yaml,configs/specs/video.yaml
evc -t gui -c configs/projects/realtime4dv/rendering/4k4d_sport1.yaml,configs/specs/video.yaml
evc -t gui -c configs/projects/realtime4dv/rendering/4k4d_my_313.yaml,configs/specs/video.yaml
evc -t gui -c configs/projects/realtime4dv/rendering/4k4d_dance3.yaml,configs/specs/video.yaml
evc -t gui -c configs/projects/realtime4dv/rendering/4k4d_actor1_4.yaml,configs/specs/video.yaml

Rendering Without Dataset (Mobile 4K4D)

  • TODO: Finish up the web viewer for the mobile 4k4d.

Training

  • TODO: Add trainable models & training examples

Pretrained Models for Training

Training on DNA-Rendering (ZJU-MoCap and NHR)

Training on ENeRF-Outdoor

Custom Datasets

  • TODO: Add trainable models & examples on custom datasets

Dataset Preparation

Configurations

Initialization (Space Carving || Visual Hull)

Training

Rendering

Optimizing the Cameras

Custom Full-Scene Datasets

  • TODO: Add trainable models & examples on custom full-scene datasets

Acknowledgements

We would like to acknowledge the following inspiring prior work:

Citation

If you find this code useful for your research, please cite us using the following BibTeX entries.

@article{xu20234k4d,
  title={4K4D: Real-Time 4D View Synthesis at 4K Resolution},
  author={Xu, Zhen and Peng, Sida and Lin, Haotong and He, Guangzhao and Sun, Jiaming and Shen, Yujun and Bao, Hujun and Zhou, Xiaowei},
  journal={arXiv preprint arXiv:2310.11448},
  year={2023}
}

@inproceedings{xu2023easyvolcap,
  title={EasyVolcap: Accelerating Neural Volumetric Video Research},
  author={Xu, Zhen and Xie, Tao and Peng, Sida and Lin, Haotong and Shuai, Qing and Yu, Zhiyuan and He, Guangzhao and Sun, Jiaming and Bao, Hujun and Zhou, Xiaowei},
  booktitle={SIGGRAPH Asia 2023 Technical Communications},
  year={2023}
}
