
[ICCV 23] BlendNeRF - Official PyTorch Implementation

3D-aware Blending with Generative NeRFs
Hyunsu Kim¹, Gayoung Lee¹, Yunjey Choi¹, Jin-Hwa Kim¹,², Jun-Yan Zhu³
¹NAVER AI Lab, ²SNU AIIS, ³CMU

Project | arXiv | Paper

Abstract: Image blending aims to combine multiple images seamlessly. It remains challenging for existing 2D-based methods, especially when input images are misaligned due to differences in 3D camera poses and object shapes. To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRF), including two key components: 3D-aware alignment and 3D-aware blending.

For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to generative NeRFs and then perform pose alignment for objects. To further leverage 3D information of the generative NeRF, we propose 3D-aware blending that utilizes volume density and blends on the NeRF's latent space, rather than raw pixel space.
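
As a rough illustration of the density blending idea, the hypothetical sketch below mixes volume densities and latent features sampled from two generative NeRFs with a soft 3D mask before volume rendering. All names here (blend_fields, sigma_orig, mask_3d, etc.) are assumptions for illustration, not the actual BlendNeRF API.

# Hypothetical sketch of 3D-aware density blending, assuming tensors
# sampled at the same 3D points from two generative NeRFs.
import torch

def blend_fields(sigma_orig, feat_orig, sigma_ref, feat_ref, mask_3d):
    """Blend volume densities and latent features with a soft 3D mask.

    sigma_*: (N,) volume densities; feat_*: (N, C) latent features;
    mask_3d: (N,) in [0, 1], 1 inside the editing region.
    """
    sigma = mask_3d * sigma_ref + (1 - mask_3d) * sigma_orig
    feat = mask_3d[:, None] * feat_ref + (1 - mask_3d)[:, None] * feat_orig
    return sigma, feat  # rendered into an image by the usual volume renderer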

Installation


For all the experiments, we use a single A100 GPU.

Clone this repository:

git clone https://github.com/naver-ai/BlendNeRF.git
cd BlendNeRF/eg3d

Install the dependencies:

conda env create -f environment.yml
conda activate blendnerf
./install.sh

Download pretrained networks

./download.sh

Pretrained EG3D generators:

  • original_ffhq_512-128_400kimg.pkl for FFHQ. Since EG3D uses a different crop of FFHQ than the original release, we fine-tune EG3D on the original crop of FFHQ.

  • afhqcats512-128.pkl for AFHQv2-Cat.

Camera pose estimators (encoders):

  • encoder_ffhq.pkl for FFHQ.
  • encoder_afhq.pkl for AFHQv2-Cat.

Semantic segmentation networks. We use these networks to automatically create target masks for editing, simulating user mask inputs (a hypothetical sketch follows this list):

  • bisenet.ckpt for human faces.
  • deeplab_epoch_19.pth for cat faces.
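
As a rough illustration, the hypothetical sketch below derives a binary target mask from a segmentation network's output; the wrapper function and label indices are assumptions, not the repository's actual code.

# Hypothetical sketch: build a binary editing mask from a segmentation map.
import torch

def target_mask(seg_logits, target_labels):
    """seg_logits: (C, H, W) class scores; target_labels: list of class ids."""
    labels = seg_logits.argmax(dim=0)                # (H, W) per-pixel class map
    mask = torch.zeros_like(labels, dtype=torch.float32)
    for t in target_labels:
        mask[labels == t] = 1.0                      # mark the target region
    return mask  # or supply your own mask instead of using a segmentation network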

Blend images

Command examples on the AFHQ dataset.

Step 1. Image inversion

# For a single image, the first stage (W optimization) takes 23.6s, and the second stage (G optimization) takes 20.4s on a single A100 GPU.

python inversion.py --outdir=results/inversion --network=checkpoints/afhqcats512-128.pkl --encoder_network checkpoints/encoder_afhq.pkl --original_img_path test_images/afhq/original.png --reference_img_path test_images/afhq/reference.png

Step 2. Blend images with 3D-aware alignment

  • Ours only
# It takes 27 seconds
python blend.py --outdir=results/editing/face --network=checkpoints/afhqcats512-128.pkl --editing_target face --inversion_path=results/inversion --enable_warping True --poisson=False --shapes=True --n_iterations=200 --ref_color_lambda 5 
  • Ours + Poisson blending
# It takes 12 seconds
python blend.py --outdir=results/editing/face --network=checkpoints/afhqcats512-128.pkl --editing_target face --inversion_path=results/inversion --enable_warping True --poisson=True --shapes=False --n_iterations=100 --ref_color_lambda 5 

Argument instructions

--outdir: output directory
--network: path to the pretrained generative NeRF
--editing_target: blending region (e.g., face) # You can provide your own custom mask instead of using the segmentation networks
--enable_warping: apply local alignment (Appendix C)
--poisson: apply Poisson blending on top of our method
--shapes: save meshes in .mrc format
--n_iterations: the number of optimization iterations
--ref_color_lambda: the weight of the image blending loss # setting it to zero means only the density blending loss is used
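
To make the role of --ref_color_lambda concrete, here is a minimal sketch of how the two losses would combine; the helper names and exact loss terms are assumptions, not the repository's implementation.

# Hypothetical sketch of the objective controlled by --ref_color_lambda.
import torch.nn.functional as F

def blending_loss(pred_img, ref_img, pred_density, target_density, ref_color_lambda):
    density_loss = F.mse_loss(pred_density, target_density)  # density blending loss
    image_loss = F.l1_loss(pred_img, ref_img)                # image blending loss
    # With --ref_color_lambda 0, only the density blending loss is optimized.
    return density_loss + ref_color_lambda * image_loss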

Generate various results

Diverse blending settings

The command below generates blending results in various settings using test_images/original.png and test_images/reference.png.

python run.py

# Results are in 
- results/celeba_hq/inversion
- results/celeba_hq/editing
- results/afhq/inversion
- results/afhq/editing

Datasets:

  • CelebA-HQ
  • AFHQ

Target parts:

  • CelebA-HQ (face, hair, nose, lip, eyes)
  • AFHQ (face, ears, eyes)

Note that you can provide an arbitrary mask instead of using the pretrained semantic segmentation networks.

Methods:

  • Ours
  • Ours + Poisson Blending

Our method without Poisson blending additionally produces multi-view blending and mesh results. Visualize the .mrc mesh files with UCSF ChimeraX.
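
If you want to inspect or write such volumes programmatically, a minimal sketch using the mrcfile package is shown below; the volume array is a placeholder, not output produced by this repository.

# Hypothetical sketch: write a density volume to .mrc for ChimeraX
# using the mrcfile package; `volume` is a placeholder grid.
import numpy as np
import mrcfile

volume = np.zeros((256, 256, 256), dtype=np.float32)  # placeholder density grid
with mrcfile.new('results/shape.mrc', overwrite=True) as mrc:
    mrc.set_data(volume)  # view as an isosurface in UCSF ChimeraX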

Diverse input images

The command below generates multi-view blending results with diverse input images.

python run_more_images.py

# Results are in 
- results/celeba_hq/83_126
- results/celeba_hq/102_314
- ...
- results/afhq/2_325
- results/afhq/171_322
- ...

Related Projects

The model code starts from EG3D. In addition, for a simple user interface, we automatically obtain mask inputs using the semantic segmentation networks listed above (BiSeNet for human faces, DeepLab for cat faces). However, users are not limited to these options and can use their own custom masks if desired.

Future Works

Inversion

Currently, we use PTI as our inversion method. However, the inversion process is the bottleneck of our blending pipeline in terms of both speed and quality: PTI is slow and sometimes generates inadequate 3D shapes. Additionally, our camera pose estimator (encoder) might not be accurate enough for precise inversion.
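
For orientation, here is a heavily simplified sketch of the PTI-style two-stage loop (latent optimization, then generator fine-tuning). The generator interface, its mean_w attribute, and the loss are hypothetical placeholders, not this repository's implementation.

# Hypothetical sketch of PTI-style two-stage inversion.
import copy
import torch
import torch.nn.functional as F

def reconstruction_loss(pred, target):
    # Placeholder: real PTI combines LPIPS and L2 reconstruction terms.
    return F.mse_loss(pred, target)

def invert(G, image, cam_pose, n_w=500, n_g=350):
    # Stage 1: optimize the latent code w with the generator frozen.
    w = G.mean_w.clone().requires_grad_(True)  # start from the average latent
    opt_w = torch.optim.Adam([w], lr=1e-2)
    for _ in range(n_w):
        opt_w.zero_grad()
        reconstruction_loss(G(w, cam_pose), image).backward()
        opt_w.step()

    # Stage 2: freeze w at the pivot and fine-tune the generator weights.
    w = w.detach()
    G_tuned = copy.deepcopy(G)
    opt_g = torch.optim.Adam(G_tuned.parameters(), lr=1e-3)
    for _ in range(n_g):
        opt_g.zero_grad()
        reconstruction_loss(G_tuned(w, cam_pose), image).backward()
        opt_g.step()
    return w, G_tuned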

If you seek a faster or more accurate inversion method, we suggest exploring other recent 3D-aware inversion approaches.

Local alignment

We use the Iterative Closest Point (ICP) algorithm for local alignment (Appendix C) on AFHQ. However, it may fall into a local optimum, which can degrade alignment quality. For better local alignment, we recommend considering recent pairwise registration techniques.
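
For reference, a minimal point-to-point ICP sketch with an SVD (Kabsch) update is shown below; it illustrates why the method depends on initialization, since correspondences are just nearest neighbors. This is a generic sketch, not the repository's implementation.

# Generic point-to-point ICP sketch; real pipelines add trimming,
# robust weighting, and scale estimation.
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, n_iters=50):
    """Align src (N, 3) to dst (M, 3); returns the transformed source points."""
    tree = cKDTree(dst)
    for _ in range(n_iters):
        _, idx = tree.query(src)               # nearest-neighbor correspondences
        matched = dst[idx]
        mu_s, mu_d = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_d)  # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T                         # optimal rotation (Kabsch)
        if np.linalg.det(R) < 0:               # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        src = src @ R.T + t                    # apply the rigid update
    return src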

License

The source code, pre-trained models, and dataset are available under the NVIDIA Source Code License for EG3D.

For technical and other inquiries, please contact [email protected].

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{kim20233daware,
  title={3D-aware Blending with Generative NeRFs}, 
  author={Hyunsu Kim and Gayoung Lee and Yunjey Choi and Jin-Hwa Kim and Jun-Yan Zhu},
  booktitle={ICCV},
  year={2023}
}

Acknowledgements

We would like to thank Seohui Jeong, Che-Sang Park, Eric R. Chan, Junho Kim, Jung-Woo Ha, Youngjung Uh, and other NAVER AI Lab researchers for their helpful comments and sharing of materials. All experiments were conducted on NAVER Smart Machine Learning (NSML) platform.
