[CVPR 2022] GAN inversion and editing with spatially-adaptive multiple latent layers

Spatially-Adaptive Multilayer (SAM) GAN Inversion

Project Page | Paper

We provide a PyTorch implementation of GAN projection using multilayer latent codes. Choosing a single latent layer for GAN inversion forces a trade-off between faithfully reconstructing the input image and being able to perform downstream edits (1st and 2nd row). In contrast, our proposed method automatically selects the latent space for each region to balance reconstruction quality and editability (3rd row).

Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
Gaurav Parmar, Yijun Li, Jingwan Lu, Richard Zhang, Jun-Yan Zhu, Krishna Kumar Singh
CMU, Adobe Research
CVPR 2022

Image Formation with Multiple Latent Codes

We use the predicted invertibility map in conjunction with multiple latent codes to generate the final image. First, the StyleBlocks of the pretrained StyleGAN2 model are modulated by W+ directly. Subsequently, for each intermediate feature space F_i, we predict the change in the layer's feature value ∆F_i and add it to the feature block after masking with the corresponding binary mask m_i.
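The masked feature update described above can be sketched in a few lines. This is a minimal NumPy illustration, not the repository's code: the function name `blend_feature` is hypothetical, and the mask is assumed to already be at the spatial resolution of the layer it is applied to.

```python
import numpy as np

def blend_feature(feature, delta, mask):
    """Return feature + mask * delta for one intermediate layer.

    feature, delta: arrays of shape (C, H, W)
    mask: binary array of shape (H, W); 1 marks regions assigned to this layer
    (hypothetical helper, assumes the mask matches the layer resolution)
    """
    if feature.shape != delta.shape:
        raise ValueError("feature and delta must have the same shape")
    if mask.shape != feature.shape[1:]:
        raise ValueError("mask must match the spatial size of the feature")
    # Broadcast the (H, W) mask across the channel dimension.
    return feature + mask[None, :, :] * delta

# Toy example: 2 channels on a 2x2 spatial grid.
feature = np.zeros((2, 2, 2))
delta = np.ones((2, 2, 2))
mask = np.array([[1, 0], [0, 1]])
blended = blend_feature(feature, delta, mask)
```

Only the positions where the mask is 1 receive the predicted change; everywhere else the feature produced by the W+ modulation is left untouched.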

Predicting the Invertibility Map

We begin with predicting how difficult each region of the image is to invert for every latent layer using our trained invertibility network. Subsequently, we refine the predicted map using a semantic segmentation network and combine them using a user-specified threshold. This combined invertibility map is shown on the right and used to determine the latent layer to be used for inverting each segment in the image.
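One plausible reading of this selection step is sketched below. It is an illustration under stated assumptions rather than the authors' implementation: lower predicted values are taken to mean "easier to invert", the layers are assumed to be ordered from most to least editable, refinement is approximated by averaging the prediction inside each segment, and `select_layers` is a hypothetical name.

```python
import numpy as np

# Candidate latent layers, assumed ordered from most to least editable.
LAYERS = ["W+", "F4", "F6", "F8", "F10"]

def select_layers(invertibility, segments, threshold):
    """Pick one latent layer index per segment.

    invertibility: (num_layers, H, W) predicted difficulty, lower = easier (assumption)
    segments: (H, W) integer segment ids from a segmentation network
    threshold: user-specified tolerance on the averaged difficulty
    Returns an (H, W) integer array of indices into LAYERS.
    """
    choice = np.zeros(segments.shape, dtype=int)
    for seg_id in np.unique(segments):
        region = segments == seg_id
        # Refine the per-pixel prediction by averaging it over the segment.
        scores = invertibility[:, region].mean(axis=1)
        acceptable = np.flatnonzero(scores <= threshold)
        # First acceptable layer, or the last layer if none qualifies.
        choice[region] = acceptable[0] if acceptable.size else len(LAYERS) - 1
    return choice

# Toy example: the top row is easy to invert, the bottom row is hard.
inv = np.empty((5, 2, 2))
inv[:, 0, :] = 0.1
inv[:, 1, :] = np.array([0.9, 0.8, 0.2, 0.1, 0.05])[:, None]
segments = np.array([[0, 0], [1, 1]])
choice = select_layers(inv, segments, threshold=0.5)
```

In this toy input the easy segment stays in W+, while the hard segment falls through to the third candidate (F6), the first layer whose averaged score is under the threshold.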

Qualitative Results

Below we show image inversion and editing results obtained using the proposed method. Please see the project website for more results.

Getting Started

Clone this repo:

git clone --recurse-submodules https://github.com/adobe-research/sam_inversion
cd sam_inversion

Environment Setup

See environment.yml for a full list of library dependencies. The following commands can be used to install all the dependencies in a new conda environment.

conda env create -f environment.yml
conda activate inversion

Inversion

An example command for inverting a given target image is shown below. The --image_category should be one of {"cars", "faces", "cats"}. The --sweep_thresholds flag performs inversion for a range of threshold values. See the script for other optional flags.

python src/sam_inv_optimization.py \
    --image_category "cars" --image_path test_images/cars/b.png \
    --output_path "output/cars/" --sweep_thresholds --generate_edits
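When inverting many images, it can be convenient to assemble the same command from Python. The sketch below only uses the flags shown in the command above; the helper name `build_inversion_command` and its keyword arguments are hypothetical.

```python
import shlex

# Categories listed in this README for --image_category.
VALID_CATEGORIES = {"cars", "faces", "cats"}

def build_inversion_command(category, image_path, output_path,
                            sweep=True, edits=True):
    """Build the argument list for src/sam_inv_optimization.py (hypothetical helper)."""
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unsupported category: {category!r}")
    cmd = [
        "python", "src/sam_inv_optimization.py",
        "--image_category", category,
        "--image_path", image_path,
        "--output_path", output_path,
    ]
    if sweep:
        cmd.append("--sweep_thresholds")
    if edits:
        cmd.append("--generate_edits")
    return cmd

cmd = build_inversion_command("cars", "test_images/cars/b.png", "output/cars/")
# Print a copy-pastable shell command.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the inversion.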

Using a Custom Dataset

To perform SAM inversion on a custom dataset, we first need to train a corresponding invertibility network. Start by performing a single-layer inversion with every candidate latent space, as shown in the command below, for all images in the training set.

for latent_name in "W+" "F4" "F6" "F8" "F10"; do
    python src/single_latent_inv.py \
        --image_category "cats" --image_folder_path datasets/custom_images/train \
        --num_opt_steps 501 --output_path "output/custom_ds/train/${latent_name}" --target_H 256 --target_W 256 \
        --latent_name "${latent_name}"
done

Next, repeat the above for the validation and test splits. Finally, train the invertibility network as shown in the example command below.

python src/train_invertibility.py \
    --dataset_folder_train output/custom_ds/train \
    --dataset_folder_val output/custom_ds/val \
    --output_folder output/invertibility/custom_ds \
    --gpu-ids "0" --batch-size 16 --lr 0.0001
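Before training, the single-layer inversion has to be run for every split and every candidate latent space. A minimal sketch that enumerates those commands is given below. It assumes the validation and test images live in `val` and `test` folders next to `train`, and that the `test` outputs follow the same layout as `train` and `val`; these paths and the helper name `single_latent_commands` are hypothetical.

```python
from itertools import product

SPLITS = ["train", "val", "test"]           # val/test folder names are assumptions
LATENTS = ["W+", "F4", "F6", "F8", "F10"]   # candidate latent spaces from the loop above

def single_latent_commands(category, image_root, output_root,
                           size=256, steps=501):
    """List one src/single_latent_inv.py command per (split, latent) pair."""
    commands = []
    for split, latent in product(SPLITS, LATENTS):
        commands.append([
            "python", "src/single_latent_inv.py",
            "--image_category", category,
            "--image_folder_path", f"{image_root}/{split}",
            "--num_opt_steps", str(steps),
            "--output_path", f"{output_root}/{split}/{latent}",
            "--target_H", str(size),
            "--target_W", str(size),
            "--latent_name", latent,
        ])
    return commands

commands = single_latent_commands(
    "cats", "datasets/custom_images", "output/custom_ds"
)
```

Each entry can be run with `subprocess.run(command, check=True)`; the resulting `output/custom_ds/train` and `output/custom_ds/val` folders are the ones passed to the training command.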

Reference

If you find this code useful for your research, please cite our paper.

@inproceedings{
parmar2022sam,
title={Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing},
author={Parmar, Gaurav and Li, Yijun and Lu, Jingwan and Zhang, Richard and Zhu, Jun-Yan and Singh, Krishna Kumar},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2022}
}

Related Projects

Please check out our past GAN inversion projects:
iGAN (ECCV 2016), GANPaint (SIGGRAPH 2019), GANSeeing (ICCV 2019), pix2latent (ECCV 2020)

Acknowledgment

Our work builds partly on the following repos:

  • e4e - Encoder used for the W+ inversions.
  • StyleGAN - The generative model used for the inversion.
  • Deeplab3-xception - Used for the base architecture of the invertibility prediction network.
  • HRNet, Detectron - Used for segmenting images (except faces).
  • Face Parsing - Used for segmenting face images.
