• Stars
    star
    135
  • Rank 267,885 (Top 6 %)
  • Language
    Python
  • License
    Other
  • Created over 1 year ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

Project Page | Paper

This repository contains the original PyTorch implementation of this project.

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes
Sumith Kulal1, Tim Brooks2, Alex Aiken1, Jiajun Wu1,
Jimei Yang3, Jingwan Lu3, Alexei A. Efros2, Krishna Kumar Singh3
1Stanford University, 2UC Berkeley, 3Adobe Research

Setup

Install the necessary packages using either pip or conda:

# install via pip (recommended)
python3 -m venv .affordance
source .affordance/bin/activate
pip install wheel
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 \
	torchdata==0.3.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt

# install via conda
conda env create -f environment.yaml

Dataset

Similar to Hallucinating Pose-Compatible Scenes, we preprocess raw input video data into LMDBs that can be ingested by our dataloaders and training pipeline. Starting from raw input video data, we first process it to only consist spatio-temporal sequences of single human motion. We then pre-compute the human segmentation masks. A sample sliver of data in LMDB format is here.

Our pre-processing scripts also closely follow hallucinating-scenes repo, with improvements for speed and mask generation.

Preprocessing steps

You can refer to more complete instructions here.

# preprocess vidoes, use images script if videos are in extracted format
python data_from_videos.py input_dir=kinetics output_dir=kinetics_frames_256
### python data_from_images.py input_dir=kinetics output_dir=kinetics_256

# filter by detecting people bbox
python data_filter_people.py input_dir=kinetics_256 output_dir=kinetics_people

# filter by detecting body keypoints
mkdir open_pose/pretrained/
gdown --id 1k7Teg2bVxGBR7ECiNScNzlLF0PDiIgmo -O open_pose/pretrained/open_pose.pt
python data_detect_pose.py input_dir=kinetics_people output_dir=kinetics_pose

# generate masks for filtered data
python data_generate_masks.py input_dir=kinetics_pose output_dir=kinetics_mask

Save the dataset in the following format:

data_root/
  --> dataset1/
    --> frames_db/
    --> masks_db/
    --> clipmask_db/
  --> dataset2/
...

Training

The training follows config-driven approach as in the original CompVis/stable-diffusion repo.

Dataloader

The main control for the data-loading is through the data config. The various config parameters have been described in detail in the ./ldm/data/hic.py file. One could visualize the various options and the data sampled from them through them by running python -m ldm.data.hic. As mentioned previously, a sample dataset with handful of videos is made available here solely for data exploration and visualization. These videos originate from a single batch MPII Human Pose dataset and we urge users to go through the author's license here.

Training

Run the script bash train.sh for single node training. To experiment with different conditioning modalities and types (concat, crossattn), you can set the cond_stage_key param which has been expanded to handle more general structures.

FAQs

  • At times there is excessive memory usage and EOM during training? Although this rarely happens, if you observe unusual memory consumption it could be due to LMDB loading. We observed using fewer larger LMDBs help alleviate such issues. You can use merge LMDBs for smaller datasets using ldm/data/data_merge_lmdb.py.

  • Can you share the full dataset used for training? We will not be able to share the datasets for various reasons including size and source copyright. Our data preparation scripts can construct input datasets from various video sources. A good starting point would large video datasets that are publicly available, as listed here.

Please reach out to Sumith ([email protected]) for any questions or concerns.

Citation

If you find our project useful, please cite the following:

@inproceedings{kulal2023affordance,
author    = {Kulal, Sumith and Brooks, Tim and Aiken, Alex and Wu, Jiajun and Yang, Jimei and Lu, Jingwan and Efros, Alexei A. and Singh, Krishna Kumar},
title     = {Putting People in Their Place: Affordance-Aware Human Insertion into Scenes},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year      = {2023},
}

More Repositories

1

custom-diffusion

Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
Python
1,835
star
2

theseus

A pretty darn cool JavaScript debugger for Brackets
JavaScript
1,337
star
3

MakeItTalk

Jupyter Notebook
481
star
4

DeepAFx-ST

DeepAFx-ST - Style transfer of audio effects with differentiable signal processing. Please see https://csteinmetz1.github.io/DeepAFx-ST/
Python
352
star
5

spindle

Next-generation web analytics processing with Scala, Spark, and Parquet.
JavaScript
332
star
6

diffusion-rig

Code Release for DiffusionRig (CVPR 2023)
Python
259
star
7

MetaAF

Control adaptive filters with neural networks.
Python
221
star
8

DeepAFx

Third-party audio effects plugins as differentiable layers within deep neural networks.
Jupyter Notebook
185
star
9

ActionScript4

ActionScript 4 specification archive
TeX
181
star
10

sam_inversion

[CVPR 2022] GAN inversion and editing with spatially-adaptive multiple latent layers
Python
169
star
11

convmelspec

Convmelspec: Convertible Melspectrograms via 1D Convolutions
Python
128
star
12

MagicFixup

Python
125
star
13

VideoDoodles

Python
119
star
14

fondue

JavaScript instrumentation library for collecting traces
JavaScript
110
star
15

libkafka

A C++ client library for Apache Kafka v0.8+. Also includes C API.
C++
90
star
16

domain-expansion

Domain Expansion of Image Generators - CVPR23
Python
86
star
17

deft_corpus

The Definition Extraction From Text corpus and relevant formatting scripts
Python
79
star
18

node-theseus

JavaScript
76
star
19

GCview

GC / memory management visualization and monitoring framework.
JavaScript
73
star
20

vaw_dataset

This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in the Wild" and the ECCV 2022 paper titled "Improving Closed and Open-Vocabulary Attribute Prediction using Transformers"
Python
61
star
21

svgObjectModelGenerator

SVG OM Generator & Writer
JavaScript
49
star
22

spark-parquet-thrift-example

Example Spark project using Parquet as a columnar store with Thrift objects.
Scala
48
star
23

spark-cluster-deployment

Automates Spark standalone cluster tasks with Puppet and Fabric.
Python
43
star
24

EntitySeg-Dataset

Adobe-EntitySeg dataset
38
star
25

spark-gpu

GPU Acceleration for Apache Spark
Python
34
star
26

layered-depth-refinement

Python
32
star
27

auto-wire-removal

28
star
28

sunstage

Python
28
star
29

deep-acoustic-analysis

Python
26
star
30

mesh

General-purpose programming language featuring functional idioms, strong static inferred types, and a concurrency model built on managed mutability and STM.
26
star
31

AutoToon

Python
25
star
32

VideoSham-dataset

22
star
33

CHART-Synthetic

Synthetic Dataset used in the ICDAR2019 Competition on HArvesting Raw Tables from Infographics (CHART-Infographics)
Python
19
star
34

DiffusionHandles

Diffusion Handles is a training-free method that enables 3D-aware image edits using a pre-trained Diffusion Model.
Python
15
star
35

Cross-lingual-Test-Dataset-XTD10

13
star
36

beacon-aug

Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms
Jupyter Notebook
12
star
37

audio-retargeting

C
11
star
38

prometheus-opentsdb-exporter

A Prometheus exporter component for OpenTSDB
Scala
10
star
39

cross-preferences

Java Preferences SPI implementations backed by distributed configuration stores (web API included)
Java
8
star
40

aesop

AESOP: Abstract Encoding of Stories, Objects and Pictures
Python
7
star
41

meetingqa

Python
7
star
42

UniHuman

Python
7
star
43

mississippi

Mississippi is a Python package that runs batch jobs in the Amazon Web Services (AWS) environment.
6
star
44

http_streaming_client

Ruby HTTP client with support for HTTP 1.1 streaming, GZIP compressed streams, and chunked transfer encoding. Includes extensible OAuth support for the Adobe Analytics Firehose and Twitter Streaming APIs.
Ruby
6
star
45

DocEdit-Dataset

Release of the DocEdit Dataset associated with the AAAI 2023 paper "DocEdit: Language-guided Document Editing"
5
star
46

longmoment-detr

Python
5
star
47

LexDeMod

3
star
48

hw_with_style

Python
2
star
49

AutoForecast_ResourceUsageData

2
star
50

ASWValData

Jupyter Notebook
1
star