• Stars
    star
    633
  • Rank 71,037 (Top 2 %)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created about 2 years ago
  • Updated over 1 year ago

Repository Details

Implementation of Paint-with-Words with Stable Diffusion: a method from eDiff-I that lets you generate an image from a text-labeled segmentation map.

Paint-with-Words, Implemented with Stable Diffusion

Subtle Control of the Image Generation

Notice how without PwW the cloud is missing.

Notice how without PwW, the abandoned city is missing and the road turns purple as well.

Shift the object: same seed, only the object's position in the segmentation map differs

"A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed."

Notice how nearly all of the composition remains the same, other than the position of the moon.


Recently, researchers from NVIDIA proposed eDiff-I. In the paper, they suggested a method that allows "painting with words". Basically, this is like Make-A-Scene, but it only uses adjusted cross-attention scores. You can see the results and the detailed method in the paper.

Their method was not open-sourced. However, paint-with-words can be implemented with Stable Diffusion, since both share a common cross-attention module. So I implemented it with Stable Diffusion.

Installation

pip install git+https://github.com/cloneofsimo/paint-with-words-sd.git

Basic Usage

Before running, set the variable HF_TOKEN in the .env file to your Hugging Face token for Stable Diffusion, and call load_dotenv().

Prepare a segmentation map and a color-to-tag mapping such as the one below. Keys are colors in (R, G, B) format, and values are tag labels.

{
    (0, 0, 0): "cat,1.0",
    (255, 255, 255): "dog,1.0",
    (13, 255, 0): "tree,1.5",
    (90, 206, 255): "sky,0.2",
    (74, 18, 1): "ground,0.2",
}

Each value needs to be in the format "{label},{strength}", where strength is an additional weight applied to the attention score during generation: the higher the strength, the more effect the label has.
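The "{label},{strength}" convention can be illustrated with a small parser. This is only a sketch for checking your own dictionaries, not the package's parsing code; the helper name parse_tag is hypothetical.

```python
def parse_tag(tag):
    """Split a "{label},{strength}" string into (label, strength)."""
    # Split on the last comma so the strength is always the final field.
    label, strength = tag.rsplit(",", 1)
    return label.strip(), float(strength)


color_context = {
    (0, 0, 0): "cat,1.0",
    (13, 255, 0): "tree,1.5",
    (90, 206, 255): "sky,0.2",
}

for color, tag in color_context.items():
    label, strength = parse_tag(tag)
    print(color, label, strength)
```

A malformed value (for example a missing strength) raises ValueError here, which can help catch typos before starting a generation run.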

import dotenv
from PIL import Image

from paint_with_words import paint_with_words

settings = {
    "color_context": {
        (0, 0, 0): "cat,1.0",
        (255, 255, 255): "dog,1.0",
        (13, 255, 0): "tree,1.5",
        (90, 206, 255): "sky,0.2",
        (74, 18, 1): "ground,0.2",
    },
    "color_map_img_path": "contents/example_input.png",
    "input_prompt": "realistic photo of a dog, cat, tree, with beautiful sky, on sandy ground",
    "output_img_path": "contents/output_cat_dog.png",
}


dotenv.load_dotenv()

color_map_image = Image.open(settings["color_map_img_path"]).convert("RGB")
color_context = settings["color_context"]
input_prompt = settings["input_prompt"]

img = paint_with_words(
    color_context=color_context,
    color_map_image=color_map_image,
    input_prompt=input_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
    device="cuda:0",
)

img.save(settings["output_img_path"])

There is a minimal, self-contained working example in runner.py. Please have a look!


Weight Scaling

In the paper, they used $w \log (1 + \sigma) \max (Q^T K)$ to scale the attention weight. However, after a few tests, CookiePPP found that this was not optimal. You can check out the effect of the functions below:

$w' = w \log (1 + \sigma) \, \mathrm{std} (Q^T K)$

$w' = w \log (1 + \sigma) \max (Q^T K)$

$w' = w \log (1 + \sigma^2) \, \mathrm{std} (Q^T K)$

You can define your own weight function and further tweak the configuration by passing the weight_function argument to paint_with_words.

Example:

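import math

# Custom weight function: w is the label strength, sigma the noise level,
# and qk the cross-attention scores (Q^T K in the formulas above).
w_f = lambda w, sigma, qk: 0.4 * w * math.log(sigma**2 + 1) * qk.std()

img = paint_with_words(
    color_context=color_context,
    color_map_image=color_map_image,
    input_prompt=input_prompt,
    num_inference_steps=20,
    guidance_scale=7.5,
    device="cuda:0",
    preloaded_utils=loaded,  # tools loaded beforehand with pww_load_tools
    weight_function=w_f,
)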
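
The formulas above differ in how the noise level σ enters the weight. As a rough illustration, the sketch below compares only the σ-dependent factor and leaves out the std/max term, which depends on the attention scores at run time. The function names are hypothetical and not part of the package.

```python
import math


def factor_log_sigma(w, sigma):
    # Sigma-dependent part of w' = w * log(1 + sigma) * (...)
    return w * math.log(1 + sigma)


def factor_log_sigma_sq(w, sigma):
    # Sigma-dependent part of w' = w * log(1 + sigma^2) * (...)
    return w * math.log(1 + sigma ** 2)


for sigma in (0.1, 1.0, 10.0):
    a = factor_log_sigma(1.0, sigma)
    b = factor_log_sigma_sq(1.0, sigma)
    print(f"sigma={sigma:5.1f}  log(1+sigma)={a:.4f}  log(1+sigma^2)={b:.4f}")
```

The two factors agree at σ = 1. For σ below 1 the squared variant is smaller, and for σ above 1 it is larger, so it puts relatively more weight on the noisy early steps.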

More on the weight function (with higher weights)

$w' = w \log (1 + \sigma) \, \mathrm{std} (Q^T K)$

$w' = w \log (1 + \sigma) \max (Q^T K)$

$w' = w \log (1 + \sigma^2) \, \mathrm{std} (Q^T K)$

Region-based seeding

In the following example, the random seed for the whole image is 0 and the prompt is

"A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed."

The random seeds for 'boat', 'moon', and 'mountain' are set to the various values shown in the top row.

Example:

EXAMPLE_SETTING_4_seed = {
    "color_context": {
        (7, 9, 182): "aurora,0.5,-1",
        (136, 178, 92): "full moon,1.5,-1",
        (51, 193, 217): "mountains,0.4,-1",
        (61, 163, 35): "a half-frozen lake,0.3,-1",
        (89, 102, 255): "boat,2.0,2077",
    },
    "color_map_img_path": "contents/aurora_1.png",
    "input_prompt": "A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed.",
    "output_img_path": "contents/aurora_1_seed_output.png",
}

Here the third item of each context value is the random seed for that object. Use -1 to follow the seed set in the paint_with_words function. In this example, the random seed of the boat is set to 2077.
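
The "-1 means follow the global seed" rule can be sketched as follows. This is an illustration of the convention only, not the package's internal code; resolve_region_seeds and its global_seed parameter are hypothetical names.

```python
def resolve_region_seeds(color_context, global_seed):
    """Return {label: seed}, replacing -1 with the global seed."""
    seeds = {}
    for tag in color_context.values():
        # Each value is "{label},{strength},{seed}"; split from the right.
        label, _strength, seed = tag.rsplit(",", 2)
        seed = int(seed)
        seeds[label.strip()] = global_seed if seed == -1 else seed
    return seeds


color_context = {
    (7, 9, 182): "aurora,0.5,-1",
    (136, 178, 92): "full moon,1.5,-1",
    (89, 102, 255): "boat,2.0,2077",
}

print(resolve_region_seeds(color_context, global_seed=0))
```

With a global seed of 0, only the boat ends up with its own seed (2077); the aurora and the moon follow the global one.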

Image inpainting

Following the previous example, the figure below shows the results of image inpainting with paint-with-words.

The top row shows an example of editing the moon's size by inpainting. The bottom row shows an example of re-synthesizing the moon by inpainting with the original "input color map" used for text-to-image paint-with-words.

Example

import math

from paint_with_words import paint_with_words_inpaint

# init_image and mask_image are assumed to be loaded beforehand,
# e.g. with PIL's Image.open, in the same way as color_map_image.
img = paint_with_words_inpaint(
    color_context=color_context,
    color_map_image=color_map_image,
    init_image=init_image,
    mask_image=mask_image,
    input_prompt=input_prompt,
    num_inference_steps=150,
    guidance_scale=7.5,
    device="cuda:0",
    seed=81,
    weight_function=lambda w, sigma, qk: 0.15 * w * math.log(1 + sigma) * qk.max(),
    strength=1.0,
)

To run inpainting

python runner_inpaint.py

Using other Fine-tuned models

If you come from the Automatic1111 community, you may be used to native LDM checkpoint formats rather than the diffusers checkpoint format. Luckily, there is a quick script that converts between them.

python change_model_path.py --checkpoint_path custom_model.ckpt --scheduler_type ddim --dump_path custom_model_diffusion_format

Now use the converted model in the paint_with_words function.

import math

from diffusers import LMSDiscreteScheduler
from paint_with_words import paint_with_words, pww_load_tools

# Load the converted model once so it can be reused across calls.
loaded = pww_load_tools(
    "cuda:0",
    scheduler_type=LMSDiscreteScheduler,
    local_model_path="./custom_model_diffusion_format"
)
#...
img = paint_with_words(
    color_context=color_context,
    color_map_image=color_map_image,
    input_prompt=input_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
    device="cuda:0",
    weight_function=lambda w, sigma, qk: 0.4 * w * math.log(1 + sigma) * qk.max(),
    preloaded_utils=loaded
)

Example Notebooks

You can view the minimal working notebook here or Open In Colab


Gradio interface

Paint-with-word

To launch gradio api

python gradio_pww.py

Note that the "Color context" should follow the format used in the example in runner.py. For example,

{(7, 9, 182): "aurora,0.5,-1",(136, 178, 92): "full moon,1.5,-1",(51, 193, 217): "mountains,0.4,-1",(61, 163, 35): "a half-frozen lake,0.3,-1",(89, 102, 255): "boat,2.0,2077",}
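
Since the textbox content is written as a Python dictionary literal, one way to check a string before pasting it is to parse it with ast.literal_eval. How gradio_pww.py itself parses the textbox is not shown here, so treat this as a sketch; parse_color_context_text is a hypothetical helper.

```python
import ast


def parse_color_context_text(text):
    """Parse the textbox string into a {(R, G, B): tag} dictionary."""
    # literal_eval only accepts Python literals, so arbitrary code is rejected.
    parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError("color context must be a dictionary literal")
    for color, tag in parsed.items():
        is_rgb = (
            isinstance(color, tuple)
            and len(color) == 3
            and all(isinstance(c, int) and 0 <= c <= 255 for c in color)
        )
        if not is_rgb or not isinstance(tag, str):
            raise ValueError(f"bad entry: {color!r}: {tag!r}")
    return parsed


text = '{(7, 9, 182): "aurora,0.5,-1",(89, 102, 255): "boat,2.0,2077",}'
color_context = parse_color_context_text(text)
print(color_context[(89, 102, 255)])
```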

Color content extraction

One can extract the color content from the "Segmentation map" by expanding the "Color content option". Press the "Extract color content" button to extract the unique colors of the image.

In "Color content option", the extracted colors are shown one item per color. One can then replace "obj" with the object that appears in the prompt. Importantly, don't use "," in the object name, as it is the separator of the color content.

Click the "Generate color content" button to collect all the contents into the "Color content" textbox, which is the formal input of Paint-with-word.
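
Outside the Gradio interface, the same idea (collect the unique colors, then fill in a label for each) can be sketched in a few lines. The helper below is hypothetical and takes any iterable of (R, G, B) tuples, such as the values returned by Pillow's Image.getdata() for an RGB image. The default strength of 1.0 and seed of -1 are assumptions for illustration, not necessarily the defaults of the Gradio app.

```python
def extract_color_content(pixels, placeholder="obj", strength=1.0, seed=-1):
    """Build a color-context template with one entry per unique color."""
    unique_colors = sorted(set(pixels))
    return {
        color: f"{placeholder},{strength},{seed}" for color in unique_colors
    }


# A tiny hand-written "image": four pixels, three distinct colors.
pixels = [(0, 0, 0), (255, 255, 255), (0, 0, 0), (13, 255, 0)]

template = extract_color_content(pixels)
for color, tag in template.items():
    print(color, tag)
```

Each "obj" placeholder would then be replaced by the matching word from the prompt.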

The same function is supported for Paint-with-word for image inpainting as shown below

Paint-with-word for image inpainting

To launch gradio api

python gradio_pww_inpaint.py

Paint with Word (PwW) + ControlNet Extension for AUTOMATIC1111(A1111) stable-diffusion-webui

This extension provides additional PwW control to ControlNet. See sd-webui-controlnet-pww for the repo of this module.

The demo is shown below.


The implementation is based on the great controlnet extension for A1111

Benchmark of ControlNet + PwW

The following figure shows the comparison between the ControlNet results and the ControlNet+PwW results for the boat examples.

Note that PwW makes the background, e.g. the aurora and mountains, more realistic as the weight function scale increases.

The setups are detailed as follows

Scribble and Segmentation map:

Prompts:

"A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed."

Color contents:

"{(7, 9, 182): "aurora@<strength>@-1",(136, 178, 92): "full moon@<strength>@-1",(51, 193, 217): "mountains@<strength>@-1",(61, 163, 35): "a half-frozen lake@<strength>@-1",(89, 102, 255): "boat@<strength>@-1",}"

Note that the A1111 extension now uses "@" as the separator instead of ",".
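
An existing comma-separated color context can be converted to the "@" form mechanically. The sketch below relies only on the separator change described above; to_a1111_format is a hypothetical helper, and the extension may differ from the standalone package in other ways.

```python
def to_a1111_format(color_context):
    """Rewrite "label,strength,seed" values as "label@strength@seed"."""
    return {
        color: "@".join(part.strip() for part in tag.split(","))
        for color, tag in color_context.items()
    }


# Values taken from the Gradio example format above.
color_context = {
    (7, 9, 182): "aurora,0.5,-1",
    (89, 102, 255): "boat,2.0,2077",
}

print(to_a1111_format(color_context))
```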

Assign the material for the specific region in scribble

One can use PwW to assign a material to a region of the scribble; see the results comparing ControlNet and ControlNet+PwW below.

Note that the material of the turtle shell specified by PwW is significantly improved, as shown in the right blocks. Please see sd-webui-controlnet-pww for the experimental setups.

Installation

(1) Clone the source code to A1111 webui extensions

One can install by copying the "pww_controlnet" directory into the extensions directory of A1111 webui

cp -rf pww_controlnet path/stable-diffusion-webui/extensions/

or simply

cd path/stable-diffusion-webui/extensions/
git clone git@github.com:lwchen6309/sd-webui-controlnet-pww.git

where path is the location of A1111 webui.

(2) Setup pretrained model of ControlNet

Please follow the instructions of the ControlNet extension to get the pretrained models.

IMPORTANT: This extension is currently NOT compatible with the ControlNet extension, as reported in this issue. Hence, please disable the ControlNet extension before you install ControlNet+PwW.

However, one can still make them compatible by following the installation instructions.

TODO

  • Make extensive comparisons for different weight scaling functions.
  • Create word latent-based cross-attention generations.
  • Check if statement "making background weight smaller is better" is justifiable, by using some standard metrics
  • Create AUTOMATIC1111's interface
  • Create Gradio interface
  • Create tutorial
  • See if starting with some "known image latent" is helpful. If it is, we might as well hard-code some initial latent.
  • Region-based seeding, where we set a seed for each region. Can be simply implemented with an extra argument in COLOR_CONTEXT
  • Sentence-wise text separation. Currently a token is the smallest unit that influences cross-attention. This needs to be fixed. (Can be done pretty trivially)
  • Allow different models to be used.
  • "negative region", where we can set some region to "not" have some semantics. can be done with classifier-free guidance.
  • Img2ImgPaintWithWords -> Img2Img, but with extra text segmentation map for better control
  • InpaintPaintwithWords -> inpaint, but with extra text segmentation map for better control
  • Support for other schedulers

Acknowledgement

Thanks for the inspiring gradio interface from ControlNet

Thanks for the wonderful A1111 extension of controlnet as the baseline of our implementation
