  • Stars
    466
  • Rank 93,525 (Top 2 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created about 1 year ago
  • Updated 10 months ago

Repository Details

Official Pytorch Implementation of DenseDiffusion (ICCV 2023)

Dense Text-to-Image Generation with Attention Modulation

ICCV 2023 [Paper]

Authors    Yunji Kim¹, Jiyoung Lee¹, Jin-Hwa Kim¹, Jung-Woo Ha¹, Jun-Yan Zhu²
         ¹NAVER AI Lab, ²Carnegie Mellon University

Abstract

Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. We first analyze the relationship between generated images' layouts and the pre-trained model's intermediate attention maps. Next, we develop an attention modulation method that guides objects to appear in specific regions according to layout guidance. Without requiring additional fine-tuning or datasets, we improve image generation performance given dense captions regarding both automatic and human evaluation scores. In addition, we achieve similar-quality visual results with models specifically trained with layout conditions.

Method

Our goal is to improve the text-to-image model's ability to reflect textual and spatial conditions without fine-tuning. We formally define our condition as a set of $N$ segments ${\lbrace(c_{n},m_{n})\rbrace}^{N}_{n=1}$, where each segment $(c_n,m_n)$ describes a single region. Here $c_n$ is a non-overlapping part of the full-text caption $c$, and $m_n$ denotes a binary map representing each region. Given the input conditions, we modulate the attention maps of all attention layers on the fly so that the object described by $c_n$ is generated in the corresponding region $m_n$. To maintain the pre-trained model's generation capacity, we design the modulation to account for the original value range of the attention scores and for each segment's area.
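As a rough illustration of the idea above, the following sketch modulates the pre-softmax attention scores of a single query (one image location) over a handful of text tokens. It is a simplified toy and not the repository's code: the function name `modulate_row`, the per-token `area_ratio` list, and the exact update rule are assumptions made for illustration, and `strength` is a hypothetical stand-in for a modulation degree such as the wc and ws settings listed under Getting Started. Scores for tokens whose segment covers the query are pushed toward the row maximum, all others toward the row minimum, and the push is weaker for segments that cover a larger area.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def modulate_row(scores, in_segment, area_ratio, strength):
    """Shift one query's attention scores toward or away from its segment.

    scores     -- pre-softmax scores of one query over all keys
    in_segment -- 1 if the key belongs to the segment covering this query, else 0
    area_ratio -- fraction of the image covered by each key's segment (assumed input)
    strength   -- modulation degree in [0, 1] (hypothetical knob)
    """
    hi, lo = max(scores), min(scores)
    out = []
    for s, r, a in zip(scores, in_segment, area_ratio):
        scale = strength * (1.0 - a)        # smaller segments get a stronger push
        if r:
            out.append(s + scale * (hi - s))  # move toward the row maximum
        else:
            out.append(s - scale * (s - lo))  # move toward the row minimum
    return out


# One image location attending to four text tokens.
scores = [0.2, 1.0, -0.5, 0.4]
in_segment = [1, 0, 0, 1]            # tokens 0 and 3 describe this location's region
area_ratio = [0.25, 0.5, 0.5, 0.25]  # assumed segment areas
modulated = modulate_row(scores, in_segment, area_ratio, strength=0.8)

before = softmax(scores)
after = softmax(modulated)
print([round(p, 3) for p in before])
print([round(p, 3) for p in after])
```

Because every shift is a fraction of the distance to the row's own maximum or minimum, the modulated scores never leave the original value range when `strength` is between 0 and 1, which is one simple way to respect the range constraint described above.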

Examples


How to launch a web interface

  • Put your access token to Hugging Face Hub here.

  • Run the Gradio app.

python gradio_app.py

Getting Started

  • Create the image layout.

  • Label each segment with a text prompt.

  • Adjust the full text. The default full text is automatically concatenated from each segment's text. The default works well, but refining the full text can further improve the result.

  • Check the generated images, and tune the hyperparameters if needed.
    wc : The degree of attention modulation at cross-attention layers.
    ws : The degree of attention modulation at self-attention layers.
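The steps above can be sketched in plain Python. This is a hedged illustration rather than the demo's implementation: the `Segment` class, the comma separator used to build the default full text, and the `locate_prompts` helper are all assumptions introduced here. The helper checks the property stated in the Method section, namely that each segment's text appears as a non-overlapping part of the full text, which matters once the full text has been refined by hand.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    prompt: str  # c_n: text describing one region
    mask: list   # m_n: binary map, given as rows of 0/1


def default_full_text(segments, separator=", "):
    """Concatenate the segment prompts (the separator is an assumption)."""
    return separator.join(seg.prompt for seg in segments)


def locate_prompts(full_text, segments):
    """Return (start, end) character spans of each prompt in the full text.

    Prompts are searched left to right, so the spans cannot overlap.
    Raises ValueError if a prompt no longer appears in the full text.
    """
    spans = []
    cursor = 0
    for seg in segments:
        start = full_text.find(seg.prompt, cursor)
        if start < 0:
            raise ValueError(f"segment prompt not found in full text: {seg.prompt!r}")
        cursor = start + len(seg.prompt)
        spans.append((start, cursor))
    return spans


segments = [
    Segment("a red fox", [[1, 1], [0, 0]]),
    Segment("a snowy forest", [[0, 0], [1, 1]]),
]

full_text = default_full_text(segments)
spans = locate_prompts(full_text, segments)
print(full_text, spans)

# A hand-refined full text still has to contain every segment prompt.
refined = "a photo of a red fox in a snowy forest"
refined_spans = locate_prompts(refined, segments)
print(refined, refined_spans)
```

Searching from the end of the previous match assumes the prompts keep their original order in the refined text. A version that tolerates reordering would need to search each prompt independently and then check the spans for overlap.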


Benchmark

We share the benchmark used in our model development and evaluation here. The code for preprocessing segment conditions is here.


BibTeX

@inproceedings{densediffusion,
  title={Dense Text-to-Image Generation with Attention Modulation},
  author={Kim, Yunji and Lee, Jiyoung and Kim, Jin-Hwa and Ha, Jung-Woo and Zhu, Jun-Yan},
  year={2023},
  booktitle = {ICCV}
}

Acknowledgment

The demo was developed referencing this source code. Thanks for the inspiring work! 🙏
