  • Stars: 114
  • Rank: 306,261 (Top 7 %)
  • Language: Python
  • License: Apache License 2.0
  • Created almost 2 years ago
  • Updated over 1 year ago

Repository Details

🤗 Unofficial huggingface/diffusers-based implementation of the paper "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis".

Training-free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Unofficial 🤗 huggingface/diffusers-based implementation of the paper Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis. It follows the authors' original implementation, which is provided as supplementary material on OpenReview. This implementation is not affiliated with the paper's authors.

TL;DR

The authors propose a training-free approach that incorporates language structure into compositional text-to-image synthesis.

Figure 1: Three challenging phenomena in the compositional generation. Attribute leakage: The attribute of one object is (partially) observable in another object. Interchanged attributes: the attributes of two or more objects are interchanged. Missing objects: one or more objects are missing. With slight abuse of attribute binding definitions, we aim to address all three problems in this work.

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

  • Anonymous authors
  • ICLR 2023 under review

Large-scale diffusion models have demonstrated remarkable performance on text-to-image synthesis (T2I). Despite their ability to generate high-quality and creative images, users still observe images that do not align well with the text input, especially when involving multiple objects. In this work, we strive to improve the compositional skills of existing large-scale T2I models, specifically more accurate attribute binding and better image compositions. We propose to incorporate language structures with the cross-attention layers based on a recently discovered property of diffusion-based T2I models. Our method is implemented on a state-of-the-art model, Stable Diffusion, and achieves better compositional skills in both qualitative and quantitative results. Our structured cross-attention design is also efficient that requires no additional training samples. Lastly, we conduct an in-depth analysis to reveal potential causes of incorrect image compositions and justify the properties of cross-attention layers in the generation process.
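
The abstract above describes combining language structure with the cross-attention layers. One way to picture this, as a minimal sketch based on a reading of the paper rather than on this repository's code: the attention map is computed once from the full prompt, then applied to several value matrices (one for the full prompt and one for each separately encoded noun phrase), and the results are averaged. The function names, the pure-Python matrix helpers, and the toy inputs below are all illustrative assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_map(queries, keys):
    """Scaled dot-product attention weights: one row per query (image) position."""
    d = len(keys[0])
    return [
        softmax([sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d) for k_row in keys])
        for q_row in queries
    ]


def structured_cross_attention(queries, keys, values_list):
    """Apply one shared attention map to several value matrices and average.

    Assumption: values_list holds the value matrix of the full prompt plus one
    value matrix per noun phrase, all aligned to the same token positions.
    """
    attn = attention_map(queries, keys)
    outputs = [matmul(attn, values) for values in values_list]
    n = len(outputs)
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [[sum(out[i][j] for out in outputs) / n for j in range(cols)] for i in range(rows)]


# Toy example: 2 image positions, 2 text tokens, 2 value matrices (hypothetical numbers).
toy_queries = [[1.0, 0.0], [0.0, 1.0]]
toy_keys = [[1.0, 0.0], [0.0, 1.0]]
toy_values = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
toy_output = structured_cross_attention(toy_queries, toy_keys, toy_values)
```

Because the attention map is shared, only the values change per noun phrase, which is why such a scheme needs no additional training: it reuses the projections of the pretrained model.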

Installation

pip install git+https://github.com/shunk031/training-free-structured-diffusion-guidance

How to use Training-Free Structured Diffusion Guidance (TFSDG)

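from tfsdg.pipelines import TFSDGPipeline

# Load the Stable Diffusion v1.4 weights.
# use_auth_token=True reads the token saved by `huggingface-cli login`.
pipe = TFSDGPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", use_auth_token=True
)
pipe = pipe.to("cuda")

# Generate an image with structured attention enabled and save it.
prompt = "A red car and a white sheep"
image = pipe(prompt, struct_attention="align_seq").images[0]
image.save("a_red_car_and_a_white_sheep.png")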

Citation

@misc{kitada-2022-tfsdg,
  author = {Shunsuke Kitada},
  title = {Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/shunk031/training-free-structured-diffusion-guidance}}
}
