  • Stars 1,961
  • Rank 23,637 (Top 0.5 %)
  • Language
    Jupyter Notebook
  • Created about 1 year ago
  • Updated 8 months ago

Repository Details

Let us democratise high-resolution generation! (CVPR 2024)

DemoFusion

Code release for "DemoFusion: Democratising High-Resolution Image Generation With No 💰"

Abstract: High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as "previews", facilitating rapid prompt iteration.

News

  • 2024.02.27: 🔥 DemoFusion has been accepted to CVPR'24!
  • 2023.12.15: 🚀 A ComfyUI DemoFusion Custom Node is available! Thanks to Andre for the implementation!
  • 2023.12.12: ✨ DemoFusion with ControlNet is available now! Check it out at pipeline_demofusion_sdxl_controlnet! The local Gradio Demo is also available.
  • 2023.12.10: ✨ Image2Image is now supported by pipeline_demofusion_sdxl! The local Gradio Demo is also available.
  • 2023.12.08: 🚀 A HuggingFace Demo for Img2Img is now available! Thanks to Radamés for the implementation and to Hugging Face for the support!
  • 2023.12.07: 🚀 A Colab demo has been added. Check it out! Thanks to camenduru for the implementation!
  • 2023.12.06: ✨ The local Gradio Demo is now available! Better interaction and presentation!
  • 2023.12.04: ✨ A low-VRAM version of DemoFusion is available! Thanks to klimaleksus for the implementation!
  • 2023.12.01: 🚀 Integrated into Replicate. Check out the online demo! Thanks to Luis C. for the implementation!
  • 2023.11.29: 💰 pipeline_demofusion_sdxl is released.

Usage

A quick try with integrated demos

  • HuggingFace Space: Try Text2Image generation and Image2Image enhancement on Hugging Face.
  • Colab: Try Text2Image generation and Image2Image enhancement in Colab.
  • Replicate: Try Text2Image generation and Image2Image enhancement on Replicate.

Starting with our code

Hyper-parameters

  • view_batch_size (int, defaults to 16): The batch size for multiple denoising paths. A larger batch size is typically more efficient but requires more GPU memory.
  • stride (int, defaults to 64): The stride of the moving local patches. A smaller stride reduces seam artifacts but adds computational overhead and inference time.
  • cosine_scale_1 (float, defaults to 3): Controls the decay rate of the skip-residual. A smaller value gives better consistency with low-resolution results but may lead to more pronounced upsampling noise. Please refer to Appendix C in the DemoFusion paper.
  • cosine_scale_2 (float, defaults to 1): Controls the decay rate of dilated sampling. A smaller value better addresses the repetition issue but may lead to grainy images. Please refer to Appendix C in the DemoFusion paper.
  • cosine_scale_3 (float, defaults to 1): Controls the decay rate of the Gaussian filter. A smaller value results in less grainy images but may over-smooth them. Please refer to Appendix C in the DemoFusion paper.
  • sigma (float, defaults to 1): The standard deviation of the Gaussian filter. A larger sigma strengthens the global guidance of dilated sampling but may over-smooth the image.
  • multi_decoder (bool, defaults to True): Whether to use a tiled decoder. Generally, a tiled decoder becomes necessary when the resolution exceeds 3072*3072 on an RTX 3090 GPU.
  • show_image (bool, defaults to False): Whether to show intermediate results during generation.
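
To build intuition for the three cosine_scale_* parameters, here is a minimal sketch of a cosine-shaped weight raised to a power. It assumes the weight has the form ((1 + cos(pi * (T - t) / T)) / 2) ** scale; the helper name cosine_weight and the default of 1000 training timesteps are illustrative assumptions, not part of the pipeline's API. Check the paper and the pipeline source for the exact expression.

```python
import math

def cosine_weight(t, scale, num_train_timesteps=1000):
    """Hypothetical helper: weight in [0, 1] that decays as t goes from T down to 0."""
    base = 0.5 * (1.0 + math.cos(math.pi * (num_train_timesteps - t) / num_train_timesteps))
    return base ** scale

# Compare how fast the weight decays for two exponents.
for t in (1000, 750, 500, 250, 0):
    print(t, round(cosine_weight(t, scale=1.0), 3), round(cosine_weight(t, scale=3.0), 3))
```

Under this assumption, a smaller exponent keeps the weight larger for longer, which is in line with the note that a smaller cosine_scale_1 stays closer to the low-resolution result.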

Text2Image (will take about 17 GB of VRAM)

  • Set up the dependencies as:
conda create -n demofusion python=3.9
conda activate demofusion
pip install -r requirements.txt
  • Download pipeline_demofusion_sdxl.py and run it as follows. A use case can be found in demo.ipynb.
from pipeline_demofusion_sdxl import DemoFusionSDXLPipeline
import torch

model_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DemoFusionSDXLPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Envision a portrait of an elderly woman, her face a canvas of time, framed by a headscarf with muted tones of rust and cream. Her eyes, blue like faded denim. Her attire, simple yet dignified."
negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic"

images = pipe(prompt, negative_prompt=negative_prompt,
              height=3072, width=3072, view_batch_size=16, stride=64,
              num_inference_steps=50, guidance_scale=7.5,
              cosine_scale_1=3, cosine_scale_2=1, cosine_scale_3=1, sigma=0.8,
              multi_decoder=True, show_image=True
             )

for i, image in enumerate(images):
    image.save('image_' + str(i) + '.png')
  • ⚠️ If you have enough VRAM (e.g., generating 2048*2048 images on a GPU with more than 18 GB of VRAM), you can set multi_decoder=False, which speeds up decoding.
  • Please feel free to try different prompts and resolutions.
  • Default hyper-parameters are recommended, but they may not be optimal for all cases. For specific impacts of each hyper-parameter, please refer to Appendix C in the DemoFusion paper.
  • The code was cleaned before the release. If you encounter any issues, please contact us.
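
To get a rough feel for how stride and view_batch_size affect cost at a given resolution, the sketch below counts sliding-window positions over the latent grid. It is a back-of-the-envelope estimate: the helper names, the 128-unit latent window (1024 px / 8), and the rounding rule are assumptions for illustration, and it only counts local patches, not any additional views the pipeline may use.

```python
import math

def count_views(height, width, stride=64, window=128, vae_factor=8):
    """Hypothetical helper: sliding-window positions over the latent grid.

    Assumes a square window of `window` latent units and that an extra
    window is added whenever the stride does not land exactly on the edge.
    """
    lat_h, lat_w = height // vae_factor, width // vae_factor
    rows = math.ceil(max(lat_h - window, 0) / stride) + 1
    cols = math.ceil(max(lat_w - window, 0) / stride) + 1
    return rows * cols

def count_batches(num_views, view_batch_size=16):
    """Number of denoiser batches needed to cover all views in one step."""
    return math.ceil(num_views / view_batch_size)

for stride in (64, 32):
    views = count_views(3072, 3072, stride=stride)
    print(stride, views, count_batches(views, view_batch_size=16))
```

Under these assumptions, halving the stride roughly triples the number of patches per denoising step at 3072*3072, which is why a smaller stride costs noticeably more time.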

Text2Image on Windows with 8 GB of VRAM

  • Set up the environment as:
cmd
git clone "https://github.com/PRIS-CV/DemoFusion"
cd DemoFusion
python -m venv venv
venv\Scripts\activate
pip install -U "xformers==0.0.22.post7+cu118" --index-url https://download.pytorch.org/whl/cu118
pip install "diffusers==0.21.4" "matplotlib==3.8.2" "transformers==4.35.2" "accelerate==0.25.0"
  • Launch DemoFusion as follows. The use case can be found in demo_lowvram.py.
python
from pipeline_demofusion_sdxl import DemoFusionSDXLPipeline

import torch
from diffusers.models import AutoencoderKL
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

model_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DemoFusionSDXLPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16, vae=vae)
pipe = pipe.to("cuda")

prompt = "Envision a portrait of an elderly woman, her face a canvas of time, framed by a headscarf with muted tones of rust and cream. Her eyes, blue like faded denim. Her attire, simple yet dignified."
negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic"

images = pipe(prompt, negative_prompt=negative_prompt,
              height=2048, width=2048, view_batch_size=4, stride=64,
              num_inference_steps=40, guidance_scale=7.5,
              cosine_scale_1=3, cosine_scale_2=1, cosine_scale_3=1, sigma=0.8,
              multi_decoder=True, show_image=False, lowvram=True
             )

for i, image in enumerate(images):
    image.save('image_' + str(i) + '.png')
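
The pipeline call returns a list of images, which the loop above saves one by one. As a hedged sketch of which resolutions to expect, the helper below assumes that progressive upscaling grows the longer side in multiples of a 1024-pixel base and keeps the target aspect ratio. The name phase_resolutions and the rounding are illustrative assumptions, not the pipeline's actual code.

```python
def phase_resolutions(height, width, base=1024):
    """Hypothetical helper: (height, width) of each successive upscaling phase.

    Assumes the longer side grows by `base` pixels per phase and the
    shorter side follows the target aspect ratio.
    """
    num_phases = max(height, width) // base
    aspect = min(height, width) / max(height, width)
    sizes = []
    for k in range(1, num_phases + 1):
        long_side = base * k
        short_side = int(long_side * aspect)
        if height >= width:
            sizes.append((long_side, short_side))
        else:
            sizes.append((short_side, long_side))
    return sizes

print(phase_resolutions(2048, 2048))
print(phase_resolutions(3072, 3072))
```

Under this assumption, a 2048*2048 request yields two images and a 3072*3072 request yields three, with the earlier ones usable as previews.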

Text2Image with local Gradio demo

  • Make sure you have installed gradio and gradio_imageslider.
  • Launch the DemoFusion Gradio demo with python gradio_demo.py for better interaction and presentation.

Image2Image with local Gradio demo

  • Make sure you have installed gradio and gradio_imageslider.
  • Launch DemoFusion Image2Image with python gradio_demo_img2img.py.
  • ⚠️ Please note that, as a tuning-free framework, DemoFusion's Image2Image capability depends strongly on SDXL's training data distribution and will show a significant bias. An accurate prompt describing the content and style of the input also improves performance significantly. Have fun, and regard it as a side application of text+image based generation.

DemoFusion+ControlNet with local Gradio demo

  • Make sure you have installed gradio and gradio_imageslider.
  • Launch DemoFusion+ControlNet Text2Image with python gradio_demo.py.
  • Launch DemoFusion+ControlNet Image2Image with python gradio_demo_img2img.py.

Citation

If you find this paper useful in your research, please consider citing:

@inproceedings{du2024demofusion,
  title={DemoFusion: Democratising High-Resolution Image Generation With No \$\$\$},
  author={Du, Ruoyi and Chang, Dongliang and Hospedales, Timothy and Song, Yi-Zhe and Ma, Zhanyu},
  booktitle={CVPR},
  year={2024}
}

More Repositories

1

Mutual-Channel-Loss

Code release for The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification (TIP 2020)
Python
255
star
2

PMG-Progressive-Multi-Granularity-Training

Code release for Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches (ECCV2020)
Python
216
star
3

Fine-Grained-or-Not

Code release for Your “Flamingo” is My “Bird”: Fine-Grained, or Not (CVPR 2021 Oral)
Python
57
star
4

On-the-fly-Category-Discovery

Code release for "On-the-fly Category Discovery" (CVPR 2023)
Python
49
star
5

BSNet

Code release for the paper BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification. (TIP2020)
Python
49
star
6

Bi-FRN

Code release for Bi-Directional Feature Reconstruction Network for Fine-grained Few-shot Image Classification
Python
48
star
7

AP-CNN_Pytorch-master

Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification (TIP2021)
Python
47
star
8

OSLNet

Code release for OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer (TIP2020)
Python
44
star
9

DCRNet

The repository contains the PyTorch implementation of "Duplex Contextual Relations for Polyp Segmentation"
Python
34
star
10

seal

Semantic Enhanced Attribute Learning
Jupyter Notebook
34
star
11

WFEN

[ACMMM 2024] "Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network"
Python
32
star
12

Fine-grained-Visual-Analysis-Library

FGVCLib is an open-source and well documented library for Fine-grained Visual Classification.
Python
31
star
13

PCA-Net

Progressive Co-Attention Network for Fine-Grained Visual Classification
Python
31
star
14

DSACA

Code release for Dilated-Scale-Aware Category-Attention ConvNet for Multi-Class Object Counting
Python
19
star
15

Mutual-to-Separate

Code release for Mind the Gap: Enlarging the Domain Gap in Open Set Domain Adaptation (TCSVT 2023)
Python
18
star
16

AdvancedDropout

Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization (IEEE TPAMI 2021)
Python
15
star
17

PMG-V2

Code release for "Progressive Learning of Category-Consistent Multi-Granularity Features for Fine-Grained Visual Classification"
Python
15
star
18

An-Erudite-FGVC-Model

Code release for "An Erudite Fine-Grained Visual Classification Model" (CVPR 2023)
Python
14
star
19

Making-a-Bird-AI-Expert-Work-for-You-and-Me

Code release for "Making a Bird AI Expert Work for You and Me (TPAMI 2023)".
Python
14
star
20

NeRSP

Python
13
star
21

Top-Down-Spatial-Attention-Loss

Fine-Grained Visual Classification via Simultaneously Learning of Multi-regional Multi-grained Features
Python
12
star
22

CAM-Guided-Attention

Code release for Grad-CAM Guided Attention Module for Fine-grained Visual Classification (MLSP 2022)
Python
12
star
23

MSSRM

An implementation of MSSRM method
Python
11
star
24

RelMatch

Code release for "Clue Me In: Semi-Supervised FGVC with Out-of-Distribution Data".
Python
11
star
25

IVR

Python
9
star
26

FFDI

Python
9
star
27

knowledge-transfer-based-FGVC

knowledge transfer based fine-grained visual classification
Python
9
star
28

Pascal-EA

Python
9
star
29

Category-Specific-Prompt

Code release for "Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models"
Python
9
star
30

BiEN

Code release for Bi-Directional Ensemble Network for Few-Shot Fine-Grained Classification.
Python
8
star
31

DropChannelBlock_Pytorch_master

Python
8
star
32

Semantic-Memory-Guided-Image-Representation-for-Polyp-Segmentation

Semantic Memory Guided Image Representation for Polyp Segmentation (ICASSP-2023)
Python
8
star
33

RTMem

Python
7
star
34

Sketch-SF

Python
6
star
35

CMF-Refseg

Code of our ICIP 2021 paper CMF: Cascaded Multi-model Fusion for Referring Image Segmentation
6
star
36

TA2-Net

Python
5
star
37

MAFR

Code release for "Multi-View Active Fine-Grained Visual Recognition" (ICCV 2023)
Python
5
star
38

SSKD

Code release for SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive Person Re-Identification
Python
4
star
39

ReMarNet

Code release for the paper ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification (TCSVT 2020)
Python
4
star
40

HumanRecon

Python
4
star
41

DL-CV-ITS

Code release for Deep Learning-based Computer Vision for Surveillance in ITS: Evaluation of State-of-the-art Methods (IEEE TVT 2021)
Python
4
star
42

Sketch-CS

Python
4
star
43

Pair-wise-Similarity-module

Python
3
star
44

Attribute-Comprehension-of-VLMs

We provide a benchmark for evaluating the attribute understanding capabilities of large vision-language models.
Python
3
star
45

CN-CNN

Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification
Python
3
star
46

CGVC

Cross-Layer Feature based Multi-Granularity Visual Classification
Python
3
star
47

InterBoost

Code release for Deep InterBoost Networks for Small-sample Image Classification (NEUROCOMPUTING 2020)
Python
3
star
48

GPCA

GPCA: A Probabilistic Framework for Gaussian Process Embedded Channel Attention (IEEE TPAMI 2021)
3
star
49

Mixture-of-Hand-Experts

Python
3
star
50

AFGR

2
star
51

DS-UI

DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference in Image Recognition (IEEE TIP 2021)
Python
2
star
52

Caption-Feature-Space-Regularization

This is the code for "Caption Feature Space Regularization for Audio Captioning"
Python
2
star
53

EGNN_TLRM

Code release for the paper: "TLRM: Task-level Relation Module for GNN-based Few-Shot Learning" (IEEE VCIP 2021)
Python
2
star
54

ENDE_For_SSS

Python
2
star
55

class-level-sampling

Python
1
star
56

Structured-DropConnect

Structured DropConnect for Uncertainty Inference in Image Classification
Python
1
star
57

Adaptive-Multi-Resolution-Feature-Fusion

Python
1
star
58

Fine-Grained-Age-Estimation-in-the-Wild-With-Attention-LSTM-Networks

Lua
1
star
59

DFA

Dual-granularity Feature Alignment for Cross-modality Person Re-identification
Python
1
star
60

Reserve_to_Adapt

Python
1
star
61

CRL-code

Code release for the paper “Competing Ratio Loss for Discriminative Multi-class Image Classification” (IEEE Neurocomputing 2021).
Python
1
star