• Stars
    star
    3,377
  • Rank 13,245 (Top 0.3 %)
  • Language
    Python
  • Created almost 2 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

T2I-Adapter

πŸ‘‰ T2I-Adapter for [SD-1.4/1.5], for [SDXL]

Huggingface T2I-Adapter-SDXL   Blog T2I-Adapter-SDXL   arXiv


Official implementation of T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models based on Stable Diffusion-XL.

The diffusers team and the T2I-Adapter authors have been collaborating to bring the support of T2I-Adapters for Stable Diffusion XL (SDXL) in diffusers! It achieves impressive results in both performance and efficiency.


image

🚩 New Features/Updates

  • βœ… Sep. 8, 2023. We collaborate with the diffusers team to bring the support of T2I-Adapters for Stable Diffusion XL (SDXL) in diffusers! It achieves impressive results in both performance and efficiency. We release T2I-Adapter-SDXL models for sketch, canny, lineart, openpose, depth-zoe, and depth-mid. We release two online demos: Huggingface T2I-Adapter-SDXL and Huggingface T2I-Adapter-SDXL Doodle.

  • βœ… Aug. 21, 2023. We release T2I-Adapter-SDXL, including sketch, canny, and keypoint. We still use the original recipe (77M parameters, a single inference) to drive StableDiffusion-XL. Due to the limited computing resources, those adapters still need further improvement. We are collaborating with HuggingFace, and a more powerful adapter is in the works.

  • βœ… Jul. 13, 2023. Stability AI release Stable Doodle, a groundbreaking sketch-to-image tool based on T2I-Adapter and SDXL. It makes drawing easier.

  • βœ… Mar. 16, 2023. We add CoAdapter (Composable Adapter). The online Huggingface Gadio has been updated Huggingface Gradio (CoAdapter). You can also try the local gradio demo.

  • βœ… Mar. 16, 2023. We have shrunk the git repo with bfg. If you encounter any issues when pulling or pushing, you can try re-cloning the repository. Sorry for the inconvenience.

  • βœ… Mar. 3, 2023. Add a color adapter (spatial palette), which has only 17M parameters.

  • βœ… Mar. 3, 2023. Add four new adapters style, color, openpose and canny. See more info in the Adapter Zoo.

  • βœ… Feb. 23, 2023. Add the depth adapter t2iadapter_depth_sd14v1.pth. See more info in the Adapter Zoo.

  • βœ… Feb. 15, 2023. Release T2I-Adapter.


πŸ”₯πŸ”₯πŸ”₯ Why T2I-Adapter-SDXL?

The Original Recipe Drives Larger SD.

SD-V1.4/1.5 SD-XL T2I-Adapter T2I-Adapter-SDXL
Parameters 860M 2.6B 77 M 77/79 M

Inherit High-quality Generation from SDXL.

  • Lineart-guided

Model from TencentARC/t2i-adapter-lineart-sdxl-1.0

  • Keypoint-guided

Model from openpose_sdxl_1.0

  • Sketch-guided

Model from TencentARC/t2i-adapter-sketch-sdxl-1.0

  • Depth-guided

Depth guided models from TencentARC/t2i-adapter-depth-midas-sdxl-1.0 and TencentARC/t2i-adapter-depth-zoe-sdxl-1.0 respectively

πŸ”§ Dependencies and Installation

pip install -r requirements.txt

⏬ Download Models

All models will be automatically downloaded to the checkpoints folder. You can also choose to download manually from this url.

πŸ”₯ How to Train

Here we take sketch guidance as an example, but of course, you can also prepare your own dataset following this method.

accelerate launch train_sketch.py --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 --output_dir experiments/adapter_sketch_xl --config configs/train/Adapter-XL-sketch.yaml --mixed_precision="fp16" --resolution=1024 --learning_rate=1e-5 --max_train_steps=60000 --train_batch_size=1 --gradient_accumulation_steps=4 --report_to="wandb" --seed=42 --num_train_epochs 100

We train with FP16 data precision on 4 NVIDIA A100 GPUs.

πŸ’» How to Test

Inference requires at least 15GB of GPU memory.

Quick start with diffusers

To get started, first install the required dependencies:

pip install git+https://github.com/huggingface/diffusers.git@t2iadapterxl # for now
pip install -U controlnet_aux==0.0.7 # for conditioning models and detectors  
pip install transformers accelerate safetensors
  1. Images are first downloaded into the appropriate control image format.
  2. The control image and prompt are passed to the StableDiffusionXLAdapterPipeline.

Let's have a look at a simple example using the LineArt Adapter.

  • Dependency
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler, AutoencoderKL
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.lineart import LineartDetector
import torch

# load adapter
adapter = T2IAdapter.from_pretrained(
  "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, varient="fp16"
).to("cuda")

# load euler_a scheduler
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae=AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16", 
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
  • Condition Image
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(
    image, detect_resolution=384, image_resolution=1024
)

  • Generation
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5, 
).images[0]
gen_images.save('out_lin.png')

Online Demo Huggingface T2I-Adapter-SDXL

Online Doodly Demo Huggingface T2I-Adapter-SDXL

Tutorials on HuggingFace:

...

Other Source

Jul. 13, 2023. Stability AI release Stable Doodle, a groundbreaking sketch-to-image tool based on T2I-Adapter and SDXL. It makes drawing easier.

doodle.mp4

πŸ€— Acknowledgements

  • Thanks to HuggingFace for their support of T2I-Adapter.
  • T2I-Adapter is co-hosted by Tencent ARC Lab and Peking University VILLA.

BibTeX

@article{mou2023t2i,
  title={T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models},
  author={Mou, Chong and Wang, Xintao and Xie, Liangbin and Wu, Yanze and Zhang, Jian and Qi, Zhongang and Shan, Ying and Qie, Xiaohu},
  journal={arXiv preprint arXiv:2302.08453},
  year={2023}
}

More Repositories

1

GFPGAN

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Python
35,397
star
2

PhotoMaker

PhotoMaker [CVPR 2024]
Jupyter Notebook
9,198
star
3

InstantMesh

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Python
2,928
star
4

BrushNet

[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
Python
1,298
star
5

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]
Python
1,247
star
6

MasaCtrl

[ICCV 2023] Consistent Image Synthesis and Editing
Python
699
star
7

SEED-Story

SEED-Story: Multimodal Long Story Generation with Large Language Model
Python
657
star
8

LLaMA-Pro

[ACL 2024] Progressive LLaMA with Block Expansion.
Python
459
star
9

Mix-of-Show

NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Python
383
star
10

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Python
376
star
11

AnimeSR

Codes for "AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos"
Python
325
star
12

VQFR

ECCV 2022, Oral, VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
Python
320
star
13

CustomNet

Python
258
star
14

SmartEdit

Official code of SmartEdit [CVPR-2024 Highlight]
Python
214
star
15

UMT

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.
Python
186
star
16

MM-RealSR

Codes for "Metric Learning based Interactive Modulation for Real-World Super-Resolution"
Python
152
star
17

ViT-Lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
Python
148
star
18

MCQ

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
Python
136
star
19

DeSRA

Official codes for DeSRA (ICML 2023)
Python
123
star
20

FAIG

NeurIPS 2021, Spotlight, Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
Python
118
star
21

ArcNerf

Nerf and extensions in all
Jupyter Notebook
106
star
22

ST-LLM

[ECCV 2024πŸ”₯] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Python
96
star
23

SurfelNeRF

SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
76
star
24

RepSR

Codes for "RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization"
74
star
25

mllm-npu

mllm-npu: training multimodal large language models on Ascend NPUs
Python
67
star
26

HOSNeRF

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
Python
65
star
27

FastRealVSR

Codes for "Mitigating Artifacts in Real-World Video Super-Resolution Models"
59
star
28

ConMIM

Official codes for ConMIM (ICLR 2023)
Python
57
star
29

GVT

Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
Python
54
star
30

TVTS

Turning to Video for Transcript Sorting
Jupyter Notebook
44
star
31

BEBR

Official code for "Binary embedding based retrieval at Tencent"
Python
42
star
32

ViSFT

Python
33
star
33

pi-Tuning

Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
Python
32
star
34

FLM

Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Python
31
star
35

Efficient-VSR-Training

Codes for "Accelerating the Training of Video Super-Resolution"
30
star
36

DTN

Official code for "Dynamic Token Normalization Improves Vision Transformer", ICLR 2022.
Python
27
star
37

OpenCompatible

OpenCompatible provides a standard compatible training benchmark, covering practical training scenarios.
Python
24
star
38

BTS

BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild
23
star
39

SGAT4PASS

This is the official implementation of the paper SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (IJCAI 2023)
Python
23
star
40

SFDA

Python
20
star
41

TaCA

Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
15
star
42

Plot2Code

Python
14
star
43

common_trainer

Common template for pytorch project. Easy to extent and modify for new project.
Python
12
star
44

TransFusion

The code repo for the ACM MM paper: TransFusion: Multi-Modal Fusion for Video Tag Inference viaTranslation-based Knowledge Embedding.
9
star
45

BasicVQ-GEN

7
star
46

ArcVis

Visualization of 3d and 2d components interactively.
Jupyter Notebook
6
star
47

VTLayout

3
star