• Stars
    star
    5,177
  • Rank 8,008 (Top 0.2 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created over 1 year ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models


Introduction

we present IP-Adapter, an effective and lightweight adapter to achieve image prompt capability for the pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve comparable or even better performance to a fine-tuned image prompt model. IP-Adapter can be generalized not only to other custom models fine-tuned from the same base model, but also to controllable generation using existing controllable tools. Moreover, the image prompt can also work well with the text prompt to accomplish multimodal image generation.

arch

Release

  • [2023/8/30] πŸ”₯ Add an IP-Adapter with face image as prompt. The demo is here.
  • [2023/8/29] πŸ”₯ Release the training code.
  • [2023/8/23] πŸ”₯ Add code and models of IP-Adapter with fine-grained features. The demo is here.
  • [2023/8/18] πŸ”₯ Add code and models for SDXL 1.0. The demo is here.
  • [2023/8/16] πŸ”₯ We release the code and models.

Dependencies

  • diffusers >= 0.19.3

Download Models

you can download models from here. To run the demo, you should also download the following models:

How to Use

  • ip_adapter_demo: image variations, image-to-image, and inpainting with image prompt.
  • ip_adapter_demo

image variations

image-to-image

inpainting

structural_cond structural_cond2

multi_prompts

ip_adpter_plus_image_variations ip_adpter_plus_multi

ip_adpter_plus_face

Best Practice

  • If you only use the image prompt, you can set the scale=1.0 and text_prompt=""(or some generic text prompts, e.g. "best quality", you can also use any negative text prompt). If you lower the scale, more diverse images can be generated, but they may not be as consistent with the image prompt.
  • For multimodal prompts, you can adjust the scale to get the best results. In most cases, setting scale=0.5 can get good results. For the version of SD 1.5, we recommend using community models to generate good images.

How to Train

For training, you should install accelerate and make your own dataset into a json file.

accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
  tutorial_train.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
  --image_encoder_path="{image_encoder_path}" \
  --data_json_file="{data.json}" \
  --data_root_path="{image_path}" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=8 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="{output_dir}" \
  --save_steps=10000

Disclaimer

This project strives to positively impact the domain of AI-driven image generation. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it in a responsible manner. The developers do not assume any responsibility for potential misuse by users.

Citation

If you find IP-Adapter useful for your research and applications, please cite using this BibTeX:

@article{ye2023ip-adapter,
  title={IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models},
  author={Ye, Hu and Zhang, Jun and Liu, Sibo and Han, Xiao and Yang, Wei},
  booktitle={arXiv preprint arxiv:2308.06721},
  year={2023}
}

More Repositories

1

V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.
Python
2,182
star
2

persona-hub

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Python
768
star
3

hifi3dface

Code and data for our paper "High-Fidelity 3D Digital Human Creation from RGB-D Selfies".
Python
758
star
4

hok_env

Honor of Kings AI Open Environment of Tencent
Python
616
star
5

pika

a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi
Python
338
star
6

grover

This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data
Python
325
star
7

bddm

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis
Python
217
star
8

FRA-RIR

Python
169
star
9

PCDMs

Implementation code:Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
Jupyter Notebook
150
star
10

DrugOOD

OOD Dataset Curator and Benchmark for AI-aided Drug Discovery
Python
149
star
11

Frequency_Aug_VAE_MoESR

Latent-based SR using MoE and frequency augmented VAE decoder
Python
145
star
12

tleague_projpage

Jinja
135
star
13

3m-asr

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
Python
119
star
14

TLeague

Python
79
star
15

RLogist

RLogist = RL (reinforcement learning) + Pathologist
Python
65
star
16

CogKernel

Python
44
star
17

MDM

MDM
Python
43
star
18

UltraDualPathCompression

A Pytorch-based implementation of the compression and decompression module in "Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression".
Jupyter Notebook
36
star
19

Lodoss

Python
34
star
20

mini-hok

Mini HoK: a novel MARL benchmark based on the popular mobile game, Honor of Kings, to address limitations in existing environments such as complexity and accessibility.
Python
29
star
21

TriNet

TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR.
Python
26
star
22

ICML21_OAXE

Python
25
star
23

season

[EMNLP 2022] Salience Allocation as Guidance for Abstractive Summarization
Python
22
star
24

hokoff

Python
21
star
25

Leopard

The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"
18
star
26

hifi3dface_projpage

Project page for our paper "High-Fidelity 3D Digital Human Creation from RGB-D Selfies".
HTML
16
star
27

GrndPodcastSum

(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"
Python
15
star
28

OASum

13
star
29

EMNLP21_SemEq

This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".
Python
12
star
30

learning_singing_from_speech

Project page for our paper "DurIAN : DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System".
10
star
31

valuationgame

Jupyter Notebook
9
star
32

Arena

Python
8
star
33

MetaLogic

Python
8
star
34

ZED

This is the repository for EMNLP 2022 paper "Efficient Zero-shot Event Extraction with Context-Definition Alignment"
Python
8
star
35

machine-translation

Open source on machine translation
7
star
36

TPolicies

Python
6
star
37

zebra-inference

Python
5
star
38

Interformer

Jupyter Notebook
5
star
39

FOLNet

This repository includes the code for First-Order Logic Network (FOLNet).
Python
4
star
40

TLeagueAutoBuild

Python
4
star
41

TImitate

Python
2
star
42

siam

2
star