  • Stars: 7,753
  • Rank: 4,651 (Top 0.1%)
  • Language: Jupyter Notebook
  • License: Other
  • Created 5 months ago
  • Updated 2 months ago

Repository Details

PhotoMaker

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

[Paper] [Project Page] [Model Card]

[🤗 Demo (Realistic)] [🤗 Demo (Stylization)]

[Replicate Demo (Realistic)] [Replicate Demo (Stylization)]

If the ID fidelity is not enough for you, please try our stylization application; you may be pleasantly surprised.


Official implementation of PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding.

🌠 Key Features:

  1. Rapid customization within seconds, with no additional LoRA training.
  2. Ensures impressive ID fidelity while offering diversity, promising text controllability, and high-quality generation.
  3. Can serve as an adapter that works with other base models and with LoRA modules from the community.

❗❗ Note: If there are any PhotoMaker-based resources and applications, please leave them in the discussion and we will list them in the Related Resources section of the README file. So far we know of implementations for Replicate, Windows, ComfyUI, and WebUI. Thank you all!

🚩 New Features/Updates

  • ✅ Jan. 20, 2024. An important note: for GPUs that do not support bfloat16, please change this line to torch_dtype = torch.float16; the speed will be greatly improved (1 min/img before vs. 14 s/img after on a V100). The minimum GPU memory requirement for PhotoMaker is 11 GB (please refer to this link for saving GPU memory).
  • ✅ Jan. 15, 2024. We release PhotoMaker.
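
The dtype choice described in the Jan. 20 note can be made automatically. The sketch below is not part of the PhotoMaker code base: `pick_dtype_name` is a hypothetical helper, and it assumes an NVIDIA GPU, where native bfloat16 support starts at compute capability 8.0 (a V100 reports 7.0).

```python
from typing import Tuple


def pick_dtype_name(compute_capability: Tuple[int, int]) -> str:
    """Return "bfloat16" for compute capability >= 8.0, otherwise "float16"."""
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


if __name__ == "__main__":
    # torch is optional here so the helper itself stays importable without it.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        name = pick_dtype_name(torch.cuda.get_device_capability())
        torch_dtype = getattr(torch, name)  # torch.bfloat16 or torch.float16
        print(f"Using torch_dtype={torch_dtype}")
```

The resulting `torch_dtype` could then be passed to `from_pretrained` in place of the hard-coded `torch.bfloat16`.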

🔥 Examples

Realistic generation

Stylization generation

Note: for better stylization, only the base model is changed and LoRA modules are added.

🔧 Dependencies and Installation

conda create --name photomaker python=3.10
conda activate photomaker
pip install -U pip

# Install requirements
pip install -r requirements.txt

# Install photomaker
pip install git+https://github.com/TencentARC/PhotoMaker.git

Then you can import the pipeline as follows:

from photomaker import PhotoMakerStableDiffusionXLPipeline

⏬ Download Models

The model will be automatically downloaded through the following two lines:

from huggingface_hub import hf_hub_download
photomaker_path = hf_hub_download(repo_id="TencentARC/PhotoMaker", filename="photomaker-v1.bin", repo_type="model")

You can also choose to download manually from this url.

💻 How to Test

Use like diffusers

  • Dependency
import torch
import os
from diffusers.utils import load_image
from diffusers import EulerDiscreteScheduler
from photomaker import PhotoMakerStableDiffusionXLPipeline

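### Load base model
# `base_model_path` (an SDXL-based checkpoint) and `device` must be defined beforehand
pipe = PhotoMakerStableDiffusionXLPipeline.from_pretrained(
    base_model_path,  # can change to any base model based on SDXL
    torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without bfloat16 support
    use_safetensors=True,
    variant="fp16"
).to(device)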

### Load PhotoMaker checkpoint
pipe.load_photomaker_adapter(
    os.path.dirname(photomaker_path),
    subfolder="",
    weight_name=os.path.basename(photomaker_path),
    trigger_word="img"  # define the trigger word
)     

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

### Also can cooperate with other LoRA modules
# pipe.load_lora_weights(os.path.dirname(lora_path), weight_name=lora_model_name, adapter_name="xl_more_art-full")
# pipe.set_adapters(["photomaker", "xl_more_art-full"], adapter_weights=[1.0, 0.5])

pipe.fuse_lora()
  • Input ID Images
### define the input ID images
input_folder_name = './examples/newton_man'
image_basename_list = os.listdir(input_folder_name)
image_path_list = sorted([os.path.join(input_folder_name, basename) for basename in image_basename_list])

input_id_images = []
for image_path in image_path_list:
    input_id_images.append(load_image(image_path))

  • Generation
# Note that the trigger word `img` must follow the class word for personalization
prompt = "a half-body portrait of a man img wearing the sunglasses in Iron man suit, best quality"
negative_prompt = "(asymmetry, worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch), open mouth, grayscale"
generator = torch.Generator(device=device).manual_seed(42)
# `num_steps` (the number of sampling steps) must be defined beforehand
image = pipe(
    prompt=prompt,
    input_id_images=input_id_images,
    negative_prompt=negative_prompt,
    num_images_per_prompt=1,
    num_inference_steps=num_steps,
    start_merge_step=10,
    generator=generator,
).images[0]  # take the first (and only) generated image
image.save('out_photomaker.png')
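
The steps above load every file in the input folder and rely on the prompt placing the trigger word right after the class word. A minimal pre-flight sketch for both inputs follows. The helpers `list_image_paths` and `trigger_follows_class_word` and the extension list are illustrative assumptions, not part of PhotoMaker. The prompt check only verifies that the trigger word occurs exactly once, directly after a plain word; it cannot tell whether that word is really a class word such as "man" or "woman".

```python
import os
from typing import List

# Assumed set of image extensions; adjust to the files you actually use.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp", ".bmp")


def list_image_paths(folder: str) -> List[str]:
    """Return sorted paths of regular files in `folder` that look like images."""
    paths = []
    for basename in os.listdir(folder):
        path = os.path.join(folder, basename)
        # Skip sub-directories and non-image files such as .DS_Store or notes.
        if os.path.isfile(path) and basename.lower().endswith(IMAGE_EXTENSIONS):
            paths.append(path)
    return sorted(paths)


def trigger_follows_class_word(prompt: str, trigger_word: str = "img") -> bool:
    """Check that `trigger_word` occurs exactly once, directly after a plain word."""
    tokens = prompt.split()
    cleaned = [token.strip(",.;:!?") for token in tokens]
    positions = [i for i, token in enumerate(cleaned) if token == trigger_word]
    if len(positions) != 1:
        return False
    index = positions[0]
    # The preceding token must be a bare alphabetic word (no comma in between).
    return index > 0 and tokens[index - 1].isalpha()
```

With these helpers, `image_path_list = list_image_paths(input_folder_name)` could replace the `os.listdir` comprehension, and `trigger_follows_class_word(prompt)` could be asserted before calling `pipe`.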

Start a local gradio demo

Run the following command:

python gradio_demo/app.py

You can customize the script in this file.

If you want to run it on a Mac, follow this instruction and then run app.py.

Usage Tips:

  • Upload more photos of the person to be customized to improve ID fidelity. If the input contains Asian face(s), consider adding 'Asian' before the class word, e.g., Asian woman img.
  • When stylizing, does the generated face look too realistic? Adjust the Style strength to 30-50. The larger the number, the lower the ID fidelity, but the better the stylization. You could also try other base models or LoRAs with good stylization effects.
  • Reduce the number of generated images and sampling steps for faster generation. However, keep in mind that reducing the sampling steps may compromise ID fidelity.
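
The Style strength tip can be related to the `start_merge_step` argument used in the generation example. The sketch below is an assumption about how a percentage style strength could be translated into a merge step; the function name, the formula, and the cap of 30 are illustrative choices rather than a documented API. A higher strength delays the merge step, which is consistent with the trade-off described in the tip.

```python
def start_merge_step_from_style_strength(
    style_strength_percent: float,
    num_steps: int,
    max_merge_step: int = 30,  # assumed upper bound, chosen for illustration
) -> int:
    """Map a style strength in [0, 100] to a merge step, capped at `max_merge_step`."""
    if not 0 <= style_strength_percent <= 100:
        raise ValueError("style_strength_percent must be between 0 and 100")
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    # Multiply before dividing so integer-valued inputs stay exact.
    step = int(style_strength_percent * num_steps / 100)
    return min(step, max_merge_step)
```

Under this mapping, a strength of 20 with an assumed 50 sampling steps gives a merge step of 10, the value used in the generation example, while strengths of 30 to 50 give steps of 15 to 25.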

Related Resources

Replicate demo of PhotoMaker:

  1. Demo link: run PhotoMaker on Replicate, provided by @yorickvP and @jd7h.
  2. Demo link (style version).

WebUI version of PhotoMaker:

  1. stable-diffusion-webui-forge: https://github.com/lllyasviel/stable-diffusion-webui-forge provided by @Lvmin Zhang
  2. Fooocus App: Fooocus-inswapper provided by @machineminded

Windows version of PhotoMaker:

  1. bmaltais/PhotoMaker by @bmaltais, easy to deploy PhotoMaker on Windows. The description can be found in this link.
  2. sdbds/PhotoMaker-for-windows by @sdbds.

ComfyUI:

  1. 🔥 Official Implementation by ComfyUI: https://github.com/comfyanonymous/ComfyUI/commit/d1533d9c0f1dde192f738ef1b745b15f49f41e02
  2. https://github.com/ZHO-ZHO-ZHO/ComfyUI-PhotoMaker
  3. https://github.com/StartHua/Comfyui-Mine-PhotoMaker
  4. https://github.com/shiimizu/ComfyUI-PhotoMaker

Purely C/C++/CUDA version of PhotoMaker:

  1. stable-diffusion.cpp by @bssrdf.

Other Applications / Web Demos

  1. Wisemodel 始智 (Easy to use in China): https://wisemodel.cn/space/gradio/photomaker
  2. OpenXLab (Easy to use in China): https://openxlab.org.cn/apps/detail/camenduru/PhotoMaker by @camenduru.
  3. Colab: https://github.com/camenduru/PhotoMaker-colab by @camenduru
  4. Monster API: https://monsterapi.ai/playground?model=photo-maker
  5. Pinokio: https://pinokio.computer/item?uri=https://github.com/cocktailpeanutlabs/photomaker

Gradio demo in 45 lines

Provided by @Gradio

🤗 Acknowledgements

  • PhotoMaker is co-hosted by Tencent ARC Lab and Nankai University MCG-NKU.
  • Inspired by many excellent demos and repos, including IP-Adapter, multimodalart/Ip-Adapter-FaceID, FastComposer, and T2I-Adapter. Thanks for their great work!
  • Thanks to the Venus team in Tencent PCG for their feedback and suggestions.
  • Thanks to the HuggingFace team for their generous support!

Disclaimer

This project strives to impact the domain of AI-driven image generation positively. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

BibTeX

If you find PhotoMaker useful for your research and applications, please cite using this BibTeX:

@inproceedings{li2023photomaker,
  title={PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding},
  author={Li, Zhen and Cao, Mingdeng and Wang, Xintao and Qi, Zhongang and Cheng, Ming-Ming and Shan, Ying},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
