  • Stars: 307
  • Rank: 136,109 (Top 3%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created over 1 year ago
  • Updated 9 months ago

Repository Details

Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"

PHOTOSWAP

Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

[Project Page] [Paper]

[Figure: teaser]

Model Architecture

Teaser figure

TODO

  • Release Benchmark
  • Release code

🔥 News

  • [2023.07.08] We have released our code.

Concept Learning

Training a model with your own concept

The new subject is learned as a new token in the diffusion model. Hugging Face provides training scripts for this. Specifically, you can use Textual Inversion, DreamBooth, Custom Diffusion, or any other concept learning method. Be sure to install the packages listed in the corresponding requirements.txt.

  • More source images lead to a better-learned concept and therefore a better subject swap result. For example, using more human face images during training leads to better artistic figure transfer.
  • For DreamBooth, fine-tuning the encoder leads to better performance, especially for human faces. This also requires more memory.
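
As a rough illustration of the DreamBooth route, the sketch below assembles a command line for the `train_dreambooth.py` example script from Hugging Face diffusers. The helper name `build_dreambooth_command`, the paths, the instance prompt with its `sks` identifier token, and the base model ID are placeholders chosen for illustration. It also assumes that "the encoder" above means the text encoder, which that script fine-tunes when `--train_text_encoder` is passed. The command is only built and printed, not executed.

```python
import shlex
from typing import List


def build_dreambooth_command(
    instance_dir: str,
    output_dir: str,
    instance_prompt: str,
    train_text_encoder: bool = True,
    base_model: str = "stabilityai/stable-diffusion-2",  # assumed base model ID
) -> List[str]:
    """Assemble an argv list for the diffusers DreamBooth example script."""
    cmd = [
        "accelerate", "launch", "train_dreambooth.py",
        "--pretrained_model_name_or_path", base_model,
        "--instance_data_dir", instance_dir,   # folder with source images of the subject
        "--instance_prompt", instance_prompt,  # prompt containing the new identifier token
        "--output_dir", output_dir,
    ]
    if train_text_encoder:
        # Assumption: better quality at the cost of more GPU memory.
        cmd.append("--train_text_encoder")
    return cmd


command = build_dreambooth_command(
    instance_dir="data/my_subject",         # hypothetical path
    output_dir="checkpoints/my_subject",    # hypothetical path
    instance_prompt="a photo of sks person",
)
print(shlex.join(command))
```

Setting `train_text_encoder=False` drops the flag, which trades some quality for a smaller memory footprint.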

Download PHOTOSWAP models

We provide a few checkpoints that already contain the new concept. All models here are based on StableDiffusion-2.

Type  | Concept       | Download
Human | Taylor Swift  | Google Drive
Human | Justin Bieber | Google Drive

Attention Swap

Follow the steps below to perform subject swapping:

  1. Put the trained diffusion model checkpoint in the checkpoints folder.
  2. Install the required packages with pip install -r requirements.txt. Note that the concept learning environment is not suitable for attention swap.
  3. Run real-image-swap.ipynb to perform subject swapping.
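
Before opening the notebook, it can help to confirm that step 1 is in place. The following is a minimal sketch that lists what sits in the `checkpoints` folder. The helper name `list_checkpoints` is hypothetical and not part of the repository, and the layout it assumes (one entry per concept directly under `checkpoints`) is a guess.

```python
from pathlib import Path
from typing import List


def list_checkpoints(root: str = "checkpoints") -> List[str]:
    """Return the sorted names of non-hidden entries directly under the checkpoints folder."""
    folder = Path(root)
    if not folder.is_dir():
        raise FileNotFoundError(
            f"Expected a '{root}' folder; create it and add a trained checkpoint."
        )
    return sorted(p.name for p in folder.iterdir() if not p.name.startswith("."))


if __name__ == "__main__":
    try:
        print("Found checkpoints:", list_checkpoints())
    except FileNotFoundError as err:
        print(err)
```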

Different learned concepts may need different swap steps for successful subject swapping. Tune the swap step and the text prompt for better performance. A concept model whose weights have been fine-tuned will show some degradation in its ability to generate general concepts. Subject swapping with Photoswap requires a single GPU with 16 GB of memory.
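
One way to organize this tuning is to enumerate a small grid of swap steps and prompts, then run the notebook once per combination. The sketch below only generates that grid. The function `swap_grid`, the candidate step values, the 50-step sampling length, and the example prompt are all illustrative assumptions rather than settings taken from the repository. It also assumes that a swap step counts how many of the denoising steps use swapped attention.

```python
from itertools import product
from typing import Dict, Iterable, List


def swap_grid(
    swap_steps: Iterable[int],
    prompts: Iterable[str],
    total_steps: int = 50,  # assumed number of denoising steps
) -> List[Dict[str, object]]:
    """Build one configuration per (swap step, prompt) pair."""
    configs = []
    for step, prompt in product(swap_steps, prompts):
        if not 0 <= step <= total_steps:
            raise ValueError(f"swap step {step} is outside 0..{total_steps}")
        configs.append(
            {
                "swap_step": step,
                "prompt": prompt,
                # Fraction of the denoising schedule spent swapping.
                "swap_fraction": step / total_steps,
            }
        )
    return configs


for config in swap_grid([10, 20, 30], ["a photo of sks person on a beach"]):
    print(config)
```

Sweeping a coarse grid first and then refining around the best-looking result keeps the number of notebook runs small.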

Acknowledgements

We thank Prompt-to-Prompt, Hugging Face, and MasaCtrl for their great work and open-source code.

Citation

@misc{gu2023photoswap,
      title={Photoswap: Personalized Subject Swapping in Images}, 
      author={Jing Gu and Yilin Wang and Nanxuan Zhao and Tsu-Jui Fu and Wei Xiong and Qing Liu and Zhifei Zhang and He Zhang and Jianming Zhang and HyunJoon Jung and Xin Eric Wang},
      year={2023},
      eprint={2305.18286},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
