• Stars
    star
    699
  • Rank 64,759 (Top 2 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 1 year ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[ICCV 2023] Consistent Image Synthesis and Editing

MasaCtrl: Tuning-free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

Pytorch implementation of MasaCtrl: Tuning-free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, Yinqiang Zheng

arXiv Project page demo demo


MasaCtrl enables performing various consistent non-rigid image synthesis and editing without fine-tuning and optimization.

Updates

  • [2023/5/13] The inference code of MasaCtrl with T2I-Adapter is available.
  • [2023/4/28] Hugging Face demo released.
  • [2023/4/25] Code released.
  • [2023/4/17] Paper is available here.

Introduction

We propose MasaCtrl, a tuning-free method for non-rigid consistent image synthesis and editing. The key idea is to combine the contents from the source image and the layout synthesized from text prompt and additional controls into the desired synthesized or edited image, with Mutual Self-Attention Control.

Main Features

1 Consistent Image Synthesis and Editing

MasaCtrl can perform prompt-based image synthesis and editing that changes the layout while maintaining contents of source image.

The target layout is synthesized directly from the target prompt.

Consistent synthesis results Real image editing results

2 Integration to Controllable Diffusion Models

Directly modifying the text prompts often cannot generate target layout of desired image, thus we further integrate our method into existing proposed controllable diffusion pipelines (like T2I-Adapter and ControlNet) to obtain stable synthesis and editing results.

The target layout controlled by additional guidance.

Synthesis (left part) and editing (right part) results with T2I-Adapter

3 Generalization to Other Models: Anything-V4

Our method also generalize well to other Stable-Diffusion-based models.

Results on Anything-V4

4 Extension to Video Synthesis

With dense consistent guidance, MasaCtrl enables video synthesis

Video Synthesis Results (with keypose and canny guidance)

Usage

Requirements

We implement our method with diffusers code base with similar code structure to Prompt-to-Prompt. The code runs on Python 3.8.5 with Pytorch 1.11. Conda environment is highly recommended.

pip install -r requirements.txt

Checkpoints

Stable Diffusion: We mainly conduct expriemnts on Stable Diffusion v1-4, while our method can generalize to other versions (like v1-5).

You can download these checkpoints on their official repository and Hugging Face.

Personalized Models: You can download personlized models from CIVITAI or train your own customized models.

Demos

Notebook demos

To run the synthesis with MasaCtrl, single GPU with at least 16 GB VRAM is required.

The notebook playground.ipynb and playground_real.ipynb provide the synthesis and real editing samples, respectively.

Online demos

We provide demo with Gradio app. Note that you may copy the demo into your own space to use the GPU. Online Colab demo demo is also available.

Local Gradio demo

You can launch the provided Gradio demo locally with

CUDA_VISIBLE_DEVICES=0 python app.py

MasaCtrl with T2I-Adapter

Install T2I-Adapter and prepare the checkpoints following their provided tutorial. Assuming it has been successfully installed and the root directory is T2I-Adapter.

Thereafter copy the core masactrl package and the inference code masactrl_w_adapter.py to the root directory of T2I-Adapter

cp -r MasaCtrl/masactrl T2I-Adapter/
cp MasaCtrl/masactrl_w_adapter/masactrl_w_adapter.py T2I-Adapter/

Last, you can inference the images with following command (with sketch adapter)

python masactrl_w_adapter.py \
--which_cond sketch \
--cond_path_src SOURCE_CONDITION_PATH \
--cond_path CONDITION_PATH \
--cond_inp_type sketch \
--prompt_src "A bear walking in the forest" \
--prompt "A bear standing in the forest" \
--sd_ckpt models/sd-v1-4.ckpt \
--resize_short_edge 512 \
--cond_tau 1.0 \
--cond_weight 1.0 \
--n_samples 1 \
--adapter_ckpt models/t2iadapter_sketch_sd14v1.pth

NOTE: You can download the sketch examples here.

For real image, the DDIM inversion is performed to invert the image into the noise map, thus we add the inversion process into the original DDIM sampler. You should replace the original file T2I-Adapter/ldm/models/diffusion/ddim.py with the exteneded version MasaCtrl/masactrl_w_adapter/ddim.py to enable the inversion function. Then you can edit the real image with following command (with sketch adapter)

python masactrl_w_adapter.py \
--src_img_path SOURCE_IMAGE_PATH \
--cond_path CONDITION_PATH \
--cond_inp_type image \
--prompt_src "" \
--prompt "a photo of a man wearing black t-shirt, giving a thumbs up" \
--sd_ckpt models/sd-v1-4.ckpt \
--resize_short_edge 512 \
--cond_tau 1.0 \
--cond_weight 1.0 \
--n_samples 1 \
--which_cond sketch \
--adapter_ckpt models/t2iadapter_sketch_sd14v1.pth \
--outdir ./workdir/masactrl_w_adapter_inversion/black-shirt

NOTE: You can download the real image editing example here.

Acknowledgements

We thank the awesome research works Prompt-to-Prompt, T2I-Adapter.

Citation

@misc{cao2023masactrl,
      title={MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing}, 
      author={Mingdeng Cao and Xintao Wang and Zhongang Qi and Ying Shan and Xiaohu Qie and Yinqiang Zheng},
      year={2023},
      eprint={2304.08465},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }

Contact

If your have any comments or questions, please open a new issue or feel free to contact Mingdeng Cao and Xintao Wang.

More Repositories

1

GFPGAN

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Python
35,397
star
2

PhotoMaker

PhotoMaker [CVPR 2024]
Jupyter Notebook
9,198
star
3

T2I-Adapter

T2I-Adapter
Python
3,377
star
4

InstantMesh

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Python
2,928
star
5

BrushNet

[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
Python
1,298
star
6

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]
Python
1,247
star
7

SEED-Story

SEED-Story: Multimodal Long Story Generation with Large Language Model
Python
657
star
8

LLaMA-Pro

[ACL 2024] Progressive LLaMA with Block Expansion.
Python
459
star
9

Mix-of-Show

NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Python
383
star
10

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Python
376
star
11

AnimeSR

Codes for "AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos"
Python
325
star
12

VQFR

ECCV 2022, Oral, VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
Python
320
star
13

CustomNet

Python
258
star
14

SmartEdit

Official code of SmartEdit [CVPR-2024 Highlight]
Python
214
star
15

UMT

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.
Python
186
star
16

MM-RealSR

Codes for "Metric Learning based Interactive Modulation for Real-World Super-Resolution"
Python
152
star
17

ViT-Lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
Python
148
star
18

MCQ

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
Python
136
star
19

DeSRA

Official codes for DeSRA (ICML 2023)
Python
123
star
20

FAIG

NeurIPS 2021, Spotlight, Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
Python
118
star
21

ArcNerf

Nerf and extensions in all
Jupyter Notebook
106
star
22

ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Python
96
star
23

SurfelNeRF

SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
76
star
24

RepSR

Codes for "RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization"
74
star
25

mllm-npu

mllm-npu: training multimodal large language models on Ascend NPUs
Python
67
star
26

HOSNeRF

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
Python
65
star
27

FastRealVSR

Codes for "Mitigating Artifacts in Real-World Video Super-Resolution Models"
59
star
28

ConMIM

Official codes for ConMIM (ICLR 2023)
Python
57
star
29

GVT

Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
Python
54
star
30

TVTS

Turning to Video for Transcript Sorting
Jupyter Notebook
44
star
31

BEBR

Official code for "Binary embedding based retrieval at Tencent"
Python
42
star
32

ViSFT

Python
33
star
33

pi-Tuning

Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
Python
32
star
34

FLM

Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Python
31
star
35

Efficient-VSR-Training

Codes for "Accelerating the Training of Video Super-Resolution"
30
star
36

DTN

Official code for "Dynamic Token Normalization Improves Vision Transformer", ICLR 2022.
Python
27
star
37

OpenCompatible

OpenCompatible provides a standard compatible training benchmark, covering practical training scenarios.
Python
24
star
38

BTS

BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild
23
star
39

SGAT4PASS

This is the official implementation of the paper SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (IJCAI 2023)
Python
23
star
40

SFDA

Python
20
star
41

TaCA

Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
15
star
42

Plot2Code

Python
14
star
43

common_trainer

Common template for pytorch project. Easy to extent and modify for new project.
Python
12
star
44

TransFusion

The code repo for the ACM MM paper: TransFusion: Multi-Modal Fusion for Video Tag Inference viaTranslation-based Knowledge Embedding.
9
star
45

BasicVQ-GEN

7
star
46

ArcVis

Visualization of 3d and 2d components interactively.
Jupyter Notebook
6
star
47

VTLayout

3
star