This is the official repository for the paper "Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing". ICCV 2023

Multimodal Garment Designer (ICCV 2023)

Human-Centric Latent Diffusion Models for Fashion Image Editing

Alberto Baldrati*, Davide Morelli*, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

* Equal contribution.

Overview

Abstract:
Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs.

Citation

If you make use of our work, please cite our paper:

@inproceedings{baldrati2023multimodal,
  title={Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing},
  author={Baldrati, Alberto and Morelli, Davide and Cartella, Giuseppe and Cornia, Marcella and Bertini, Marco and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

Getting Started

We recommend using the Anaconda package manager to avoid dependency/reproducibility problems. For Linux systems, you can find a conda installation guide here.

Installation

  1. Clone the repository
git clone https://github.com/aimagelab/multimodal-garment-designer
  2. Install Python dependencies
conda env create -n mgd -f environment.yml
conda activate mgd

Alternatively, you can create a new conda environment and install the required packages manually:

conda create -n mgd -y python=3.9
conda activate mgd
pip install torch==1.12.1 torchmetrics==0.11.0 opencv-python==4.7.0.68 diffusers==0.12.0 transformers==4.25.1 accelerate==0.15.0 clean-fid==0.1.35 torchmetrics[image]==0.11.0
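
After a manual installation it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the helper name check_pins is hypothetical, and the pins are copied from the pip command above.

```python
from importlib import metadata

# Pins copied from the pip command above.
PINS = {
    "torch": "1.12.1",
    "torchmetrics": "0.11.0",
    "opencv-python": "4.7.0.68",
    "diffusers": "0.12.0",
    "transformers": "4.25.1",
    "accelerate": "0.15.0",
    "clean-fid": "0.1.35",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "+cu113" are ignored in the comparison.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the activated mgd environment, the script prints nothing when every pinned package is installed at the expected version.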

Inference

To run inference, use the following command:

python src/eval.py --dataset_path <path> --batch_size <int> --mixed_precision fp16 --output_dir <path> --save_name <string> --num_workers_test <int> --sketch_cond_rate 0.2 --dataset <dresscode|vitonhd> --start_cond_rate 0.0 --test_order <paired|unpaired>
  • dataset_path is the path to the dataset (change it according to the dataset parameter)
  • dataset is the name of the dataset to use (dresscode | vitonhd)
  • output_dir is the path to the output directory
  • save_name is the name of the output_dir subfolder where the generated images are saved
  • start_cond_rate is the fraction, in [0.0, 1.0], of denoising steps used as an offset before sketch conditioning starts
  • sketch_cond_rate is the fraction, in [0.0, 1.0], of denoising steps in which sketch conditioning is applied
  • test_order is the test setting (paired | unpaired)
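
The parameter descriptions do not spell out how start_cond_rate and sketch_cond_rate combine. One plausible reading, sketched below, is that the first sets the step at which sketch conditioning begins and the second sets how many steps it lasts. This is an assumption, not the repository's confirmed behavior; the function name sketch_steps is hypothetical, and the exact rule should be checked against src/eval.py and the pipeline code.

```python
def sketch_steps(num_steps, start_cond_rate, sketch_cond_rate):
    """Indices of denoising steps that would receive sketch conditioning.

    Assumed rule (not confirmed by the source): conditioning starts at
    int(num_steps * start_cond_rate) and lasts for
    int(num_steps * sketch_cond_rate) steps, clipped to num_steps.
    """
    for name, rate in (("start_cond_rate", start_cond_rate),
                       ("sketch_cond_rate", sketch_cond_rate)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must lie in [0.0, 1.0], got {rate}")
    start = int(num_steps * start_cond_rate)
    length = int(num_steps * sketch_cond_rate)
    return range(start, min(num_steps, start + length))


# Values from the inference command, with a 50-step schedule.
steps = sketch_steps(50, start_cond_rate=0.0, sketch_cond_rate=0.2)
print(list(steps))
```

Under this reading, the values in the command above would apply the sketch only during the first fifth of the denoising schedule.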

Note that we provide a few sample images in assets/data, so MGD can be tested simply by cloning this repo. To run the code on these samples, set:

  • Dress Code Multimodal dataset
    • dataset_path to ../assets/data/dresscode
    • dataset to dresscode
  • Viton-HD Multimodal dataset
    • dataset_path to ../assets/data/vitonhd
    • dataset to vitonhd

To run inference on the whole Dress Code Multimodal or Viton-HD Multimodal dataset, simply set dataset_path and dataset according to the downloaded and prepared datasets (see the sections below).
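
To avoid retyping the long command line, the arguments can be assembled programmatically. The sketch below only builds the argument list, using the flags shown in the inference command. The helper name build_eval_command, the default batch size and worker count, and the output names "out" and "sample_run" are illustrative assumptions.

```python
import shlex

VALID_DATASETS = ("dresscode", "vitonhd")
VALID_ORDERS = ("paired", "unpaired")


def build_eval_command(dataset, dataset_path, output_dir, save_name,
                       test_order="paired", batch_size=1, num_workers_test=0,
                       sketch_cond_rate=0.2, start_cond_rate=0.0):
    """Return the argument list for src/eval.py (flags taken from the README)."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"dataset must be one of {VALID_DATASETS}")
    if test_order not in VALID_ORDERS:
        raise ValueError(f"test_order must be one of {VALID_ORDERS}")
    return [
        "python", "src/eval.py",
        "--dataset_path", str(dataset_path),
        "--batch_size", str(batch_size),
        "--mixed_precision", "fp16",
        "--output_dir", str(output_dir),
        "--save_name", save_name,
        "--num_workers_test", str(num_workers_test),
        "--sketch_cond_rate", str(sketch_cond_rate),
        "--dataset", dataset,
        "--start_cond_rate", str(start_cond_rate),
        "--test_order", test_order,
    ]


# Sample data shipped with the repo; "out" and "sample_run" are placeholders.
cmd = build_eval_command("dresscode", "../assets/data/dresscode",
                         output_dir="out", save_name="sample_run")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory in which the relative paths are valid.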

Pre-trained models

The model and checkpoints are available via torch.hub.

Load the MGD denoising UNet model using the following code:

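import torch

# Extra keyword arguments (dataset, pretrained) are forwarded to the 'mgd' hub entry point.
unet = torch.hub.load(
    repo_or_dir='aimagelab/multimodal-garment-designer',
    model='mgd',
    source='github',
    dataset='dresscode',  # or 'vitonhd'
    pretrained=True,
)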
  • dataset dataset name (dresscode | vitonhd)

Use the denoising network with our custom diffusers pipeline as follows:

from src.mgd_pipelines.mgd_pipe import MGDPipe
from diffusers import AutoencoderKL, DDIMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

pretrained_model_name_or_path = "runwayml/stable-diffusion-inpainting"

text_encoder = CLIPTextModel.from_pretrained(
    pretrained_model_name_or_path, 
    subfolder="text_encoder"
    )

vae = AutoencoderKL.from_pretrained(
    pretrained_model_name_or_path, 
    subfolder="vae"
    )

tokenizer = CLIPTokenizer.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="tokenizer",
    )

val_scheduler = DDIMScheduler.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="scheduler"
    )
val_scheduler.set_timesteps(50)

mgd_pipe = MGDPipe(
    text_encoder=text_encoder,
    vae=vae,
    unet=unet,
    tokenizer=tokenizer,
    scheduler=val_scheduler,
    )

For a complete usage example, see src/eval.py in the main repo.

Datasets

We do not hold rights on the original Dress Code and Viton-HD datasets. Please refer to the original papers for more information.

Start by downloading the original datasets from the following links:

Download the Dress Code Multimodal and Viton-HD Multimodal additional data annotations from here.

  • Dress Code Multimodal [link]
  • Viton-HD Multimodal [link]

Dress Code Multimodal Data Preparation

Once the data is downloaded, prepare the dataset folder as follows:

Dress Code
| fine_captions.json
| coarse_captions.json
| test_pairs_paired.txt
| test_pairs_unpaired.txt
| train_pairs.txt
| test_stitch_map
|---- [category]
|-------- images
|-------- keypoints
|-------- skeletons
|-------- dense
|-------- im_sketch
|-------- im_sketch_unpaired
...

Viton-HD Multimodal Data Preparation

Once the data is downloaded, prepare the dataset folder as follows:

Viton-HD
| captions.json
|---- train
|-------- image
|-------- cloth
|-------- image-parse-v3
|-------- openpose_json
|-------- im_sketch
|-------- im_sketch_unpaired
...
|---- test
...
|-------- im_sketch
|-------- im_sketch_unpaired
...
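
Before launching inference on the full datasets, the folder layouts above can be checked with a short script. This is a minimal sketch that looks only for the entries listed explicitly in the two trees. Entries elided with "..." are not checked. The Dress Code category folder names are not given above, so they are passed in by the caller. All function names are hypothetical.

```python
from pathlib import Path

# Entries copied from the Dress Code tree above.
DRESSCODE_TOP_LEVEL = [
    "fine_captions.json", "coarse_captions.json", "test_pairs_paired.txt",
    "test_pairs_unpaired.txt", "train_pairs.txt", "test_stitch_map",
]
DRESSCODE_CATEGORY_DIRS = [
    "images", "keypoints", "skeletons", "dense",
    "im_sketch", "im_sketch_unpaired",
]

# Entries copied from the Viton-HD tree above.
VITONHD_TOP_LEVEL = ["captions.json"]
VITONHD_TRAIN_DIRS = [
    "image", "cloth", "image-parse-v3", "openpose_json",
    "im_sketch", "im_sketch_unpaired",
]
VITONHD_TEST_DIRS = ["im_sketch", "im_sketch_unpaired"]


def missing_entries(root, top_level, subdirs_by_parent):
    """Return the relative paths of expected entries that are absent."""
    root = Path(root)
    missing = [name for name in top_level if not (root / name).exists()]
    for parent, subdirs in subdirs_by_parent.items():
        for sub in subdirs:
            if not (root / parent / sub).is_dir():
                missing.append(f"{parent}/{sub}")
    return missing


def check_dresscode(root, categories):
    """`categories` are the [category] folder names, supplied by the caller."""
    layout = {c: DRESSCODE_CATEGORY_DIRS for c in categories}
    return missing_entries(root, DRESSCODE_TOP_LEVEL, layout)


def check_vitonhd(root):
    layout = {"train": VITONHD_TRAIN_DIRS, "test": VITONHD_TEST_DIRS}
    return missing_entries(root, VITONHD_TOP_LEVEL, layout)
```

For example, check_vitonhd("path/to/Viton-HD") returns an empty list when every listed entry is present, and a list of relative paths otherwise.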

TODO

  • training code

Acknowledgements

This work has partially been supported by the PNRR project “Future Artificial Intelligence Research (FAIR)”, by the PRIN project “CREATIVE: CRoss-modal understanding and gEnerATIon of Visual and tExtual content” (CUP B87G22000460001), both co-funded by the Italian Ministry of University and Research, and by the European Commission under the European Horizon 2020 Programme, grant number 101004545 - ReInHerit.

LICENSE

Creative Commons License
All material is available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes you've made.
