
[CVPR 2021] Multi-Modal-CelebA-HQ: A Large-Scale Text-Driven Face Generation and Understanding Dataset

Multi-Modal-CelebA-HQ


Multi-Modal-CelebA-HQ (MM-CelebA-HQ) is a large-scale face image dataset of 30,000 high-resolution face images, selected from the CelebA dataset following CelebA-HQ. Each image in the dataset is accompanied by a semantic mask, a sketch, descriptive texts, and a version with a transparent background.

Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for a range of tasks, including text-to-image generation, text-guided image manipulation, sketch-to-image generation, image captioning, and visual question answering. This dataset is introduced and employed in TediGAN.

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation.
Weihao Xia, Yujiu Yang, Jing-Hao Xue, and Baoyuan Wu.
CVPR 2021.

Updates

  • [04/10/2023] The scripts for text and sketch generation have been added to the repository.
  • [06/12/2020] The paper is released on ArXiv.
  • [11/13/2020] The Multi-Modal-CelebA-HQ dataset has been released.

Data Generation

Description

  • The textual descriptions are generated with a probabilistic context-free grammar (PCFG) based on the given attributes. Following the format of the popular CUB and COCO datasets, we create ten unique single-sentence descriptions per image to obtain more training data. A previous study proposed CelebTD-HQ, but it is not publicly available.
  • For labels, we use the CelebAMask-HQ dataset, which contains manually-annotated semantic masks of facial attributes corresponding to CelebA-HQ.
  • For sketches, we follow the same data-generation pipeline as DeepFaceDrawing: we first apply the Photocopy filter in Photoshop to extract edges, which preserves facial details but introduces excessive noise, and then apply sketch simplification to obtain edge maps resembling hand-drawn sketches.
  • For background removal, we use the open-source tool Rembg and the commercial software remove.bg; a minimal Rembg usage sketch follows this list. Different backgrounds can then be added using image composition or harmonization methods such as DoveNet.
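As a rough illustration of the background-removal step, the sketch below uses Rembg to produce a transparent-background PNG. It assumes a recent rembg release whose top-level remove() accepts a PIL image; all file paths are hypothetical.

```python
from pathlib import Path

from PIL import Image
from rembg import remove  # assumes a recent rembg release with a top-level remove()

def strip_background(src: Path, dst: Path) -> None:
    """Remove the background from one face image and save it with an alpha channel."""
    with Image.open(src) as img:
        cutout = remove(img)            # returns an RGBA PIL image
        cutout.save(dst, format="PNG")  # PNG keeps the transparency

# Hypothetical paths; point these at your own CelebA-HQ images.
strip_background(Path("image/00001.jpg"), Path("image_nobg/00001.png"))
```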

Usage

This section outlines the process of generating the data for our task.

The scripts provided here are not restricted to the CelebA-HQ dataset and can be utilized to preprocess any dataset that includes attribute annotations, be it image, video, or 3D shape data. This flexibility enables the creation of custom datasets that meet specific requirements. For example, the create_caption.py script can be applied to generate diverse descriptions for each video by using video facial attributes (e.g., those provided by CelebV-HQ), leading to a text-video dataset, similar to CelebV-Text.

Text

Please download celeba-hq-attribute.txt and run the following script.

python create_caption.py

The generated textual descriptions can be found at ./celeba_caption. Kindly complete the form to request any post-processing scripts not included in the repository.
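For intuition, here is a toy sketch of the template-based idea behind attribute-conditioned captioning. It is not the released create_caption.py: the grammar, attribute names, and phrasings are all illustrative assumptions.

```python
import random

# Hypothetical attribute-to-phrase mapping; the real attribute list comes from
# celeba-hq-attribute.txt and is far larger.
PHRASES = {
    "Smiling": "a smile on the face",
    "Black_Hair": "black hair",
    "Eyeglasses": "eyeglasses",
}

# A few single-sentence templates standing in for PCFG production rules.
TEMPLATES = [
    "This person has {feats}.",
    "The person in the picture has {feats}.",
    "A face with {feats}.",
]

def describe(attributes, n=10, seed=0):
    """Generate n single-sentence descriptions from a list of active attributes."""
    rng = random.Random(seed)
    feats = [PHRASES[a] for a in attributes if a in PHRASES]
    assert feats, "need at least one known attribute"
    captions = []
    for _ in range(n):
        rng.shuffle(feats)  # vary the order of attribute phrases between captions
        joined = " and ".join([", ".join(feats[:-1]), feats[-1]]) if len(feats) > 1 else feats[0]
        captions.append(rng.choice(TEMPLATES).format(feats=joined))
    return captions

for caption in describe(["Smiling", "Black_Hair", "Eyeglasses"], n=3):
    print(caption)
```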

Sketch

If Photoshop is available to you, apply the Photocopy filter to extract edges; Photoshop supports batch processing, so you do not have to manually process each image. The Sobel operator is an alternative way to extract edges when Photoshop is unavailable or a simpler approach is preferred (a minimal sketch follows). Either way, the extraction preserves facial details but introduces excessive noise, so the sketch-simplification model is then applied to obtain edge maps resembling hand-drawn sketches.
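A minimal sketch of the Sobel alternative, assuming OpenCV is installed; the file paths and kernel size are illustrative, and the output still needs the sketch-simplification step below.

```python
import cv2
import numpy as np

def sobel_edges(src: str, dst: str, ksize: int = 3) -> None:
    """Extract an inverted edge map (dark strokes on white) with the Sobel operator."""
    gray = cv2.imread(src, cv2.IMREAD_GRAYSCALE)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=ksize)  # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=ksize)  # vertical gradient
    mag = np.sqrt(gx ** 2 + gy ** 2)
    mag = (255 * mag / mag.max()).astype(np.uint8)  # normalize to 0..255
    cv2.imwrite(dst, 255 - mag)                     # invert: edges dark, background white

# Hypothetical paths; in practice, run this over the whole image folder.
sobel_edges("image/00001.jpg", "sketch_raw/00001.png")
```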

The sketch simplification model requires torch==0.4.1 and torchvision==0.2.1.

python create_sketch.py

The generated sketches can be found at ./celeba_sketch.

Overview

[Figure: overview of the dataset; each image is paired with its semantic mask, sketch, and textual descriptions.]

Note: Upon request, the download links for the raw data and annotations have been removed from this repo. Please obtain the raw data from its original site, and email me for the post-processing scripts. The scripts for text and sketch generation have been added to the repository.

All data is hosted on Google Drive (not available).

| Path | Size | Files | Format | Description |
|---|---|---|---|---|
| multi-modal-celeba | ~20 GB | 420,002 | | Main folder |
| ├ image | ~2 GB | 30,000 | JPG | images from CelebA-HQ, of size 512×512 |
| ├ text | 11 MB | 300,000 | TXT | 10 descriptions of each image in CelebA-HQ |
| ├ train | 347 KB | 1 | PKL | filenames of training images |
| ├ test | 81 KB | 1 | PKL | filenames of test images |
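The train/test splits ship as PKL files of image filenames. A minimal loading sketch, assuming each file pickles a plain Python list and that the paths below match your local layout:

```python
import pickle

# Assumes train.pkl / test.pkl each contain a pickled list of image filenames.
with open("multi-modal-celeba/train/train.pkl", "rb") as f:
    train_files = pickle.load(f)
with open("multi-modal-celeba/test/test.pkl", "rb") as f:
    test_files = pickle.load(f)

print(f"{len(train_files)} training images, {len(test_files)} test images")
print(train_files[:3])  # inspect the first few filenames
```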

Pretrained Models

We provide pretrained models of AttnGAN, ControlGAN, DM-GAN, DF-GAN, and ManiGAN. Please consider citing our paper if you use these pretrained models, and feel free to open pull requests if you have any updates.

| Method | FID | LPIPS | Download |
|---|---|---|---|
| AttnGAN | 125.98 | 0.512 | Google Drive |
| ControlGAN | 116.32 | 0.522 | Google Drive |
| DF-GAN | 137.60 | 0.581 | Google Drive |
| DM-GAN | 131.05 | 0.544 | Google Drive |
| TediGAN | 106.37 | 0.456 | Google Drive |

The pretrained model of ManiGAN is here. The training scripts and pretrained models on faces for sketch-to-image and label-to-image generation can be found here. Those with problems accessing Google Drive can refer to the alternative Baidu Cloud link (code: b273) for the dataset and pretrained models.

Related Works

  • CelebA dataset:
    Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang, "Deep Learning Face Attributes in the Wild", in IEEE International Conference on Computer Vision (ICCV), 2015.
  • CelebA-HQ was collected from CelebA and further post-processed by the following paper:
    Karras et al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation", in International Conference on Learning Representations (ICLR), 2018.
  • CelebAMask-HQ contains manually-annotated masks of size 512×512 with 19 classes covering all facial components and accessories, such as skin, nose, eyes, eyebrows, ears, mouth, lips, hair, hat, eyeglasses, earrings, necklaces, neck, and cloth. It was collected by the following paper:
    Lee et al., "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

Citation

If you find the dataset, processing scripts, and pretrained models useful for your research, please consider citing our paper:

@inproceedings{xia2021tedigan,
  title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

@article{xia2021towards,
  title={Towards Open-World Text-Guided Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2104.08910},
  year={2021}
}

If you use images and masks, please cite:

@inproceedings{liu2015faceattributes,
 title = {Deep Learning Face Attributes in the Wild},
 author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
 booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
 year = {2015} 
}

@inproceedings{karras2017progressive,
  title={Progressive Growing of GANs for Improved Quality, Stability, and Variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2018}
}

@inproceedings{CelebAMask-HQ,
  title={MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author={Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

License

The use of this software is RESTRICTED to non-commercial research and educational purposes. The license is the same as in CelebAMask-HQ.
