FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
🔥 Model Zoo • 🛠️ Installation • 🏋️ Training • 📺 Sampling • 📱 Run WebUI
🌟 Highlights
- We propose FontDiffuser, which is capable of generating unseen characters and styles, and can be extended to cross-lingual generation, such as Chinese to Korean.
- FontDiffuser excels at generating complex characters and handling large style variations, and it achieves state-of-the-art performance.
- The results generated by FontDiffuser can be directly used as input to InstructPix2Pix for decoration, as shown in the figure above.
- We release the 💻Hugging Face Demo online! Welcome to try it out!
📅 News
- 2023.12.20: Our repository is public! 🎉🤗
- 2023.12.19: 🔥🎉 The 💻Hugging Face Demo is public! Welcome to try it out!
- 2023.12.16: The Gradio app demo is released.
- 2023.12.10: Release source code with phase 1 training and sampling.
- 2023.12.09: 🎉🎉 Our paper is accepted by AAAI 2024.
- Previously: Our Recommendations-of-Diffusion-for-Text-Image repo is public, which contains a paper collection of recent diffusion models for text-image generation tasks. Welcome to check it out!
🔥 Model Zoo
Model | Checkpoint | Status |
---|---|---|
FontDiffuser | GoogleDrive / BaiduYun:gexg | Released |
SCR | - | Coming Soon |
FontDiffuser (trained on a large dataset) | - | May Be Coming |
🔧 TODO List
- Add phase 1 training and sampling script.
- Add WebUI demo.
- Push demo to Hugging Face.
- Combine with InstructPix2Pix.
- Add phase 2 training script and checkpoint.
- Add the pre-training of SCR module.
🛠️ Installation
Prerequisites (Recommended)
- Linux
- Python 3.9
- PyTorch 1.13.1
- CUDA 11.7
Environment Setup
Clone this repo:
git clone https://github.com/yeungchenwa/FontDiffuser.git
Step 0: Download and install Miniconda from the official website.
Step 1: Create a conda environment and activate it.
conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser
Step 2: Install the corresponding version of PyTorch following the instructions here.
# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
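Optionally, you can verify that the CUDA build of PyTorch is visible:

```bash
# Quick sanity check: should print 1.13.1+cu117 and True on a CUDA machine.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```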
Step 3: Install the required packages.
pip install -r requirements.txt
🏋️ Training
Data Construction
The training data file tree should be as follows (data examples are shown in the directory data_examples/train/):
├── data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │   ├── style0+char0.png
│           │   ├── style0+char1.png
│           │   └── ...
│           ├── style1
│           │   ├── style1+char0.png
│           │   ├── style1+char1.png
│           │   └── ...
│           ├── style2
│           │   ├── style2+char0.png
│           │   ├── style2+char1.png
│           │   └── ...
│           └── ...
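If you build your own dataset, the layout above can be produced by rendering the same character set once with a content font and once with each style font. Below is a minimal, hypothetical sketch using ImageMagick's `convert` (not part of this repo; the font paths and character list are placeholders):

```bash
# Hypothetical sketch (not part of this repo): render a character set into
# the layout above with ImageMagick. Font paths and characters are placeholders.
mkdir -p data_examples/train/ContentImage
mkdir -p data_examples/train/TargetImage/style0

i=0
for ch in 你 好 世; do
  # Content image rendered from the source font.
  convert -background white -fill black -font fonts/content.ttf -pointsize 96 \
          label:"$ch" -gravity center -extent 128x128 \
          "data_examples/train/ContentImage/char${i}.png"
  # Target image rendered from the style font, named style+char.
  convert -background white -fill black -font fonts/style0.ttf -pointsize 96 \
          label:"$ch" -gravity center -extent 128x128 \
          "data_examples/train/TargetImage/style0/style0+char${i}.png"
  i=$((i+1))
done
```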
Training - Phase 1
sh train_phase_1.sh
- `data_root`: The data root, e.g., `./data_examples`.
- `output_dir`: The directory for saving training logs and checkpoints.
- `resolution`: The resolution of the UNet in our diffusion model.
- `style_image_size`: The resolution of the style image; it can differ from `resolution`.
- `content_image_size`: The resolution of the content image; it should be the same as `resolution`.
- `channel_attn`: Whether to use channel attention in the MCA block.
- `train_batch_size`: The batch size for training.
- `max_train_steps`: The maximum number of training steps.
- `learning_rate`: The learning rate for training.
- `ckpt_interval`: The checkpoint saving interval during training.
- `drop_prob`: The probability of dropping the condition during training, used for classifier-free guidance.
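For reference, an invocation corresponding to train_phase_1.sh might look like the following. This is a sketch only: the entry-point name (`train.py`) and the concrete values are assumptions, so match them against the actual train_phase_1.sh.

```bash
# Illustrative only: the entry-point name and values are assumptions;
# see train_phase_1.sh for the real flags.
python train.py \
    --data_root ./data_examples \
    --output_dir ./outputs/phase_1 \
    --resolution 96 \
    --style_image_size 96 \
    --content_image_size 96 \
    --channel_attn \
    --train_batch_size 16 \
    --max_train_steps 440000 \
    --learning_rate 1e-4 \
    --ckpt_interval 40000 \
    --drop_prob 0.1
```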
Training - Phase 2
Coming Soon...
📺 Sampling
Step 1 => Prepare the checkpoint
Option (1): Download the checkpoint from GoogleDrive / BaiduYun:gexg, then put the `ckpt` folder in the root directory; it should include the files `unet.pth`, `content_encoder.pth`, and `style_encoder.pth`.
Option (2): Put your own retrained checkpoint folder `ckpt` in the root directory, including the files `unet.pth`, `content_encoder.pth`, and `style_encoder.pth`.
Step 2 => Run the script
(1) Sampling an image from a content image and a reference image.
sh script/sample_content_image.sh
- `ckpt_dir`: The directory where the model checkpoints are saved.
- `content_image_path`: The path of the content/source image.
- `style_image_path`: The path of the style/reference image.
- `save_image`: Set to `True` to save the outputs as images.
- `save_image_dir`: The directory for saving images; the saved files include an `out_single.png` and an `out_with_cs.png`.
- `device`: The sampling device; GPU acceleration is recommended.
- `guidance_scale`: The guidance scale for classifier-free sampling.
- `num_inference_steps`: The number of inference steps for DPM-Solver++.
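Put together, a call matching the parameter list above might look like the following sketch (the `sample.py` entry point and the example paths are assumptions; see `script/sample_content_image.sh` for the real ones):

```bash
# Illustrative only: the entry-point name and paths are assumptions.
python sample.py \
    --ckpt_dir ckpt/ \
    --content_image_path examples/content.png \
    --style_image_path examples/style.png \
    --save_image \
    --save_image_dir outputs/ \
    --device cuda:0 \
    --guidance_scale 7.5 \
    --num_inference_steps 20
```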
(2) Sampling an image from a content character.
Note: You may need a TTF file that contains numerous Chinese characters; you can download one from BaiduYun:wrth.
sh script/sample_content_character.sh
- `character_input`: If set to `True`, a character string is used as the content/source input.
- `content_character`: The content/source character string.
- The other parameters are the same as in option (1) above.
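Analogously, a character-input call might look like this sketch (the `--ttf_path` flag and other specifics are assumptions; check `script/sample_content_character.sh` for the real flags):

```bash
# Illustrative only: flag names beyond those documented above are assumptions.
python sample.py \
    --character_input \
    --content_character "龍" \
    --ttf_path fonts/chinese.ttf \
    --ckpt_dir ckpt/ \
    --style_image_path examples/style.png \
    --save_image \
    --save_image_dir outputs/ \
    --device cuda:0 \
    --guidance_scale 7.5 \
    --num_inference_steps 20
```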
📱 Run WebUI
(1) Sampling by FontDiffuser
gradio gradio_app.py
Example:
(2) Sampling by FontDiffuser and Rendering by InstructPix2Pix
Coming Soon ...
🌄 Gallery
Characters at a hard level of complexity
Characters at a medium level of complexity
Characters at an easy level of complexity
Cross-Lingual Generation (Chinese to Korean)
💙 Acknowledgement
Copyright
- This repository can only be used for non-commercial research purposes.
- For commercial use, please contact Prof. Lianwen Jin ([email protected]).
- Copyright 2023, Deep Learning and Vision Computing Lab (DLVC-Lab), South China University of Technology.
Citation
@inproceedings{yang2024fontdiffuser,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with
         Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}