[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

🔥 Model Zoo • 🛠️ Installation • 🏋️ Training • 📺 Sampling • 📱 Run WebUI

🌟 Highlights

[Figures: Vis_1, Vis_2]

  • We propose FontDiffuser, which can generate unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.
  • FontDiffuser excels at generating complex characters and handling large style variations, and it achieves state-of-the-art performance.
  • The results generated by FontDiffuser can be used directly with InstructPix2Pix for decoration, as shown in the figure above.
  • We have released the 💻 Hugging Face Demo online. You are welcome to try it out!

📅 News

  • 2023.12.20: Our repository is public! 🤗
  • 2023.12.19: 🔥🎉 The 💻 Hugging Face Demo is public! You are welcome to try it out!
  • 2023.12.16: The Gradio app demo is released.
  • 2023.12.10: Released the source code with phase 1 training and sampling.
  • 2023.12.09: 🎉🎉 Our paper was accepted by AAAI2024.
  • Previously: Our Recommendations-of-Diffusion-for-Text-Image repo is public. It contains a collection of recent papers on diffusion models for text-image generation tasks. You are welcome to check it out!

🔥 Model Zoo

Model | Checkpoint | Status
FontDiffuser | GoogleDrive / BaiduYun:gexg | Released
SCR | - | Coming Soon
FontDiffuser (trained on a large dataset) | - | May Be Coming

🚧 TODO List

  • Add phase 1 training and sampling script.
  • Add WebUI demo.
  • Push demo to Hugging Face.
  • Combined with InstructPix2Pix.
  • Add phase 2 training script and checkpoint.
  • Add the pre-training of SCR module.

🛠️ Installation

Prerequisites (Recommended)

  • Linux
  • Python 3.9
  • PyTorch 1.13.1
  • CUDA 11.7

Environment Setup

Clone this repo:

git clone https://github.com/yeungchenwa/FontDiffuser.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser

Step 2: Install the PyTorch version that matches your CUDA version, following the official PyTorch installation instructions.

# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Step 3: Install the required packages.

pip install -r requirements.txt
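
After installation, it can help to confirm that the active environment matches the recommended prerequisites. The following is a minimal sketch, not part of the repository: the helper name check_environment is hypothetical, and the expected versions are simply the recommended ones listed above.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 9), expected_torch="1.13.1"):
    """Compare the running environment with the recommended versions.

    Hypothetical helper for illustration; it is not shipped with FontDiffuser.
    """
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_matches": sys.version_info[:2] == tuple(expected_python),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        # Imported lazily so the check also runs before PyTorch is installed.
        import torch

        report["torch"] = torch.__version__
        report["torch_matches"] = torch.__version__.startswith(expected_torch)
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A mismatch does not necessarily mean the code will fail, since the versions above are only recommendations.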

๐Ÿ‹๏ธ Training

Data Construction

The training data should be organized as follows (examples are provided in the directory data_examples/train/):

├──data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │     ├──style0+char0.png
│           │     ├──style0+char1.png
│           │     └── ...
│           ├── style1
│           │     ├──style1+char0.png
│           │     ├──style1+char1.png
│           │     └── ...
│           ├── style2
│           │     ├──style2+char0.png
│           │     ├──style2+char1.png
│           │     └── ...
│           └── ...
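
Before training on your own data, you may want to check that every target image has a matching content image. The sketch below pairs files using the "style+char.png" naming shown in the tree. The function name collect_pairs is hypothetical, and the target folder name is passed as a parameter because it is an assumption here; adjust it to whatever your data directory actually uses.

```python
from pathlib import Path


def collect_pairs(data_root, phase="train", target_dirname="TargetImage"):
    """Return (pairs, unmatched) for one phase of the dataset.

    pairs holds (content_path, target_path, style_name) tuples;
    unmatched holds target files with no usable content image.
    Hypothetical helper; target_dirname is an assumed folder name.
    """
    phase_dir = Path(data_root) / phase
    content_dir = phase_dir / "ContentImage"
    target_dir = phase_dir / target_dirname
    pairs, unmatched = [], []
    for style_dir in sorted(p for p in target_dir.iterdir() if p.is_dir()):
        for target in sorted(style_dir.glob("*.png")):
            # Target files are named "<style>+<char>.png".
            style, sep, char = target.stem.partition("+")
            content = content_dir / f"{char}.png"
            if not sep or style != style_dir.name or not content.is_file():
                unmatched.append(target)
                continue
            pairs.append((content, target, style))
    return pairs, unmatched


if __name__ == "__main__":
    pairs, unmatched = collect_pairs("./data_examples")
    print(f"{len(pairs)} usable pairs, {len(unmatched)} unmatched targets")
```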

Training - Phase 1

sh train_phase_1.sh
  • data_root: The data root, e.g. ./data_examples.
  • output_dir: The directory where training logs and checkpoints are saved.
  • resolution: The resolution of the UNet in our diffusion model.
  • style_image_size: The resolution of the style image; it may differ from resolution.
  • content_image_size: The resolution of the content image; it should be the same as resolution.
  • channel_attn: Whether to use channel attention in the MCA block.
  • train_batch_size: The training batch size.
  • max_train_steps: The maximum number of training steps.
  • learning_rate: The learning rate used for training.
  • ckpt_interval: The interval between checkpoint saves during training.
  • drop_prob: The probability of dropping the conditions during training, used for classifier-free guidance.
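
To illustrate what drop_prob controls, here is a minimal sketch of condition dropping for classifier-free guidance training, written in plain Python so it runs without PyTorch. It only shows the general technique: each sample's condition is replaced by a "null" condition with probability drop_prob. The function name and the null placeholder are assumptions, and the repository's training code may apply the dropout differently (for example, to the content and style inputs jointly).

```python
import random


def drop_conditions(conditions, null_condition, drop_prob, rng=None):
    """Replace each condition with null_condition with probability drop_prob.

    Illustrative sketch of classifier-free guidance training dropout;
    not the repository's actual implementation.
    """
    rng = rng or random.Random()
    result = []
    for condition in conditions:
        if rng.random() < drop_prob:
            # Train the unconditional branch on this sample.
            result.append(null_condition)
        else:
            result.append(condition)
    return result


if __name__ == "__main__":
    batch = ["style0", "style1", "style2", "style3"]
    print(drop_conditions(batch, None, drop_prob=0.1, rng=random.Random(0)))
```

With drop_prob set to 0 the model never sees the null condition, so guidance at sampling time would have no unconditional prediction to contrast against.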

Training - Phase 2

Coming Soon...

📺 Sampling

Step 1 => Prepare the checkpoint

Option (1) Download the checkpoint from GoogleDrive / BaiduYun:gexg, then put the ckpt folder in the root directory. It should contain the files unet.pth, content_encoder.pth, and style_encoder.pth.
Option (2) Put your own retrained checkpoint folder ckpt in the root directory. It should contain the files unet.pth, content_encoder.pth, and style_encoder.pth.
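
Either way, a quick check that the three expected files are present can save a failed sampling run. This is a small sketch; the helper name missing_checkpoint_files is hypothetical, and only the file names come from the options above.

```python
from pathlib import Path

# File names listed in the checkpoint options above.
REQUIRED_FILES = ("unet.pth", "content_encoder.pth", "style_encoder.pth")


def missing_checkpoint_files(ckpt_dir="ckpt"):
    """Return the required checkpoint files that are absent from ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("ckpt")
    if missing:
        print("Missing checkpoint files:", ", ".join(missing))
    else:
        print("All checkpoint files found.")
```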

Step 2 => Run the script

(1) Sample an image from a content image and a reference image.

sh script/sample_content_image.sh
  • ckpt_dir: The model checkpoints saving directory.
  • content_image_path: The content/source image path.
  • style_image_path: The style/reference image path.
  • save_image: Set to True to save the results as images.
  • save_image_dir: The directory where images are saved; the saved files include out_single.png and out_with_cs.png.
  • device: The sampling device; GPU acceleration is recommended.
  • guidance_scale: The guidance scale for classifier-free sampling.
  • num_inference_steps: The number of inference steps used by DPM-Solver++.
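
For intuition about guidance_scale, the sketch below shows the standard classifier-free guidance combination of an unconditional and a conditional noise prediction. It uses plain Python lists in place of tensors, and the function name apply_guidance is hypothetical. This is the common textbook formulation, given as an assumption; the exact form used inside this repository's sampler may differ.

```python
def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine noise predictions as uncond + scale * (cond - uncond).

    Standard classifier-free guidance formulation (assumed here);
    lists of floats stand in for the model's output tensors.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    # A scale of 1.0 reproduces the conditional prediction; larger values
    # push the result further away from the unconditional one.
    print(apply_guidance(uncond, cond, 1.0))
    print(apply_guidance(uncond, cond, 2.0))
```

Under this formulation, a larger scale generally follows the content and style conditions more strongly, at some cost to sample diversity.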

(2) Sample an image from a content character.
Note: You may need a ttf file that contains a large number of Chinese characters; you can download one from BaiduYun:wrth.

sh script/sample_content_character.sh
  • character_input: If set to True, a character string is used as the content/source input.
  • content_character: The content/source character string.
  • The other parameters are the same as the above option (1).

📱 Run WebUI

(1) Sampling by FontDiffuser

gradio gradio_app.py


(2) Sampling by FontDiffuser and Rendering by InstructPix2Pix

Coming Soon ...

🌄 Gallery

Characters of hard level of complexity


Characters of medium level of complexity


Characters of easy level of complexity


Cross-Lingual Generation (Chinese to Korean)


💙 Acknowledgement



  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with
Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
