[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

🔥 Model Zoo • 🛠️ Installation • 🏋️ Training • 📺 Sampling • 📱 Run WebUI

🌟 Highlights

Vis_1 Vis_2

  • We propose FontDiffuser, which can generate unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.
  • FontDiffuser excels at generating complex characters and handling large style variations, and it achieves state-of-the-art performance.
  • The results generated by FontDiffuser can be used directly with InstructPix2Pix for decoration, as shown in the figure above.
  • We have released the 💻 Hugging Face Demo online. Welcome to try it out!

📅 News

  • 2023.12.20: Our repository is public! 🤗
  • 2023.12.19: 🔥🎉 The 💻 Hugging Face Demo is public! Welcome to try it out!
  • 2023.12.16: The Gradio app demo is released.
  • 2023.12.10: Released the source code with phase 1 training and sampling.
  • 2023.12.09: 🎉🎉 Our paper was accepted by AAAI 2024.
  • Previously: Our Recommendations-of-Diffusion-for-Text-Image repo is public. It contains a collection of recent papers on diffusion models for text-image generation tasks. Welcome to check it out!

🔥 Model Zoo

| Model | Checkpoint | Status |
|---|---|---|
| FontDiffuser | GoogleDrive / BaiduYun:gexg | Released |
| SCR | - | Coming Soon |
| FontDiffuser (trained on a large dataset) | - | May Be Coming |

🚧 TODO List

  • Add phase 1 training and sampling script.
  • Add WebUI demo.
  • Push demo to Hugging Face.
  • Combined with InstructPix2Pix.
  • Add phase 2 training script and checkpoint.
  • Add the pre-training of SCR module.

๐Ÿ› ๏ธ Installation

Prerequisites (Recommended)

  • Linux
  • Python 3.9
  • PyTorch 1.13.1
  • CUDA 11.7

Environment Setup

Clone this repo:

git clone https://github.com/yeungchenwa/FontDiffuser.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser

Step 2: Install the matching version of PyTorch, following the official PyTorch installation instructions.

# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Step 3: Install the required packages.

pip install -r requirements.txt
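
After installation, a quick check along the following lines may help confirm that the environment matches the recommended versions. This is only a sketch and not part of the repository: the helper names `parse_torch_version`, `matches_recommended`, and `report_environment` are hypothetical, while `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` are existing PyTorch attributes.

```python
RECOMMENDED_TORCH = "1.13.1"     # from the prerequisites list
RECOMMENDED_CUDA_TAG = "cu117"   # build tag used in the suggested pip command


def parse_torch_version(version):
    """Split a string such as '1.13.1+cu117' into ('1.13.1', 'cu117').

    The build tag is None when the string has no '+' suffix.
    """
    release, _, tag = version.partition("+")
    return release, (tag or None)


def matches_recommended(version):
    """Return True if both the release and the build tag match."""
    release, tag = parse_torch_version(version)
    return release == RECOMMENDED_TORCH and tag == RECOMMENDED_CUDA_TAG


def report_environment():
    """Print the installed PyTorch build, or a hint if it is missing."""
    try:
        import torch  # imported lazily so the helpers above work without it
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    print("torch:", torch.__version__)
    print("CUDA build:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
    print("matches recommendation:", matches_recommended(torch.__version__))
```

Calling `report_environment()` inside the activated `fontdiffuser` environment prints the detected build. Other versions may also work, so a mismatch is not necessarily an error.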

๐Ÿ‹๏ธ Training

Data Construction

The training data file tree should look as follows (data examples are provided in the directory data_examples/train/):

├── data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │     ├── style0+char0.png
│           │     ├── style0+char1.png
│           │     └── ...
│           ├── style1
│           │     ├── style1+char0.png
│           │     ├── style1+char1.png
│           │     └── ...
│           ├── style2
│           │     ├── style2+char0.png
│           │     ├── style2+char1.png
│           │     └── ...
│           └── ...
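
One way to read this layout is to pair every target image with its content image by parsing the `style+char` file name. The sketch below assumes that naming convention holds exactly as drawn in the tree. `collect_pairs` is a hypothetical helper rather than the repository's dataset loader, and the default directory names passed as arguments are assumptions that can be overridden.

```python
from pathlib import Path


def collect_pairs(train_root, content_name="ContentImage",
                  target_name="TargetImage"):
    """Return sorted (style, target_path, content_path) triples.

    Assumes target files are named '<style>+<char>.png' and that the
    matching content image is '<content_name>/<char>.png'.
    """
    root = Path(train_root)
    content_dir = root / content_name
    target_dir = root / target_name
    pairs = []
    for style_dir in sorted(p for p in target_dir.iterdir() if p.is_dir()):
        for target in sorted(style_dir.glob("*.png")):
            # 'style0+char1' -> style 'style0', character 'char1'
            style, sep, char = target.stem.partition("+")
            if not sep:
                continue  # skip files that do not follow the naming scheme
            content = content_dir / f"{char}.png"
            if content.exists():
                pairs.append((style, target, content))
    return pairs
```

For example, `collect_pairs("data_examples/train")` would list each style image together with the content image of the same character, skipping targets whose content image is missing.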

Training - Phase 1

sh train_phase_1.sh
  • data_root: The data root directory, e.g. ./data_examples.
  • output_dir: The directory where training logs and checkpoints are saved.
  • resolution: The resolution of the UNet in our diffusion model.
  • style_image_size: The resolution of the style image; it may differ from resolution.
  • content_image_size: The resolution of the content image; it should be the same as resolution.
  • channel_attn: Whether to use channel attention in the MCA block.
  • train_batch_size: The training batch size.
  • max_train_steps: The maximum number of training steps.
  • learning_rate: The learning rate used during training.
  • ckpt_interval: The interval (in steps) at which checkpoints are saved during training.
  • drop_prob: The condition drop probability used for classifier-free guidance training.
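
The `drop_prob` parameter relates to classifier-free guidance, where the conditioning inputs are randomly withheld for some training samples so that the model also learns an unconditional prediction. The following is a minimal sketch of that idea, assuming an independent Bernoulli draw per sample. `sample_drop_mask` and `apply_drop` are hypothetical helpers, and the repository's training code may implement this differently.

```python
import random


def sample_drop_mask(batch_size, drop_prob, rng=None):
    """Return one boolean per sample; True means 'drop the conditions'.

    Each sample is dropped independently with probability drop_prob.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    return [rng.random() < drop_prob for _ in range(batch_size)]


def apply_drop(conditions, mask, null_condition):
    """Replace dropped conditions with a placeholder 'null' condition."""
    return [null_condition if dropped else cond
            for cond, dropped in zip(conditions, mask)]
```

With `drop_prob=0.1`, roughly one sample in ten would be trained without its content and style conditions.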

Training - Phase 2

Coming Soon...

📺 Sampling

Step 1 => Prepare the checkpoint

Option (1): Download the checkpoint from GoogleDrive / BaiduYun:gexg, then place the ckpt folder in the root directory. It should contain the files unet.pth, content_encoder.pth, and style_encoder.pth.
Option (2): Place your own retrained checkpoint folder ckpt in the root directory. It should contain the files unet.pth, content_encoder.pth, and style_encoder.pth.
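
Before sampling, it can be useful to verify that the checkpoint folder is complete. The sketch below only checks for the three file names listed above. `missing_checkpoint_files` is a hypothetical helper, and the default folder name `ckpt` follows the text.

```python
from pathlib import Path

# File names taken from the checkpoint description above.
REQUIRED_FILES = ("unet.pth", "content_encoder.pth", "style_encoder.pth")


def missing_checkpoint_files(ckpt_dir="ckpt"):
    """Return the required checkpoint file names that are absent."""
    ckpt_path = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES
            if not (ckpt_path / name).is_file()]
```

An empty list means all three files are present; otherwise the returned names indicate what still needs to be downloaded or copied.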

Step 2 => Run the script

(1) Sample an image from a content image and a reference image.

sh script/sample_content_image.sh
  • ckpt_dir: The directory where the model checkpoints are stored.
  • content_image_path: The path of the content/source image.
  • style_image_path: The path of the style/reference image.
  • save_image: Set to True to save the results as images.
  • save_image_dir: The directory where images are saved. The saved files are out_single.png and out_with_cs.png.
  • device: The sampling device; GPU acceleration is recommended.
  • guidance_scale: The guidance scale for classifier-free guidance sampling.
  • num_inference_steps: The number of inference steps for DPM-Solver++.
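
For reference, classifier-free guidance at sampling time is commonly written as a combination of the conditional and unconditional noise predictions, weighted by the guidance scale. The sketch below shows one widely used form of that formula on plain Python lists. Whether FontDiffuser's sampler uses exactly this form is an assumption, and `guided_prediction` is a hypothetical name.

```python
def guided_prediction(eps_uncond, eps_cond, guidance_scale):
    """Combine two noise predictions element-wise.

    Uses eps = eps_uncond + s * (eps_cond - eps_uncond), so s = 1
    reproduces the conditional prediction and s = 0 the unconditional one.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u)
            for u, c in zip(eps_uncond, eps_cond)]
```

Under this convention, values of `guidance_scale` above 1 push the result further toward the conditional prediction.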

(2) Sample an image from a content character.
Note: You may need a TTF file that contains a large number of Chinese characters; you can download one from BaiduYun:wrth.

sh script/sample_content_character.sh
  • character_input: If set to True, a character string is used as the content/source input.
  • content_character: The character string used as the content/source input.
  • The other parameters are the same as in option (1) above.

📱 Run WebUI

(1) Sampling by FontDiffuser

gradio gradio_app.py

(2) Sampling by FontDiffuser and Rendering by InstructPix2Pix

Coming Soon ...

🌄 Gallery

Characters with a high level of complexity

vis_hard

Characters with a medium level of complexity

vis_medium

Characters with a low level of complexity

vis_easy

Cross-Lingual Generation (Chinese to Korean)

vis_korean

💙 Acknowledgement

Copyright

Citation

@inproceedings{peng2022spts,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
  year={2024}
}

โญ Star Rising

Star Rising