
Repository Details

✨ First-ever CJK (Chinese, Japanese, Korean) font recognition and style extraction model, built as the font recognition component of YuzuMarker (a side project of YuzuMarker)
Hugging Face Space configuration:

  • title: YuzuMarker.FontDetection
  • emoji: 😅
  • colorFrom: blue
  • colorTo: yellow
  • sdk: docker
  • app_port: 7860

YuzuMarker.FontDetection

First-ever CJK (Chinese, Japanese, Korean) font recognition model

Online Demo: https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection

Scene Text Font Dataset Generation

This repository also contains the scripts for automatically generating a dataset of scene-text images rendered in different fonts. The dataset is generated using the CJK font pack by VCB-Studio and thousands of background images from pixiv.net.

The pixiv data will not be shared since it was randomly scraped. You may prepare your own background dataset to fit your target data distribution.

For the text corpus, all text is also mixed with English text to simulate real-world data.

Data Preparation Walkthrough

  1. Download the CJK font pack and extract it to the dataset/fonts directory.
  2. Prepare the background data and put them in the dataset/pixivimages directory.
  3. Run the following script to clean the file names:
    python dataset_filename_preprocess.py

Generation Script Walkthrough

Now the preparation is complete. The following command can be used to generate the dataset:

python font_ds_generate_script.py 1 1

Note that the command takes two parameters. The second is the total number of partitions to split the task into, and the first is the index of the partition to run. For example, if you want to run the task in 4 partitions, you can run the following commands in parallel to speed up the process:

python font_ds_generate_script.py 1 4
python font_ds_generate_script.py 2 4
python font_ds_generate_script.py 3 4
python font_ds_generate_script.py 4 4
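
For reference, the sketch below shows how such index-based partitioning can be reproduced in plain Python, for example when driving the script from a job scheduler. It is only an illustration of the idea; the actual work distribution inside font_ds_generate_script.py may differ.

# Minimal sketch of 1-based index partitioning (assumption: the real script
# may split the work differently).
def partition(tasks, index, total):
    """Return the slice of `tasks` assigned to partition `index` of `total`."""
    return tasks[index - 1::total]

# Example: distribute 10 jobs over 4 partitions.
jobs = list(range(10))
for i in range(1, 5):
    print(i, partition(jobs, i, 4))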

The generated dataset will be saved in the dataset/font_img directory.

Note that batch_generate_script_cmd_32.bat and batch_generate_script_cmd_64.bat are Windows batch scripts that generate the dataset in parallel with 32 and 64 partitions, respectively.

Final Check

Since the task might be terminated unexpectedly, or deliberately by the user, the script has a caching mechanism to avoid re-generating the same image.

However, the script cannot always detect corruption in the cache (for example, caused by termination while a file was being written), so a script is also provided to check the generated dataset and remove the corrupted images and labels:

python font_ds_detect_broken.py

After running the script, you might want to rerun the generation script to fill the holes left by the removed files.
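
For reference, the core of such a check can be as simple as trying to fully decode every generated image and deleting the ones that fail. The sketch below is illustrative only: it assumes plain image files under dataset/font_img, while the actual font_ds_detect_broken.py also handles the label files and may organize the data differently.

import os
from PIL import Image

# Illustrative sketch: walk the generated images, attempt a full decode,
# and delete any file that cannot be decoded. The directory layout and
# file extensions are assumptions; the real script also removes labels.
def remove_broken_images(root="dataset/font_img"):
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".jpg", ".jpeg", ".png")):
                continue
            path = os.path.join(dirpath, name)
            try:
                with Image.open(path) as img:
                    img.load()  # force full decode; verify() alone misses some corruption
            except OSError:
                print("Removing corrupted image:", path)
                os.remove(path)

if __name__ == "__main__":
    remove_broken_images()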

(Optional) Linux Cluster Generation Walkthrough

If you would like to run the generation script on Linux clusters, an environment setup script, linux_venv_setup.sh, is also provided.

The prerequisite is a Linux cluster with python3-venv installed and python3 available on the PATH.

To setup the environment, run the following command:

./linux_venv_setup.sh

The script will create a virtual environment in the venv directory and install all the required packages. It is needed in most cases because it also installs libraqm, which is required for PIL's text rendering and is often not installed by default on Linux server distributions.
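
To confirm that the resulting environment actually picked up libraqm, you can query Pillow's feature flags. This is a generic Pillow check, not something specific to this repository:

from PIL import features

# True means libraqm is available, so complex text layout
# (needed for proper CJK rendering) will work.
print("raqm available:", features.check("raqm"))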

After the environment is set up, you can compile a task scheduler to deploy generation tasks in parallel.

The main idea is similar to direct usage of the script, except that the scheduler accepts three parameters,

  • TOTAL_MISSION: the total number of partitions of the task
  • MIN_MISSION: the minimum partition index of the task to run
  • MAX_MISSION: the maximum partition index of the task to run

and the compilation command is as follows:

gcc -D MIN_MISSION=<MIN_MISSION> \
    -D MAX_MISSION=<MAX_MISSION> \
    -D TOTAL_MISSION=<TOTAL_MISSION> \
    batch_generate_script_linux.c \
    -o <object-file-name>.out

For example, if you want to run the task in 64 partitions and split the work across 4 machines, you can run the following compilation commands, one per machine:

# Machine 1
gcc -D MIN_MISSION=1 \
    -D MAX_MISSION=16 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-1-16.out
# Machine 2
gcc -D MIN_MISSION=17 \
    -D MAX_MISSION=32 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-17-32.out
# Machine 3
gcc -D MIN_MISSION=33 \
    -D MAX_MISSION=48 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-33-48.out
# Machine 4
gcc -D MIN_MISSION=49 \
    -D MAX_MISSION=64 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-49-64.out

Then you can run the compiled object file on each machine to start the generation task.

./mission-1-16.out # Machine 1
./mission-17-32.out # Machine 2
./mission-33-48.out # Machine 3
./mission-49-64.out # Machine 4

There is also another helper script to check the progress of the generation task. It can be used as follows:

python font_ds_stat.py

MISC Info of the Dataset

The generation is CPU-bound, and the generation speed depends heavily on CPU performance. Indeed, the work itself is largely an engineering problem.

Some fonts are problematic during the generation process. The script has a manual exclusion list in config/fonts.yml and also supports detecting unqualified fonts on the fly. The script will automatically skip the problematic fonts and log them for future model training.

Model Training

With the dataset ready under the dataset directory, you can start training the model. Note that you can have more than one dataset folder; the script will automatically merge them as long as you provide their paths as command-line arguments.

$ python train.py -h
usage: train.py [-h] [-d [DEVICES ...]] [-b SINGLE_BATCH_SIZE] [-c CHECKPOINT] [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-p] [-i] [-a {v1,v2,v3}]
                [-l LR] [-s [DATASETS ...]] [-n MODEL_NAME] [-f] [-z SIZE] [-t {medium,high,heighest}] [-r]

optional arguments:
  -h, --help            show this help message and exit
  -d [DEVICES ...], --devices [DEVICES ...]
                        GPU devices to use (default: [0])
  -b SINGLE_BATCH_SIZE, --single-batch-size SINGLE_BATCH_SIZE
                        Batch size of single device (default: 64)
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        Trainer checkpoint path (default: None)
  -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
                        Model to use (default: resnet18)
  -p, --pretrained      Use pretrained model for ResNet (default: False)
  -i, --crop-roi-bbox   Crop ROI bounding box (default: False)
  -a {v1,v2,v3}, --augmentation {v1,v2,v3}
                        Augmentation strategy to use (default: None)
  -l LR, --lr LR        Learning rate (default: 0.0001)
  -s [DATASETS ...], --datasets [DATASETS ...]
                        Datasets paths, seperated by space (default: ['./dataset/font_img'])
  -n MODEL_NAME, --model-name MODEL_NAME
                        Model name (default: current tag)
  -f, --font-classification-only
                        Font classification only (default: False)
  -z SIZE, --size SIZE  Model feature image input size (default: 512)
  -t {medium,high,heighest}, --tensor-core {medium,high,heighest}
                        Tensor core precision (default: high)
  -r, --preserve-aspect-ratio-by-random-crop
                        Preserve aspect ratio (default: False)
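
As an illustration (not taken from the original documentation), a typical invocation combining the flags above might look like the following; every value is only an example:

python train.py -d 0 -b 64 -m resnet50 -p -a v3 -s ./dataset/font_img -z 512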

Font Classification Experiment Results

On our synthesized dataset:

Backbone | Data Aug | Pretrained | Crop Text BBox | Preserve Aspect Ratio | Output Norm | Input Size | Hyper Param | Accur | Commit | Dataset | Precision
DeepFont ✔️* Sigmoid 105x105 I1 [Can't Converge] 665559f I5 bfloat16_3x
DeepFont ✔️* Sigmoid 105x105 IV4 [Can't Converge] 665559f I bfloat16_3x
ResNet-18 Sigmoid 512x512 I 18.58% 5c43f60 I float32
ResNet-18 Sigmoid 512x512 II2 14.39% 5a85fd3 I bfloat16_3x
ResNet-18 Tanh 512x512 II 16.24% ff82fe6 I bfloat16_3x
ResNet-18 ✅*8 Tanh 512x512 II 27.71% a976004 I bfloat16_3x
ResNet-18 * Tanh 512x512 I 29.95% 8364103 I bfloat16_3x
ResNet-18 * Sigmoid 512x512 I 29.37% [Early stop] 8d2e833 I bfloat16_3x
ResNet-18 ✅* Sigmoid 416x416 I [Lower Trend] d5a3215 I bfloat16_3x
ResNet-18 * Sigmoid 320x320 I [Lower Trend] afcdd80 I bfloat16_3x
ResNet-18 * Sigmoid 224x224 I [Lower Trend] 8b9de80 I bfloat16_3x
ResNet-34 * Sigmoid 512x512 I 32.03% 912d566 I bfloat16_3x
ResNet-50 ✅* Sigmoid 512x512 I 34.21% e980b66 I bfloat16_3x
ResNet-18 * Sigmoid 512x512 I 31.24% 416c7bb I bfloat16_3x
ResNet-18 * Sigmoid 512x512 I 34.69% 855e240 I bfloat16_3x
ResNet-18 ✔️*9 Sigmoid 512x512 I 38.32% 1750035 I bfloat16_3x
ResNet-18 ✔️* Sigmoid 512x512 III3 38.87% 0693434 I bfloat16_3x
ResNet-50 ✔️* Sigmoid 512x512 III 48.99% bc0f7fc II6 bfloat16_3x
ResNet-50 ✔️ Sigmoid 512x512 III 48.45% 0f071a5 II bfloat16_3x
ResNet-50 ✔️ 11 Sigmoid 512x512 III 46.12% 0f071a5 II bfloat16
ResNet-50 10 Sigmoid 512x512 III 43.86% 0f071a5 II bfloat16
ResNet-50 Sigmoid 512x512 III 41.35% 0f071a5 II bfloat16
  • * Bug in implementation
  • 1 learning rate = 0.0001, lambda = (2, 0.5, 1)
  • 2 learning rate = 0.00005, lambda = (4, 0.5, 1)
  • 3 learning rate = 0.001, lambda = (2, 0.5, 1)
  • 4 learning rate = 0.01, lambda = (2, 0.5, 1)
  • 5 Initial version of synthesized dataset
  • 6 Doubled synthesized dataset (2x)
  • 7 Quadruple synthesized dataset (4x)
  • 8 Data Augmentation v1: Color Jitter + Random Crop [81%-100%]
  • 9 Data Augmentation v2: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°]
  • 10 Data Augmentation v3: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°] + Random Horizontal Flip + Random Downsample [1, 2]
  • 11 Preserve Aspect Ratio by Random Cropping

Pretrained Models

Available at: https://huggingface.co/gyrojeff/YuzuMarker.FontDetection/tree/main

Note that since I trained everything on PyTorch 2.0 with torch.compile, to use the pretrained models you need to install PyTorch 2.0 and wrap the model with torch.compile, as done in demo.py.
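
The usual reason is that a torch.compile-wrapped module stores its parameters under "_orig_mod."-prefixed keys, so an uncompiled module cannot load such a checkpoint directly. The snippet below is only a sketch under that assumption; the file name and key layout are illustrative, and demo.py remains the authoritative loading code.

import torch

# Sketch only: path and key layout are assumptions; see demo.py for the real code.
checkpoint = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)

# Either wrap your model with torch.compile before calling load_state_dict,
# or strip the "_orig_mod." prefix from the keys manually:
state_dict = {k.replace("_orig_mod.", "", 1): v for k, v in state_dict.items()}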

Demo Deployment (Method 1)

To deploy the demo, you need either the whole font dataset under ./dataset/fonts or a cache file of the model's fonts called font_demo_cache.bin. The cache file will be released later as a resource.

To deploy, first run the following script to generate the demo font image (if you have the font dataset):

python generate_font_sample_image.py

then start the demo server with demo.py; its options are as follows:

$ python demo.py -h
usage: demo.py [-h] [-d DEVICE] [-c CHECKPOINT] [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-f] [-z SIZE] [-s] [-p PORT] [-a ADDRESS]

optional arguments:
  -h, --help            show this help message and exit
  -d DEVICE, --device DEVICE
                        GPU devices to use (default: 0), -1 for CPU
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        Trainer checkpoint path (default: None). Use link as huggingface://<user>/<repo>/<file> for huggingface.co models, currently only supports model file in the root
                        directory.
  -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
                        Model to use (default: resnet18)
  -f, --font-classification-only
                        Font classification only (default: False)
  -z SIZE, --size SIZE  Model feature image input size (default: 512)
  -s, --share           Get public link via Gradio (default: False)
  -p PORT, --port PORT  Port to use for Gradio (default: 7860)
  -a ADDRESS, --address ADDRESS
                        Address to use for Gradio (default: 127.0.0.1)
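
As an illustration (not taken from the original documentation), a local deployment might be started as follows; the checkpoint path is a placeholder:

python demo.py -d 0 -m resnet50 -z 512 -c path/to/checkpoint.ckpt -p 7860 -a 0.0.0.0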

Demo Deployment (Method 2)

If Docker is available on your machine, you can deploy directly with Docker, as is done for the Hugging Face Space.

You may follow the command-line arguments described in the last section and change the last line of the Dockerfile to accommodate your needs.

Build the docker image:

docker build -t yuzumarker.fontdetection .

Run the docker image:

docker run -it -p 7860:7860 yuzumarker.fontdetection

Online Demo

The project is also deployed on Huggingface Space: https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection

Related works and Resources

Star History

Star History Chart

More Repositories

1

lightnovel_epub

🍭 epub generator for (light) novels; supported sites: 轻之国度, 轻小说文库
Python
291
star
2

YuzuMarker

🍋 [WIP] Manga Translation Tool
C#
99
star
3

Ayase

🥥 Control everything by keyboard.
C#
66
star
4

VSCode-LaTeX-Snippets

🛠 A VSCode extension that includes useful snippets and tools for LaTeX.
JavaScript
39
star
5

dayone2markdown

📔 Day One diary entries to markdown files
Python
37
star
6

mikanani-proxy

Lightweight proxy server for Mikan Project (蜜柑计划): https://mikanani.me
Python
31
star
7

DungeonAssistant

WiFi RSSI Based Indoor Localization System
Python
31
star
8

cloudflare-dynamic-best

Automatically select the best Cloudflare IP for your Cloudflare DNS record
Rust
28
star
9

clash-multi-mixin

🚀 Make Clash's Mixin work with multiple proxy subscription providers at the same time
JavaScript
21
star
10

calibre-web-bgm

🧩 Bangumi metadata plugin for calibre-web
Python
19
star
11

concrete-math

📐 Reading notes on Concrete Mathematics
C++
17
star
12

yolo-v1-pytorch

⚗ YOLO v1 PyTorch Implementation
Jupyter Notebook
17
star
13

MahiruLauncher

🍢 Cross-platform modular launcher
C#
16
star
14

BaiduTongjiExporter

💹 Baidu Tongji (Baidu Analytics) data export script
Python
12
star
15

typexo-cli

📝 A Hexo-like CLI tool for Typecho blogs, used for blog writing.
Python
9
star
16

yolo-v2-pytorch

⚗ YOLO v2 PyTorch Implementation
Python
9
star
17

stick-fight-auto

⚔ Auto-aim for Stick Fight
Python
8
star
18

typexo-server

🛠 A Hexo-like server tool for Typecho Blogs. Personal backend blog engine.
Python
5
star
19

akasaka

Dynamic multiprocess preprocessing task loader and dispatcher
Python
5
star
20

VSCode-Hexo-Next-Snippets

🛠 A VSCode extension with snippets for the *Next* theme of the *Hexo* blog framework
JavaScript
5
star
21

mirror-toolbox

🎥 Online course mirroring toolbox
Python
4
star
22

alchemist-helper

🧪 Quick deployment scripts for free compute on Colab, Kaggle, and 极市平台 (CVMart)
Python
4
star
23

YuzuMarker.Photoshop

🍹 [WIP] Companion Photoshop plugin for YuzuMarker
JavaScript
4
star
24

remote-lamp-switch

💡 Remote switching solution for a capacitive-switch lamp
C#
4
star
25

CloudflareDDNS

DDNS service for Cloudflare
Python
4
star
26

esxi-pci-passthrough-auto-enable

Solves the problem of PCI passthrough deactivating after reboot, which is also hard to re-enable due to UI flickering
Shell
4
star
27

BFTGym

[VLDB'24] BFTGym: An Interactive Playground for BFT Protocols
4
star
28

DesktopSwitcher

🖥 A desktop switching tool for OSX.
Swift
3
star
29

FRC-Scouting-6487

[GY] A Scouting app built for FRC Robotics Competition.
C#
3
star
30

BFTBrain

[NSDI'25] BFTBrain: Adaptive BFT Consensus with Reinforcement Learning, [VLDB'24] BFTGym: An Interactive Playground for BFT Protocols
Java
3
star
31

gdrive-integrity-checker

⛅ File integrity checker for google drive using scripts in Colab
Jupyter Notebook
3
star
32

KirinShiKi

✨ A Typecho plugin based on the blog of Jindai Kirin and the *handsome* theme; a Jindai-Kirin-style mod of the handsome theme
CSS
3
star
33

gyrojeff.top

📁 Source of my blog - gyrojeff.top, maintained by typexo.
3
star
34

ti-chem-calc

🧮 Molecular mass calculating program for TI calculators.
Lua
3
star
35

DanDanPlayForiOS

🎞 DanDanPlay for iOS (unofficial version), an anime-watching client with danmaku. The original repo is no longer maintained.
Objective-C
3
star
36

windows-config-script

♻ One-click configuration for after a fresh Windows reinstall
PowerShell
2
star
37

Max4Min

🍡 Maximize windows in a way that makes them easy to minimize
C#
2
star
38

BaiduTongjiAPI

💹 PyPI package wrapping the Baidu Tongji (Baidu Analytics) API
Python
2
star
39

hexo-cnblogs-sync

🗃 Sync the blogs in Hexo (Next theme) to cnblogs.
HTML
2
star
40

yolo-v3-pytorch

⚗ YOLO v3 PyTorch Implementation
Python
2
star
41

CloudflareDDNS-WPF

C# version of the Cloudflare DDNS software (same functionality as the Python version written a few weeks earlier)
C#
2
star
42

WiFiSignalCapturer

Java
2
star
43

csapp-3e

📔 CS:APP notes & homework
C
1
star
44

ylgy-recognition

🐑 Recognition for the game 羊了个羊 (Sheep a Sheep)
Python
1
star
45

YuzuMarker.MangaDataset

Python
1
star
46

cloudlab-api

Forked version of cloudlab-api with some bugs fixed
Python
1
star
47

TorchLearning

Some Jupyter Notebooks of learning PyTorch.
Jupyter Notebook
1
star
48

JeffersonQin.github.io

Personal Blog: gyrojeff.moe
HTML
1
star
49

FDU-SoC-2020

Source code for Fudan University SoC Course in Summer 2020.
Assembly
1
star
50

PlayingCardDemo

Stanford CS193P Demo: Playing Card
Swift
1
star
51

JeffersonQin

1
star
52

H-Downloader

📖 A Comic Downloader for E-Hentai & Erocool
Java
1
star
53

YuzuMarker.TextDetection

Text Detection Model for YuzuMarker
1
star
54

projects

Code that'll help you kickstart a personal website that showcases your work as a software developer.
HTML
1
star
55

jsdelivr-github-purge

🎈 A script that purges the jsDelivr CDN cache for a GitHub repo
Python
1
star
56

dandanplay_toolchain

🛠 Some useful scripts for DanDanPlay.
Python
1
star
57

MahiruLauncher.Api.Python

🍥 MahiruLauncher's API for Python
Python
1
star
58

qbittorrent-auto-flush

⚡ Automatic FTP upload/copy after qBittorrent finishes downloading
Python
1
star
59

DanDanPlaySwiftUI

DanDanPlay iOS client (SwiftUI implementation), work in progress
Swift
1
star
60

Re-DanDanPlayforiOS

Rewrite of the DanDanPlay iOS client in Swift
Swift
1
star