Optical Character Recognition with Segment Anything (OCR-SAM)
Introduction
Can SAM be applied to OCR? We make a simple attempt at combining two off-the-shelf OCR models in MMOCR with SAM to build some OCR-related application demos, including SAM for Text, Text Removal and Text Inpainting. We also provide a Gradio WebUI for easier interaction.
Note: Anyone who has ideas and wants to contribute to our repo is welcome to join.
Updates
- 2023.08.23: We created the repo yeungchenwa/Recommendations-Diffusion-Text-Image, a paper collection of recent diffusion models for text-image generation tasks.
- 2023.04.14: Our repository has been migrated to open-mmlab/playground.
- 2023.04.12: Repository Release
- 2023.04.12: Added support for Inpainting, combining DBNet++, SAM and ControlNet.
- 2023.04.11: Added support for Erasing, combining DBNet++, SAM and Latent-Diffusion / Stable-Diffusion.
- 2023.04.10: Added support for SAM for Text, combining DBNet++ and SAM.
- 2023.04.09: How effective is SAM on OCR text images? We discuss this in the Blog.
Demo Zoo
This project includes:
- SAM for Text: DBNet++ + SAM
- Erasing: DBNet++ + SAM + Latent-Diffusion / Stable Diffusion
- Inpainting: DBNet++ + SAM + Stable Diffusion
Installation
Prerequisites (Recommended)
- Linux | Windows
- Python 3.8
- PyTorch 1.12
- CUDA 11.3
Environment Setup
Clone this repo:
git clone https://github.com/yeungchenwa/OCR-SAM.git
Step 0: Download and install Miniconda from the official website.
Step 1: Create a conda environment and activate it.
conda create -n ocr-sam python=3.8 -y
conda activate ocr-sam
Step 2: Install the matching version of PyTorch by following the instructions here.
# Suggested
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
Step 3: Install mmengine, mmcv, mmdet, mmcls and mmocr.
pip install -U openmim
mim install mmengine
mim install mmocr
# On Windows, replace the single quotes ' below with double quotes "
mim install 'mmcv==2.0.0rc4'
mim install 'mmdet==3.0.0rc5'
mim install 'mmcls==1.0.0rc5'
# Install sam
pip install git+https://github.com/facebookresearch/segment-anything.git
# Install required packages
pip install -r requirements.txt
Step 4: Install the dependencies for diffusers and latent-diffusion.
# Install Gradio
pip install gradio
# Install the diffusers
pip install diffusers
# Install the pytorch_lightning for ldm
pip install pytorch-lightning==2.0.1.post0
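Once the steps above have run, a quick way to confirm that the pinned versions actually ended up in the environment is to compare them against the installed package metadata. The following is a minimal sketch, not part of this repo: the helper name `check_versions` is hypothetical, and the distribution names in `PINNED` are assumed to match what `pip` and `mim` register.

```python
from importlib import metadata

# Versions pinned in the installation steps; the distribution names are an
# assumption. Packages installed without a pin are not checked.
PINNED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "mmcv": "2.0.0rc4",
    "mmdet": "3.0.0rc5",
    "mmcls": "1.0.0rc5",
    "pytorch-lightning": "2.0.1.post0",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu113" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            problems[name] = (expected, installed)
    return problems
```

Calling `check_versions(PINNED)` inside the activated `ocr-sam` environment should return an empty dict when every pinned package is present at the expected version.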
Model checkpoints
We retrained DBNet++ with Swin Transformer V2 as the backbone on a combination of multiple scene text datasets (e.g. HierText, TextOCR). The DBNet++ checkpoint is available on Google Drive (1 GB).
Create the following directories and move the downloaded DBNet++ checkpoint into place:
mkdir checkpoints
mkdir checkpoints/mmocr
mkdir checkpoints/sam
mkdir checkpoints/ldm
mv db_swin_mix_pretrain.pth checkpoints/mmocr
Download the remaining checkpoints to their corresponding paths (skip this step if you have already done so):
# mmocr recognizer ckpt
wget -O checkpoints/mmocr/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_20e_st-an_mj/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth
# sam ckpt, more details: https://github.com/facebookresearch/segment-anything#model-checkpoints
wget -O checkpoints/sam/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# ldm ckpt
wget -O checkpoints/ldm/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1
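Since the demos depend on several large downloads, it can help to verify that every checkpoint is where the commands above put it before launching anything. This is a small sketch under that assumption; the function name `missing_checkpoints` is hypothetical, and the paths are copied from the `mkdir`, `mv` and `wget` commands above.

```python
from pathlib import Path

# Paths taken from the mkdir / mv / wget commands in this section.
EXPECTED_CHECKPOINTS = [
    "checkpoints/mmocr/db_swin_mix_pretrain.pth",
    "checkpoints/mmocr/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth",
    "checkpoints/sam/sam_vit_h_4b8939.pth",
    "checkpoints/ldm/last.ckpt",
]


def missing_checkpoints(root=".", expected=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint paths that are absent or empty under root."""
    root = Path(root)
    missing = []
    for rel in expected:
        path = root / rel
        # A zero-byte file may indicate an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

Running `missing_checkpoints()` from the repository root returns an empty list when all four files are present and non-empty.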
Run Demo
SAM for Text
Run the following script:
python mmocr_sam.py \
--inputs /YOUR/INPUT/IMG_PATH \
--outdir /YOUR/OUTPUT_DIR \
--device cuda
- --inputs: the path to your input image.
- --outdir: the directory for your output.
- --device: the device used for inference.
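To give a rough idea of how a text detector and SAM can be chained, the sketch below turns detected text polygons into bounding boxes and passes them to SAM as box prompts. It is an illustration, not the code in `mmocr_sam.py`: the function names are hypothetical, the flat `[x1, y1, x2, y2, ...]` polygon layout is an assumption about the detector output, and only the `segment_anything` calls (`sam_model_registry`, `SamPredictor`, `apply_boxes_torch`, `predict_torch`) come from the official SAM package.

```python
def polygons_to_boxes(polygons):
    """Convert flat polygons [x1, y1, x2, y2, ...] to [x_min, y_min, x_max, y_max]."""
    boxes = []
    for poly in polygons:
        xs, ys = poly[0::2], poly[1::2]
        boxes.append([min(xs), min(ys), max(xs), max(ys)])
    return boxes


def segment_text(image_rgb, polygons,
                 sam_checkpoint="checkpoints/sam/sam_vit_h_4b8939.pth",
                 device="cuda"):
    """Prompt SAM with one box per detected text instance and return the masks."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint).to(device)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # HxWx3 uint8 array in RGB order

    boxes = torch.tensor(polygons_to_boxes(polygons),
                         dtype=torch.float, device=device)
    # Rescale the boxes to the resolution SAM uses internally.
    boxes = predictor.transform.apply_boxes_torch(boxes, image_rgb.shape[:2])
    masks, _, _ = predictor.predict_torch(
        point_coords=None,
        point_labels=None,
        boxes=boxes,
        multimask_output=False,
    )
    return masks  # boolean tensor of shape (num_boxes, 1, H, W)
```

Using axis-aligned boxes keeps the prompt simple; rotated or curved text then relies on SAM to recover the tighter character-level shape inside each box.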
Erasing
In this application demo, text is erased with either latent-diffusion inpainting or Stable Diffusion inpainting guided by a text prompt; choose between the two with the --diffusion_model parameter. You can also choose whether to use the SAM output mask for erasing with the --use_sam parameter. More implementation details are listed here.
Run the following script:
python mmocr_sam_erase.py \
--inputs /YOUR/INPUT/IMG_PATH \
--outdir /YOUR/OUTPUT_DIR \
--device cuda \
--use_sam True \
--dilate_iteration 2 \
--diffusion_model latent-diffusion \
--sd_ckpt None \
--prompt None \
--img_size (512, 512)
- --inputs: the path to your input image.
- --outdir: the directory for your output.
- --device: the device used for inference.
- --use_sam: whether to use SAM for segmentation.
- --dilate_iteration: the number of iterations used to dilate the SAM mask.
- --diffusion_model: choose 'latent-diffusion' or 'stable-diffusion'.
- --sd_ckpt: path to the Stable Diffusion checkpoint.
- --prompt: the text prompt used with Stable Diffusion; set 'None' to use the default prompt for erasing.
- --img_size: image size for latent-diffusion.
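The role of --dilate_iteration is easier to see with a toy example: dilating a mask grows it by a margin on each pass, so the inpainting model also covers the thin fringe around each character. The sketch below is a dependency-free illustration; the 3x3 square kernel and the function name `dilate` are assumptions, and the script itself may well rely on a library routine such as OpenCV's `cv2.dilate` with a different kernel.

```python
def dilate(mask, iterations=1):
    """Binary dilation of a 2D 0/1 mask (list of lists) with a 3x3 square kernel."""
    height, width = len(mask), len(mask[0])
    for _ in range(iterations):
        grown = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # Switch on the pixel and its 8 neighbours, clipped to the image.
                    for ny in range(max(0, y - 1), min(height, y + 2)):
                        for nx in range(max(0, x - 1), min(width, x + 2)):
                            grown[ny][nx] = 1
        mask = grown
    return mask
```

With this kernel, each iteration extends the mask by one pixel in every direction, so a larger value trades a cleaner removal of text edges against repainting more of the background.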
Run the WebUI: see here
Note: The first run may take a while because the Stable Diffusion checkpoint has to be downloaded; please be patient.
Inpainting
More implementation details are listed here.
Run the following script:
python mmocr_sam_inpainting.py \
--img_path /YOUR/INPUT/IMG_PATH \
--outdir /YOUR/OUTPUT_DIR \
--device cuda \
--prompt YOUR_PROMPT \
--select_index 0
- --img_path: the path to your input image.
- --outdir: the directory for your output.
- --device: the device used for inference.
- --prompt: the text prompt.
- --select_index: the index of the text instance to inpaint.
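Prompts usually contain spaces, which must be quoted on the command line. When launching the script from Python instead, passing the arguments as a list sidesteps shell quoting altogether. The following is a minimal sketch that uses only the flags documented above; the helper name `inpainting_command` and the example paths and prompt are hypothetical.

```python
import shlex


def inpainting_command(img_path, outdir, prompt, select_index=0, device="cuda"):
    """Build the argument list for mmocr_sam_inpainting.py from the documented flags."""
    return [
        "python", "mmocr_sam_inpainting.py",
        "--img_path", str(img_path),
        "--outdir", str(outdir),
        "--device", device,
        "--prompt", prompt,  # passed as a single argument, spaces included
        "--select_index", str(select_index),
    ]


# Hypothetical input image, output directory and prompt.
cmd = inpainting_command("demo.jpg", "results", "a wooden sign that says OPEN")
print(shlex.join(cmd))  # shell-quoted form, handy for logging or copy-pasting
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```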
Run WebUI
This repo also provides a WebUI (built with gradio) that includes Erasing and Inpainting.
Before running the script, you should install the gradio package:
pip install gradio
Erasing
python mmocr_sam_erase_app.py
- Example:
Detector and Recognizer WebUI Result
In our WebUI, users can interactively choose the SAM output and the diffusion model. In particular, users can choose which text to erase.
Inpainting
python mmocr_sam_inpainting_app.py
- Example:
Note: The web page may take some time to become available after launch; please be patient.