
Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detect, recognize, and segment text instances, with several downstream tasks, e.g., Text Removal and Text Inpainting.

Optical Character Recognition with Segment Anything (OCR-SAM)

๐Ÿ‡ Introduction ๐Ÿ™

Can SAM be applied to OCR? We make a simple attempt to combine two off-the-shelf OCR models from MMOCR with SAM to build some OCR-related application demos, including SAM for Text, Text Removal, and Text Inpainting. We also provide a WebUI built with Gradio for better interaction.

Note: Anyone with ideas who wants to contribute to our repo is welcome to join.

Updates

  • 2023.08.23: We created a repo yeungchenwa/Recommendations-Diffusion-Text-Image to provide a paper collection of recent diffusion models for text-image generation tasks.
  • 2023.04.14: Our repository has been migrated to open-mmlab/playground.
  • 2023.04.12: Repository release.
  • 2023.04.12: Supported Inpainting combined with DBNet++, SAM, and ControlNet.
  • 2023.04.11: Supported Erasing combined with DBNet++, SAM, and Latent-Diffusion / Stable-Diffusion.
  • 2023.04.10: Supported SAM for Text, combining DBNet++ and SAM.
  • 2023.04.09: Discussed how effective SAM is on OCR text images in our blog.

Demo Zoo

This project includes demos for SAM for Text, Text Removal (Erasing), and Text Inpainting; demo images are shown in the repository.

Installation

Prerequisites (Recommended)

  • Linux | Windows
  • Python 3.8
  • PyTorch 1.12
  • CUDA 11.3

Environment Setup

Clone this repo:

git clone https://github.com/yeungchenwa/OCR-SAM.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n ocr-sam python=3.8 -y
conda activate ocr-sam

Step 2: Install the matching PyTorch version following the official instructions here.

# Suggested
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

Step 3: Install the mmengine, mmcv, mmdet, mmcls, mmocr.

pip install -U openmim
mim install mmengine
mim install mmocr
# On Windows, replace the single quotes ' below with double quotes "
mim install 'mmcv==2.0.0rc4'
mim install 'mmdet==3.0.0rc5'
mim install 'mmcls==1.0.0rc5'


# Install sam
pip install git+https://github.com/facebookresearch/segment-anything.git

# Install required packages
pip install -r requirements.txt

Step 4: Prepare for the diffusers and latent-diffusion.

# Install Gradio
pip install gradio

# Install the diffusers
pip install diffusers

# Install the pytorch_lightning for ldm
pip install pytorch-lightning==2.0.1.post0

Model Checkpoints

We retrain DBNet++ with Swin Transformer V2 as the backbone on a combination of multiple scene text datasets (e.g., HierText, TextOCR). The checkpoint for DBNet++ is available on Google Drive (1 GB).

Then create the checkpoint directories as follows:

mkdir -p checkpoints/mmocr checkpoints/sam checkpoints/ldm
mv db_swin_mix_pretrain.pth checkpoints/mmocr

Download the rest of the checkpoints to the corresponding paths (skip any you have already downloaded):

# mmocr recognizer ckpt
wget -O checkpoints/mmocr/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_20e_st-an_mj/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth

# sam ckpt, more details: https://github.com/facebookresearch/segment-anything#model-checkpoints
wget -O checkpoints/sam/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# ldm ckpt
wget -O checkpoints/ldm/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1
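A small sanity-check script (hypothetical, not part of the repo) can confirm the checkpoint layout above is complete before launching any demo:

```python
from pathlib import Path

# Expected checkpoint layout, as described in the download steps above.
EXPECTED = [
    "checkpoints/mmocr/db_swin_mix_pretrain.pth",
    "checkpoints/mmocr/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth",
    "checkpoints/sam/sam_vit_h_4b8939.pth",
    "checkpoints/ldm/last.ckpt",
]

def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that do not exist under root."""
    return [p for p in EXPECTED if not (Path(root) / p).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for p in missing:
            print("  -", p)
    else:
        print("All checkpoints in place.")
```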

๐Ÿƒ๐Ÿปโ€โ™‚๏ธ Run Demo ๐ŸŠโ€โ™‚๏ธ

SAM for Text

Run the following script:

python mmocr_sam.py \
    --inputs /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda
  • --inputs: the path to your input image.
  • --outdir: the directory for your output.
  • --device: the device used for inference.
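The glue between the text detector and SAM can be sketched roughly as follows (an illustration, not the repo's actual code): the MMOCR detector returns text polygons, and SAM's predictor accepts box prompts of the form [x0, y0, x1, y1], so each polygon is collapsed to its axis-aligned bounding box.

```python
import numpy as np

def polygons_to_sam_boxes(polygons):
    """Convert a list of (N, 2) polygon point arrays into SAM-style
    box prompts, one [x0, y0, x1, y1] row per text instance."""
    boxes = []
    for poly in polygons:
        pts = np.asarray(poly, dtype=float).reshape(-1, 2)
        x0, y0 = pts.min(axis=0)  # top-left corner of the bounding box
        x1, y1 = pts.max(axis=0)  # bottom-right corner
        boxes.append([x0, y0, x1, y1])
    return np.array(boxes)
```

Each resulting box can then be passed to SAM as a prompt to obtain a per-instance text mask.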

Erasing

In this application demo, we erase text with either latent-diffusion inpainting or Stable-Diffusion inpainting with a text prompt; choose between the two with the parameter --diffusion_model. You can also choose whether to use the SAM output mask for erasing via the parameter --use_sam. More implementation details are listed here.

Run the following script:

python mmocr_sam_erase.py \
    --inputs /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda \
    --use_sam True \
    --dilate_iteration 2 \
    --diffusion_model latent-diffusion \
    --sd_ckpt None \
    --prompt None \
    --img_size '(512, 512)'
  • --inputs: the path to your input image.
  • --outdir: the directory for your output.
  • --device: the device used for inference.
  • --use_sam: whether to use SAM's mask for segmentation.
  • --dilate_iteration: number of iterations used to dilate SAM's mask.
  • --diffusion_model: choose 'latent-diffusion' or 'stable-diffusion'.
  • --sd_ckpt: path to the stable-diffusion checkpoint.
  • --prompt: the text prompt used with stable-diffusion; set to 'None' to use the default prompt for erasing.
  • --img_size: the image size for latent-diffusion.
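The effect of --dilate_iteration can be illustrated with a minimal numpy-only sketch (an assumption about the behavior, not the repo's implementation): each iteration grows the binary mask by one pixel in the four cardinal directions, so the erased region safely covers the edges of text strokes.

```python
import numpy as np

def dilate_mask(mask, iterations=2):
    """Grow a binary mask by one pixel per iteration (4-connected)."""
    m = mask.astype(bool)
    for _ in range(iterations):
        g = m.copy()
        g[1:, :] |= m[:-1, :]   # grow downward
        g[:-1, :] |= m[1:, :]   # grow upward
        g[:, 1:] |= m[:, :-1]   # grow rightward
        g[:, :-1] |= m[:, 1:]   # grow leftward
        m = g
    return m.astype(np.uint8)
```

A single seed pixel thus becomes a diamond whose radius equals the iteration count; real pipelines typically use cv2.dilate for the same effect.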

Run the WebUI: see here

Note: The first run may take some time because the stable-diffusion checkpoint has to be downloaded; please wait patiently.

Inpainting

More implementation details are listed here

Run the following script:

python mmocr_sam_inpainting.py \
    --img_path /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda \
    --prompt YOUR_PROMPT \
    --select_index 0
  • --img_path: the path to your input image.
  • --outdir: the directory for your output.
  • --device: the device used for inference.
  • --prompt: the text prompt.
  • --select_index: the index of the detected text instance to inpaint.
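Conceptually, --select_index picks one record out of the OCR results, whose mask is then handed to the inpainting model. A hypothetical illustration (the field names here are assumptions, not the repo's actual data structure):

```python
def select_instance(ocr_results, select_index=0):
    """Return the chosen text instance, guarding against bad indices."""
    if not 0 <= select_index < len(ocr_results):
        raise IndexError(
            f"select_index {select_index} out of range "
            f"for {len(ocr_results)} detected instances")
    return ocr_results[select_index]

# Example: two detected words, each with a recognized string and a mask.
results = [{"text": "HELLO", "mask": "<mask 0>"},
           {"text": "WORLD", "mask": "<mask 1>"}]
chosen = select_instance(results, select_index=1)
# chosen["text"] == "WORLD"
```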

Run WebUI

This repo also provides a WebUI (built with Gradio), including the Erasing and Inpainting demos.

Before running the script, you should install the gradio package:

pip install gradio

Erasing

python mmocr_sam_erase_app.py
  • Example:

Detector and Recognizer WebUI Result

Erasing WebUI Result

In our WebUI, users can interactively choose the SAM output and the diffusion model. In particular, users can choose which text instances to erase.

Inpainting

python mmocr_sam_inpainting_app.py
  • Example:

Inpainting WebUI Result

Note: Opening the web page may take some time; please wait patiently.

Acknowledgement