
Simple Finetuner for Segment Anything

This repository provides simple starter code for finetuning FAIR's Segment Anything (SAM) models, using PyTorch Lightning for training convenience.

Setup

  1. Install dependencies

    First run

    git clone --recurse-submodules git@github.com:bhpfelix/segment-anything-finetuner.git

    Then

    cd segment-anything-finetuner

    Follow the setup instructions of Segment Anything to install its dependencies. Then run

    pip install -r requirements.txt
  2. Data preparation

    The starter code supports COCO-format input with the following layout:

    β”œβ”€β”€ dataset_name/
    β”‚   β”œβ”€β”€ train/
    β”‚   β”‚   β”œβ”€β”€ _annotations.coco.json # COCO format annotation
    β”‚   β”‚   β”œβ”€β”€ 000001.png             # Images
    β”‚   β”‚   β”œβ”€β”€ 000002.png
    β”‚   β”‚   β”œβ”€β”€ ...
    β”‚   β”œβ”€β”€ val/
    β”‚   β”‚   β”œβ”€β”€ _annotations.coco.json # COCO format annotation
    β”‚   β”‚   β”œβ”€β”€ xxxxxx.png             # Images
    β”‚   β”‚   β”œβ”€β”€ ...
  3. Download model checkpoints

    Download the necessary SAM model checkpoints and arrange the repo as follows:

    β”œβ”€β”€ dataset_name/              # structure as detailed above
    β”‚   β”œβ”€β”€ ...
    β”œβ”€β”€ segment-anything/          # The FAIR SAM repo
    β”‚   β”œβ”€β”€ ...
    β”œβ”€β”€ SAM/                       # the SAM pretrained checkpoints
    β”‚   β”œβ”€β”€ sam_vit_h_4b8939.pth
    β”‚   β”œβ”€β”€ ...
    β”œβ”€β”€ finetune.py
    β”œβ”€β”€ ...
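
The `_annotations.coco.json` files above follow the standard COCO schema, where each annotation stores its bounding box as `[x, y, width, height]`. Since the finetuning script uses bounding boxes as prompts, a minimal sketch of the conversion involved (COCO xywh boxes grouped per image and turned into the xyxy corner format SAM's prompt encoder expects; the function name and example dict are illustrative, not part of this repo):

```python
# Hedged sketch: turn COCO-style annotations (as stored in
# _annotations.coco.json) into per-image box prompts.
# COCO boxes are [x, y, width, height]; SAM box prompts are
# corner coordinates [x1, y1, x2, y2].

def coco_boxes_to_sam_prompts(coco):
    """Group COCO bbox annotations by image_id as xyxy boxes."""
    prompts = {}
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        prompts.setdefault(ann["image_id"], []).append([x, y, x + w, y + h])
    return prompts

# Tiny in-memory stand-in for a parsed _annotations.coco.json
example = {
    "annotations": [
        {"image_id": 1, "bbox": [10, 20, 30, 40]},
        {"image_id": 1, "bbox": [0, 0, 5, 5]},
    ]
}
result = coco_boxes_to_sam_prompts(example)
print(result)  # {1: [[10, 20, 40, 60], [0, 0, 5, 5]]}
```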

Finetuning (finetune.py)

This file contains a simple script for finetuning the Segment Anything model on COCO-format datasets.

Example usage:

python finetune.py \
    --data_root ./dataset_name \
    --model_type vit_h \
    --checkpoint_path ./SAM/sam_vit_h_4b8939.pth \
    --freeze_image_encoder \
    --batch_size 2 \
    --image_size 1024 \
    --steps 1500 \
    --learning_rate 1.e-5 \
    --weight_decay 0.01

The optional --freeze_image_encoder flag detaches the image encoder parameters from optimization, which saves GPU memory.
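
What "detaching from optimization" typically looks like, as a minimal sketch (the `TinySAM` module and `freeze_image_encoder` helper are stand-ins for illustration, not this repo's actual classes): the encoder's parameters get `requires_grad=False`, so no gradients or optimizer state are kept for them.

```python
import torch.nn as nn

# Illustrative stand-in for a SAM-like model with an image encoder
# (the expensive ViT) and a lightweight mask decoder.
class TinySAM(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Linear(4, 4)  # stand-in for the ViT encoder
        self.mask_decoder = nn.Linear(4, 1)   # stand-in for the mask decoder

def freeze_image_encoder(model):
    """Exclude the image encoder's parameters from gradient updates."""
    for p in model.image_encoder.parameters():
        p.requires_grad_(False)

model = TinySAM()
freeze_image_encoder(model)

# Only the decoder's parameters remain trainable.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['mask_decoder.weight', 'mask_decoder.bias']
```

Freezing cuts memory because neither encoder gradients nor their Adam moment buffers need to be stored during training.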

Notes

  • The current image resizing implementation differs from the ResizeLongestSide transform in SAM.
  • Drop path and layer-wise learning rate decay are not currently applied.
  • The finetuning script currently only supports bounding box input prompts.
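
For reference on the first note, SAM's ResizeLongestSide transform rescales an image so that its longest side equals a target length while preserving aspect ratio. A sketch of that target-shape computation (mirroring the rounding behavior of SAM's `get_preprocess_shape`):

```python
# Sketch of how SAM's ResizeLongestSide picks the output shape:
# scale so the longest side equals long_side_length, keep aspect
# ratio, and round to the nearest integer.

def get_preprocess_shape(oldh, oldw, long_side_length):
    """Return (new_h, new_w) with the longest side == long_side_length."""
    scale = long_side_length * 1.0 / max(oldh, oldw)
    newh, neww = oldh * scale, oldw * scale
    return int(newh + 0.5), int(neww + 0.5)

print(get_preprocess_shape(480, 640, 1024))  # -> (768, 1024)
```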

Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}