Simple Finetuner for Segment Anything

This repository contains simple starter code for finetuning the FAIR Segment Anything (SAM) models with PyTorch Lightning.

Setup

  1. Install dependencies

    First run

    git clone --recurse-submodules git@github.com:bhpfelix/segment-anything-finetuner.git

    Then

    cd segment-anything-finetuner

    Follow the Segment Anything setup instructions to install the required dependencies. Then run

    pip install -r requirements.txt
  2. Data preparation

    The starter code supports COCO-format datasets with the following layout:

    ├── dataset_name/
    │   ├── train/
    │   │   ├── _annotations.coco.json # COCO format annotation
    │   │   ├── 000001.png             # Images
    │   │   ├── 000002.png
    │   │   ├── ...
    │   ├── val/
    │   │   ├── _annotations.coco.json # COCO format annotation
    │   │   ├── xxxxxx.png             # Images
    │   │   ├── ...
  3. Download model checkpoints

    Download the necessary SAM model checkpoints and arrange the repo as follows:

    ├── dataset_name/              # structure as detailed above
    │   ├── ...
    ├── segment-anything/          # The FAIR SAM repo
    │   ├── ...
    ├── SAM/                       # the SAM pretrained checkpoints
    │   ├── sam_vit_h_4b8939.pth
    │   ├── ...
    ├── finetune.py
    ├── ...
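Before launching a run, it can help to confirm that the dataset and checkpoints match the layout above. The following is a minimal sketch, assuming the `_annotations.coco.json` filename shown in the tree and the filenames used by the official SAM checkpoint release (`sam_vit_h_4b8939.pth`, `sam_vit_l_0b3195.pth`, `sam_vit_b_01ec64.pth`). The helpers `summarize_split` and `resolve_checkpoint` are hypothetical and not part of this repository; the COCO reader relies only on the standard top-level keys `images`, `annotations`, and `categories`.

```python
import json
from pathlib import Path

ANNOTATION_FILE = "_annotations.coco.json"  # per-split filename from the layout above

# Checkpoint filenames as published in the official Segment Anything release.
CHECKPOINT_FILES = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}


def summarize_split(split_dir):
    """Hypothetical helper: load one split's COCO file and report basic counts.

    Also lists image files referenced in the annotation file that are
    missing from the split directory.
    """
    split_dir = Path(split_dir)
    with open(split_dir / ANNOTATION_FILE, encoding="utf-8") as f:
        coco = json.load(f)
    images = coco.get("images", [])
    missing = [img["file_name"] for img in images
               if not (split_dir / img["file_name"]).is_file()]
    return {
        "num_images": len(images),
        "num_annotations": len(coco.get("annotations", [])),
        "num_categories": len(coco.get("categories", [])),
        "missing_files": missing,
    }


def resolve_checkpoint(model_type, checkpoint_dir="SAM", must_exist=True):
    """Hypothetical helper: map a model type to its expected checkpoint path."""
    if model_type not in CHECKPOINT_FILES:
        raise ValueError(f"unknown model type {model_type!r}; "
                         f"expected one of {sorted(CHECKPOINT_FILES)}")
    path = Path(checkpoint_dir) / CHECKPOINT_FILES[model_type]
    if must_exist and not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")
    return path
```

For example, calling `summarize_split("dataset_name/val")` and `resolve_checkpoint("vit_h")` from the repository root surfaces a missing image or a mismatched model type before any GPU time is spent.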

Finetuning (finetune.py)

This file contains a simple script for finetuning the Segment Anything model on COCO-format datasets.

Example usage:

python finetune.py \
    --data_root ./dataset_name \
    --model_type vit_h \
    --checkpoint_path ./SAM/sam_vit_h_4b8939.pth \
    --freeze_image_encoder \
    --batch_size 2 \
    --image_size 1024 \
    --steps 1500 \
    --learning_rate 1.e-5 \
    --weight_decay 0.01
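For reference, the flags in the command above could be declared with an `argparse` parser like the one below. This is a hypothetical sketch: argument types are inferred from the example values, no defaults are assumed beyond `store_true` for the freeze flag, and the actual `finetune.py` may define its arguments differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the example command.

    Types are inferred from the example values only; the real script may
    add defaults, choices, or further flags.
    """
    parser = argparse.ArgumentParser(description="Finetune SAM on a COCO-format dataset")
    parser.add_argument("--data_root", type=str)
    parser.add_argument("--model_type", type=str)  # e.g. vit_h
    parser.add_argument("--checkpoint_path", type=str)
    parser.add_argument("--freeze_image_encoder", action="store_true")
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--image_size", type=int)
    parser.add_argument("--steps", type=int)
    parser.add_argument("--learning_rate", type=float)  # "1.e-5" parses as 1e-05
    parser.add_argument("--weight_decay", type=float)
    return parser
```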

Optionally, pass the --freeze_image_encoder flag to exclude the image encoder parameters from optimization, which reduces GPU memory usage.
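Freezing works because parameters with `requires_grad=False` receive no gradients and hold no optimizer state; and since the encoder's input image does not require gradients either, autograd does not keep the encoder's activations for the backward pass. SAM's `Sam` module exposes `image_encoder`, `prompt_encoder`, and `mask_decoder` submodules, so a minimal sketch can target `model.image_encoder` directly. `ToySAM` is a hypothetical stand-in so the example runs without downloading a checkpoint, and AdamW is an assumed optimizer choice, not necessarily the one this repository uses.

```python
import torch
from torch import nn


class ToySAM(nn.Module):
    """Hypothetical stand-in exposing the same three submodule names as SAM."""

    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Linear(16, 16)
        self.prompt_encoder = nn.Linear(4, 16)
        self.mask_decoder = nn.Linear(16, 1)


def build_optimizer(model, freeze_image_encoder, lr, weight_decay):
    """Optionally freeze model.image_encoder, then optimize only trainable params.

    AdamW is an assumption here; the repository's optimizer may differ.
    """
    if freeze_image_encoder:
        for p in model.image_encoder.parameters():
            p.requires_grad_(False)  # no .grad tensors or optimizer state for these
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr, weight_decay=weight_decay)
```

In a PyTorch Lightning module, an optimizer built this way would typically be returned from `configure_optimizers`.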

Notes

  • The image resizing implementation currently differs from the ResizeLongestSide transform in SAM.
  • Drop path and layer-wise learning rate decay are not currently applied.
  • The finetuning script currently only supports bounding box input prompts.
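The first and last notes concern how images and prompts reach the model. SAM's `ResizeLongestSide` transform scales an image so that its longer side equals the target length (1024 for the released models) while preserving aspect ratio, rounding each side to the nearest integer, and rescales box coordinates by the same per-axis factors. SAM box prompts use `[x0, y0, x1, y1]` pixel coordinates, whereas COCO stores boxes as `[x, y, width, height]`. The pure-Python sketch below mirrors that logic for illustration only; the function names are hypothetical, and it does not reproduce this repository's own resizing, which the note above says differs.

```python
def longest_side_shape(old_h, old_w, long_side=1024):
    """Target (h, w) after scaling the longer side to `long_side` (nearest-int rounding)."""
    scale = long_side / max(old_h, old_w)
    return int(old_h * scale + 0.5), int(old_w * scale + 0.5)


def coco_box_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x0, y0, x1, y1]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def resize_box(box_xyxy, old_hw, new_hw):
    """Scale an XYXY box from the original image size to the resized image size."""
    (old_h, old_w), (new_h, new_w) = old_hw, new_hw
    sx, sy = new_w / old_w, new_h / old_h  # separate x and y factors, as in SAM
    x0, y0, x1, y1 = box_xyxy
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]
```

For a 480×640 image, `longest_side_shape(480, 640)` gives `(768, 1024)`, and a COCO box is converted with `coco_box_to_xyxy` before being passed through `resize_box` with those two shapes.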

Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

Contributors

bhpfelix
