SAM 3D Selector

This project lifts a user's multi-view 2D image segmentations (annotated via segment-anything) onto the corresponding 3D point cloud or mesh.

The method is just a few coordinate-space conversions, with no other complicated machinery (suggestions are welcome).
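To illustrate the kind of conversion involved, here is a minimal sketch (not the code in app.py) that projects 3D points into one view and keeps those landing inside a 2D mask. It assumes a pinhole camera with intrinsics `K` and a world-to-camera matrix `w2c` in the OpenCV convention (+z forward); the function name `select_points_by_mask` is hypothetical. Note that nerf_synthetic stores camera-to-world poses in the OpenGL convention, so those would need inverting and an axis flip first. Occlusion is ignored here.

```python
import numpy as np

def select_points_by_mask(points, mask, K, w2c):
    """Mark 3D points whose projection lands inside a 2D mask.

    points: (N, 3) world-space coordinates
    mask:   (H, W) boolean segmentation mask
    K:      (3, 3) pinhole intrinsics
    w2c:    (4, 4) world-to-camera transform (assumed OpenCV convention)
    """
    h, w = mask.shape
    # World space -> camera space (homogeneous coordinates)
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]
    z = cam[:, 2]
    in_front = z > 1e-6
    # Camera space -> pixel space (perspective divide)
    safe_z = np.where(in_front, z, 1.0)
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / safe_z).astype(int)
    v = np.round(uv[:, 1] / safe_z).astype(int)
    in_image = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Look up the mask only for points that project inside the image
    selected = np.zeros(len(points), dtype=bool)
    selected[in_image] = mask[v[in_image], u[in_image]]
    return selected
```

Because every point along a pixel's viewing ray is selected, a single view also picks up geometry behind the target, which is why depth limiting and multi-frame selection matter.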

I initially built this project to speed up point selection for my other project, SPIDR, where I manually select points for deformations and animations.

I want to use SAM to automate this process; however, my current solution is still far from perfect.

👇Point cloud

demo_1

👇Mesh

demo_2

Dependencies

  • SAM
    • The code assumes SAM is set up at ./segment-anything, with checkpoints downloaded to ./segment-anything/checkpoints
    • You can change these locations in app.py
  • open3d >= 0.16
  • opencv-python

How to Use

Annotate keypoints on the displayed image by clicking with the left mouse button.

The OpenCV GUI supports the following control keys:

Key   Action
m     Toggle between foreground and background keypoint annotation
z     Undo the last keypoint
s     Save the mask and keypoints
n     Go to the next frame
p     Go to the previous frame
r     Reset the image
c     Crop the point cloud
u     Union the point cloud
x     Intersect the point cloud
e     Export the masked point cloud (compatible with MeshLab)
q     Exit the program
k     Switch the selection mode
a     Add the current frame mask for multi-frame selection

The slider at the bottom controls the depth of the selected 3D points. The percentage is relative to the size of the object's bounding box.
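One plausible reading of the depth slider, sketched below, is a cutoff measured from the nearest selected point along the viewing direction. This is an assumption, not the behavior confirmed by app.py: the function name `limit_selection_depth`, the use of the bounding-box diagonal as the "size", and the choice of the nearest selected point as the reference are all hypothetical.

```python
import numpy as np

def limit_selection_depth(points, selected, depths, percent):
    """Keep selected points within a depth budget behind the nearest selected point.

    points:   (N, 3) point coordinates, used only for the bounding box
    selected: (N,) boolean selection from the 2D mask
    depths:   (N,) per-point depth in the current camera
    percent:  slider value, as a percentage of the bounding-box size
    """
    if not selected.any():
        return selected.copy()
    extent = points.max(axis=0) - points.min(axis=0)
    bbox_size = float(np.linalg.norm(extent))  # diagonal length (assumed measure)
    nearest = depths[selected].min()
    max_depth = nearest + (percent / 100.0) * bbox_size
    return selected & (depths <= max_depth)
```

With a small percentage only the surface facing the camera survives; at 100% the cutoff spans the whole bounding-box diagonal behind the nearest selected point.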

Input Arguments

  • --image: Path to the input image (default: "demo.png").
  • --wo_sam: Flag to not use the SAM model for mask prediction.
  • --save_path: Path to save the mask and keypoints (default: "output/").
  • --dataset_path: Path to a nerf_synthetic-like image folder (default: "").
  • --dataset_split: Dataset split (default: "test").
  • --dataset_skip: Number of frames to skip in the dataset (default: 10).
  • --pcd_path: Path to the 3D point cloud file (default: "").
  • --mesh_path: Path to the 3D mesh file (default: "").
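For orientation, the arguments above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the list, not the parser in app.py: the argument types (for example `int` for `--dataset_skip` and `store_true` for `--wo_sam`) and the function name `build_parser` are assumptions.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="SAM 3D Selector (illustrative parser)")
    parser.add_argument("--image", type=str, default="demo.png",
                        help="Path to the input image")
    parser.add_argument("--wo_sam", action="store_true",
                        help="Do not use the SAM model for mask prediction")
    parser.add_argument("--save_path", type=str, default="output/",
                        help="Where to save the mask and keypoints")
    parser.add_argument("--dataset_path", type=str, default="",
                        help="Path to a nerf_synthetic-like image folder")
    parser.add_argument("--dataset_split", type=str, default="test",
                        help="Dataset split")
    parser.add_argument("--dataset_skip", type=int, default=10,
                        help="Number of frames to skip in the dataset")
    parser.add_argument("--pcd_path", type=str, default="",
                        help="Path to the 3D point cloud file")
    parser.add_argument("--mesh_path", type=str, default="",
                        help="Path to the 3D mesh file")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```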

Example

python app.py --dataset_path data/nerf_synthetic/lego --pcd_path data/3d_rep/lego_pcd.ply

The example point cloud & mesh can be downloaded from the following links:

# point cloud
gdown --fuzzy "https://drive.google.com/file/d/1z9zuTKNbLFp6DOLfJN42kpUO0_ECCvy_/view?usp=share_link" -O data/3d_rep/lego_pcd.ply
# mesh
gdown --fuzzy "https://drive.google.com/file/d/17rqjWihUJshzt_Hc1YIJ8J5GNfr5WBJf/view?usp=share_link" -O data/3d_rep/lego_mesh.obj
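For reference, a nerf_synthetic folder such as the one passed to `--dataset_path` keeps its cameras in `transforms_<split>.json`, with a shared `camera_angle_x` (horizontal field of view in radians) and a per-frame `file_path` and camera-to-world `transform_matrix`. The sketch below reads every `skip`-th frame; treating `--dataset_skip` as a stride is an assumption, and the helper names `load_frames` and `focal_from_fov` are hypothetical rather than taken from app.py.

```python
import json
import math
import os

def load_frames(dataset_path, split="test", skip=10):
    """Return (camera_angle_x, [(image_path, camera_to_world), ...])."""
    with open(os.path.join(dataset_path, f"transforms_{split}.json")) as f:
        meta = json.load(f)
    frames = []
    for frame in meta["frames"][::max(skip, 1)]:
        # file_path is stored without an extension, e.g. "./test/r_0"
        image_path = os.path.normpath(
            os.path.join(dataset_path, frame["file_path"] + ".png"))
        frames.append((image_path, frame["transform_matrix"]))
    return meta["camera_angle_x"], frames

def focal_from_fov(camera_angle_x, image_width):
    """Focal length in pixels from the horizontal field of view."""
    return 0.5 * image_width / math.tan(0.5 * camera_angle_x)
```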

Observations

  • SAM's segmentations are amazing, but not perfect. Object boundaries are often left out of the mask, so a lot of manual tuning is needed.

  • Keypoint-prompting accuracy improves a lot when the previous prediction's logits are fed back as the mask input (mask_input=logits).
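The feedback loop can be sketched as below, following the pattern in SAM's own predictor example: the low-resolution logits of the best-scoring mask are passed back through `mask_input` on the next call. The wrapper name `predict_with_feedback` is hypothetical, and `predictor` is assumed to behave like `segment_anything.SamPredictor` after `set_image()` has been called; requesting multiple masks only on the first call is a design choice here, not something the original states.

```python
import numpy as np

def predict_with_feedback(predictor, point_coords, point_labels, prev_logits=None):
    """Run one prediction, reusing the previous low-res logits if available.

    Returns the best mask and its low-res logits for the next call.
    """
    masks, scores, logits = predictor.predict(
        point_coords=np.asarray(point_coords, dtype=np.float32),
        point_labels=np.asarray(point_labels, dtype=np.int32),
        # SAM expects the mask prompt with shape 1xHxW
        mask_input=None if prev_logits is None else prev_logits[None, :, :],
        # Ask for several candidates only when there is no prior mask
        multimask_output=prev_logits is None,
    )
    best = int(np.argmax(scores))
    return masks[best], logits[best]
```

Each time a keypoint is added, the call is repeated with all keypoints so far and the logits returned by the previous call.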

  • 3D geometric consistency is still too difficult for SAM. We cannot easily warp a mask to a new frame.

  • Automatically combining multi-frame selections is difficult:

    • Small components are easily occluded by other parts, so the selections cannot simply be unioned or intersected.
    • Intersecting only the co-visible masks? --> It does not work well.
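The combination strategies mentioned above can be written down as boolean operations over per-frame selections. This is a minimal sketch under assumptions: `selected[f, i]` says whether point `i` falls inside frame `f`'s mask, `visible[f, i]` says whether it is unoccluded in that frame (how visibility is computed is left open), and the function name and `mode` strings are hypothetical.

```python
import numpy as np

def combine_selections(selected, visible=None, mode="union"):
    """Combine per-frame point selections into one boolean array.

    selected: (F, N) bool, point i lies inside the mask of frame f
    visible:  (F, N) bool, point i is visible in frame f (for "covisible")
    """
    selected = np.asarray(selected, dtype=bool)
    if mode == "union":
        return selected.any(axis=0)
    if mode == "intersect":
        return selected.all(axis=0)
    if mode == "covisible":
        if visible is None:
            raise ValueError("mode='covisible' needs a visibility array")
        visible = np.asarray(visible, dtype=bool)
        # Keep a point if it is seen at least once and selected wherever it is seen
        seen = visible.any(axis=0)
        agrees = (selected | ~visible).all(axis=0)
        return seen & agrees
    raise ValueError(f"unknown mode: {mode}")
```

Union keeps anything picked up in any view, including background along the viewing rays, while plain intersection drops parts hidden in even one view; the co-visible variant only lets a frame veto points it can actually see.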

Contributors

nexuslrf
