
id-pose's Introduction

ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models

[Paper] | [Project Page] | [HF Demo] | [Examples]

TL;DR

  • ID-Pose estimates camera poses of sparse-view images of a 3D object (appearance overlaps not required).
  • ID-Pose inverts the off-the-shelf Zero-1-to-3 model: it estimates camera poses by iteratively minimizing the denoising error given the input images.
  • ID-Pose is a zero-shot method that requires NO additional model training or finetuning.
  • ID-Pose generalizes well to open-world images because it leverages the image priors of Zero123 (built on Stable Diffusion).
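The core loop described above (minimize a denoising error over candidate poses) can be illustrated with a toy, model-free sketch. Assumptions: the pose is reduced to a hypothetical (elevation, azimuth) offset in degrees, `error_fn` is a stand-in for the Zero-1-to-3 denoising error, and the coarse grid probe followed by coordinate-descent refinement is a simple search strategy chosen for illustration, not necessarily the optimizer used in the repository.

```python
import itertools
from typing import Callable, Iterable, Tuple

# Hypothetical parameterization: relative (elevation, azimuth) offsets in degrees.
Pose = Tuple[float, float]


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def estimate_relative_pose(
    error_fn: Callable[[Pose], float],
    elev_grid: Iterable[float],
    azim_grid: Iterable[float],
    step: float = 5.0,
    max_iters: int = 100,
    min_step: float = 1e-3,
) -> Pose:
    """Return the pose that minimizes error_fn.

    error_fn is a stand-in for "denoising error of the target view given the
    source view and this relative pose"; here it is any black-box scalar.
    """
    # 1) Coarse probe: score every grid candidate and keep the lowest error.
    candidates = [(float(e), wrap_deg(float(a)))
                  for e, a in itertools.product(elev_grid, azim_grid)]
    best = min(candidates, key=error_fn)
    best_err = error_fn(best)

    # 2) Local refinement: coordinate descent that halves its step size
    #    whenever no neighbouring pose lowers the error.
    for _ in range(max_iters):
        if step < min_step:
            break
        improved = False
        for d_elev, d_azim in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = (best[0] + d_elev, wrap_deg(best[1] + d_azim))
            err = error_fn(cand)
            if err < best_err:
                best, best_err, improved = cand, err, True
        if not improved:
            step /= 2.0
    return best


def toy_error(pose: Pose) -> float:
    # Synthetic bowl-shaped error with its minimum at (20, 75);
    # a real run would evaluate the diffusion model's denoising loss instead.
    return (pose[0] - 20.0) ** 2 + wrap_deg(pose[1] - 75.0) ** 2


print(estimate_relative_pose(toy_error, range(-60, 61, 15), range(-180, 180, 30)))
```

The coarse probe keeps the search from settling in a poor local minimum, while the refinement step only needs error evaluations, which mirrors treating the diffusion model as a black-box scorer.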

News

  • [2023-11-12] We incorporate "absolute elevation estimation" as the default setting and update the default values of the following parameters: --probe_min_timestep, --probe_max_timestep, --min_timestep, --max_timestep.
  • [2023-09-11] We introduce a new feature that initializes relative poses with absolute elevations estimated from the input images. The estimation method and source code are borrowed from One-2-3-45. This feature improves the metrics by about 3%-10% (tested on OmniObject3D). It also reduces running time, since elevations no longer need to be probed.
  • [2023-09-11] We release the evaluation data & code. Please check the Evaluation section.
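As a hedged illustration of the 2023-09-11 feature: if every input image receives an absolute elevation estimate, the relative elevation between two views is simply the difference of those estimates, so it can seed the relative pose and elevations no longer need to be probed. The function names and the "view j minus view i" sign convention below are assumptions; Zero-1-to-3 describes viewpoint changes with a polar angle, whose difference has the opposite sign.

```python
from typing import List, Sequence


def init_relative_elevations(abs_elev_deg: Sequence[float], anchor: int = 0) -> List[float]:
    """Relative elevation (degrees) of every view with respect to the anchor view.

    Positive values mean the view is higher than the anchor. Zero-1-to-3
    conditions on relative camera changes, so only these differences matter.
    """
    ref = abs_elev_deg[anchor]
    return [float(e - ref) for e in abs_elev_deg]


def pairwise_relative_elevations(abs_elev_deg: Sequence[float]) -> List[List[float]]:
    """Matrix D with D[i][j] = elevation of view j minus elevation of view i."""
    return [[float(ej - ei) for ej in abs_elev_deg] for ei in abs_elev_deg]


# Hypothetical per-image absolute elevations (degrees), e.g. from an elevation estimator.
elevations = [30.0, 10.0, 45.0]
print(init_relative_elevations(elevations))  # [0.0, -20.0, 15.0]
```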

Usage

Installation

Create an environment with Python 3.9 (we recommend Anaconda or Miniconda).

git clone https://github.com/xt4d/id-pose.git
cd id-pose/
pip install -r requirements.txt
git clone https://github.com/CompVis/taming-transformers.git
pip install -e taming-transformers/

Download checkpoints

  1. Download zero123-xl.ckpt to ckpts/.
mkdir -p ckpts/
wget -P ckpts/ https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
  2. Download indoor_ds_new.ckpt from the LoFTR weights to ckpts/.

Run examples

Running requires around 28 GB of VRAM on an NVIDIA Tesla V100 GPU.

## Example 1: Image folder ##
python test_pose_estimation.py --input ./data/demo/lion/ --output outputs/demo/

## Example 2: Structured evaluation data ##
## Include --no_rembg if the images do not have a background.
python test_pose_estimation.py --input ./inputs/real.json --output outputs/real --no_rembg

## Example 3: Structured evaluation data ##
python test_pose_estimation.py --input ./inputs/omni3d.json --output outputs/omni3d --no_rembg

The results will be stored under the directory specified by --output.

Visualization

pip install jupyterlab
jupyter-lab viz.ipynb

Use your own data

Step 1: Create an image folder. For example:

mkdir -p data/demo/lion/

Step 2: Put the images under the folder. For example:

lion
├── 000.jpg
├── 001.jpg

Step 3: Run estimation:

python test_pose_estimation.py --input ./data/demo/lion/ --output outputs/demo/

The results will be stored under outputs/demo/.

Evaluation

The evaluation data can be downloaded from Google Drive. Put the input json files under inputs/ and the dataset folders under data/.

Run pose estimations on each dataset:

python test_pose_estimation.py --input inputs/abo_testset.json --output outputs/abo_tset --no_rembg --bkg_threshold 0.9
python test_pose_estimation.py --input inputs/omni3d_testset.json --output outputs/omni3d_tset --no_rembg --bkg_threshold 0.5

Run the evaluation script as:

python metric.py --input outputs/abo_tset --gt data/abo/
python metric.py --input outputs/omni3d_tset --gt data/omni3d/
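The contents of metric.py are not shown here, so its exact metrics are not stated. For orientation only, a common way to score sparse-view pose estimates is the geodesic angle between predicted and ground-truth relative rotations (relative, because the global frame is arbitrary), summarized as the fraction of view pairs below a threshold such as 15° or 30°. The NumPy sketch below is an assumption-labeled illustration; the function names are hypothetical and are not the script's API.

```python
import numpy as np


def relative_rotation(R_i: np.ndarray, R_j: np.ndarray) -> np.ndarray:
    """Rotation taking camera-i coordinates to camera-j coordinates,
    given 3x3 camera-to-world rotations R_i and R_j."""
    return R_j.T @ R_i


def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def accuracy_at(errors_deg, threshold_deg: float) -> float:
    """Fraction of errors strictly below the threshold."""
    return float(np.mean(np.asarray(errors_deg, dtype=float) < threshold_deg))


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z axis by `deg` degrees (toy helper)."""
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t), np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])


# Toy view pair: view 1 is rotated 40 degrees in the prediction, 30 in the ground truth.
R_pred_rel = relative_rotation(np.eye(3), rot_z(40.0))
R_gt_rel = relative_rotation(np.eye(3), rot_z(30.0))
print(round(rotation_error_deg(R_pred_rel, R_gt_rel), 6))  # 10.0
print(accuracy_at([5.0, 12.0, 40.0], 15.0))                # 0.6666666666666666
```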

Examples

The images outlined in red are anchor views for which the camera poses have been manually found.

👉 Open Interactive Viewer to check more examples.

Work in progress

  • 3D reconstruction with posed images.
  • Reduce the running time of ID-Pose.
  • Upgrade ID-Pose to estimate 6DOF poses.

Citation

@article{cheng2023id,
  title={ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models},
  author={Cheng, Weihao and Cao, Yan-Pei and Shan, Ying},
  journal={arXiv preprint arXiv:2306.17140},
  year={2023}
}

id-pose's Issues

Evaluation code

Hi, thanks for the great work!

I wanted to ask: could you please explain how you evaluated ID-Pose on the datasets in the paper? More precisely, how did you convert the (elevation, azimuth, radius) predictions to rotation matrices?

Thank you!
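One common way to do this conversion, offered as a sketch rather than the authors' procedure: place the camera on a sphere around the object at the origin, point it at the origin, and build a look-at rotation. Assumptions: a z-up world, elevation measured from the xy-plane, azimuth measured from the +x axis, and an OpenGL-style camera (x right, y up, looking along -z). Other conventions, such as Zero-1-to-3's polar angle or OpenCV-style cameras, change signs or axis order.

```python
import numpy as np


def spherical_to_c2w(elev_deg: float, azim_deg: float, radius: float,
                     world_up=(0.0, 0.0, 1.0)) -> np.ndarray:
    """4x4 camera-to-world matrix for a camera on a sphere looking at the origin."""
    el, az = np.radians(elev_deg), np.radians(azim_deg)
    # Camera centre in world coordinates (z-up spherical coordinates).
    center = radius * np.array([np.cos(el) * np.cos(az),
                                np.cos(el) * np.sin(az),
                                np.sin(el)])
    forward = -center / np.linalg.norm(center)  # unit vector towards the origin
    right = np.cross(forward, np.asarray(world_up, dtype=float))
    norm = np.linalg.norm(right)
    if norm < 1e-8:
        raise ValueError("look-at is undefined at elevation +/-90 degrees")
    right /= norm
    up = np.cross(right, forward)
    c2w = np.eye(4)
    # Columns are the camera axes in world coordinates: x = right, y = up, z = -forward.
    c2w[:3, :3] = np.stack([right, up, -forward], axis=1)
    c2w[:3, 3] = center
    return c2w


pose = spherical_to_c2w(0.0, 0.0, 2.0)
print(np.round(pose, 3))
```

Given two such matrices with rotations R_i = c2w_i[:3, :3] and R_j = c2w_j[:3, :3], the product R_j.T @ R_i is the relative rotation from camera i to camera j, which is the quantity usually compared against ground truth.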

Evaluation dataset

Hi, thanks for this amazing project! The idea is really good. Do you have a plan to release the evaluation dataset as well as evaluation scripts?

Thanks a lot!

Reconstruction of 3d model based on sparse multi view inputs

Hi! I saw the recent tweet about ID-Pose on the foosball table, and that you are working on releasing 3D reconstruction of the model. I wanted to know whether you have a timeline for when we can upload 4 to 6 pictures of a real-world object and get a 3D model of it.
