
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image



Xiao Fu*, Wei Yin*, Mu Hu*, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin† , Xiaoxiao Long†

* Equal contribution; † Corresponding authors
arXiv Preprint, 2024


🛠️ Setup

We tested our code under the following environment: Ubuntu 22.04, Python 3.9.18, CUDA 11.8.

  1. Clone this repository.
git clone git@github.com:fuxiao0719/GeoWizard.git
cd GeoWizard
  2. Install packages.
conda create -n geowizard python=3.9
conda activate geowizard
pip install -r requirements.txt
cd geowizard
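
After installation, a quick sanity check can confirm that the environment resembles the tested one. The following is a minimal sketch, not part of the repository: the helper names `python_matches_tested` and `cuda_status` are hypothetical, and it assumes PyTorch is among the packages installed by requirements.txt.

```python
import sys

# Python version the authors report testing with (major, minor).
TESTED_PYTHON = (3, 9)


def python_matches_tested(version_info=sys.version_info, tested=TESTED_PYTHON):
    """Return True if the interpreter's major.minor equals the tested one."""
    return tuple(version_info[:2]) == tuple(tested)


def cuda_status():
    """Describe CUDA availability via PyTorch, if PyTorch is installed."""
    try:
        import torch  # assumption: installed through requirements.txt
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available (PyTorch built against CUDA {torch.version.cuda})"
    return "CUDA not available"


if __name__ == "__main__":
    print("Python matches tested version:", python_matches_tested())
    print(cuda_status())
```

A mismatch does not necessarily mean the code will fail; it only indicates that the setup differs from the one reported above.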

🤖 Usage

Run inference for depth & normal

Place your images in a directory, for example input/example (where we have prepared several cases), and run the following inference command. The depth and normal outputs will be stored in output/example.

python run_infer.py \
    --input_dir ${input path} \
    --output_dir ${output path} \
    --ensemble_size ${ensemble size} \
    --denoise_steps ${denoising steps} \
    --domain ${data type}
# e.g.
python run_infer.py \
    --input_dir input/example \
    --output_dir output \
    --ensemble_size 3 \
    --denoise_steps 10 \
    --domain "indoor"

Inference settings:

  • --domain: Data type. Options: "indoor", "outdoor", and "object". Note that "object" works best for background-free objects, such as those in Objaverse. We find that "indoor" suits most scenarios. Default: "indoor".
  • --ensemble_size and --denoise_steps: Trade-off arguments between speed and accuracy. More ensemble members and more denoising steps give higher accuracy at the cost of slower inference. Default: 3 and 10.
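
To build intuition for why a larger ensemble can help, here is a generic illustration using synthetic data. It is not GeoWizard's ensembling code: the per-pixel median, the function name `median_ensemble`, and the Gaussian noise model are all assumptions made for this sketch.

```python
import numpy as np


def median_ensemble(predictions):
    """Combine N predictions of shape (H, W) into one map via per-pixel median."""
    stacked = np.stack(predictions, axis=0)  # shape (N, H, W)
    return np.median(stacked, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a ground-truth depth map.
    truth = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
    for n in (1, 3, 9):
        # Each synthetic prediction is the truth plus independent Gaussian noise.
        preds = [truth + rng.normal(0.0, 0.1, truth.shape) for _ in range(n)]
        err = np.abs(median_ensemble(preds) - truth).mean()
        print(f"ensemble_size={n}: mean abs error = {err:.4f}")
```

In this toy setting the error shrinks as the ensemble grows, while the cost grows linearly with the number of predictions, which mirrors the speed/accuracy trade-off described above.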

Run inference for depth & normal (object-oriented)

(2024-04-13) In response to community feedback on our v1 model for object-level applications, we additionally trained a v2 model on Objaverse with some architecture modifications. It generates more realistic, more three-dimensional normal maps on some uncommon image types (e.g., cartoon style, see below). We hope it is helpful to the community, and we will continue to release improved models if needed.

python run_infer_object.py \
    --input_dir ${input path} \
    --output_dir ${output path} \
    --ensemble_size ${ensemble size} \
    --denoise_steps ${denoising steps} \
    --domain "object"
# e.g.
python run_infer_object.py \
    --input_dir input/example_object \
    --output_dir output \
    --ensemble_size 3 \
    --denoise_steps 10 \
    --domain "object"
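
Both inference scripts take the same flags, so a small wrapper can assemble either command and catch an invalid --domain value before launching. This is a sketch only: `build_infer_command` is a hypothetical helper, not part of the repository, and it uses only the flags, options, and defaults listed in this README.

```python
import shlex

# Domain options as listed in the inference settings.
VALID_DOMAINS = ("indoor", "outdoor", "object")


def build_infer_command(input_dir, output_dir, ensemble_size=3,
                        denoise_steps=10, domain="indoor",
                        script="run_infer.py"):
    """Return the argument list for one inference run."""
    if domain not in VALID_DOMAINS:
        raise ValueError(f"domain must be one of {VALID_DOMAINS}, got {domain!r}")
    if ensemble_size < 1 or denoise_steps < 1:
        raise ValueError("ensemble_size and denoise_steps must be positive")
    return [
        "python", script,
        "--input_dir", str(input_dir),
        "--output_dir", str(output_dir),
        "--ensemble_size", str(ensemble_size),
        "--denoise_steps", str(denoise_steps),
        "--domain", domain,
    ]


if __name__ == "__main__":
    cmd = build_infer_command("input/example_object", "output",
                              domain="object", script="run_infer_object.py")
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems when a directory name contains spaces.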

Run inference for 3D reconstruction using BiNI algorithm

First, put the generated depth and normal npy files under the folder bini/data, along with the segmented foreground mask (mask.png). If no mask is provided, the whole image is used as the mask. We provide two examples of the data structure. Then run the following command.

cd bini

python bilateral_normal_integration_numpy.py \
    --path ${input path} \
    -k ${k} \
    --iter ${iterations} \
    --tol ${tol}

# e.g. (paper setting)
python bilateral_normal_integration_numpy.py --path data/test_1 -k 2 --iter 50 --tol 1e-5
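
Before running the integration, it can help to confirm that the depth, normal, and mask arrays agree in size. The sketch below is not part of the repository. `check_bini_inputs` is a hypothetical helper, and the array layouts (depth as (H, W), normals as (H, W, 3)) are assumptions. The whole-image fallback mirrors the behavior described above for a missing mask.

```python
import numpy as np


def check_bini_inputs(depth, normal, mask=None):
    """Validate shapes and return a boolean mask (whole image if none given)."""
    if depth.ndim != 2:
        raise ValueError(f"expected depth of shape (H, W), got {depth.shape}")
    if normal.shape != depth.shape + (3,):
        raise ValueError(
            f"expected normal of shape {depth.shape + (3,)}, got {normal.shape}")
    if mask is None:
        # No mask supplied: treat every pixel as foreground.
        mask = np.ones(depth.shape, dtype=bool)
    elif mask.shape != depth.shape:
        raise ValueError(f"expected mask of shape {depth.shape}, got {mask.shape}")
    return mask.astype(bool)


if __name__ == "__main__":
    # Synthetic arrays stand in for the npy files produced by inference.
    depth = np.zeros((4, 5), dtype=np.float32)
    normal = np.zeros((4, 5, 3), dtype=np.float32)
    mask = check_bini_inputs(depth, normal)
    print("foreground pixels:", int(mask.sum()))
```

With real data, the arrays would come from `np.load` on the generated npy files in place of the synthetic ones.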

📝 TODO List

  • Add training code.
  • Test on more local environments.

📚 Related Work

We also encourage readers to follow these exciting concurrent works.

  • Marigold: a finetuned diffusion model for estimating monocular depth.
  • Wonder3D: generates multi-view normal maps and color images and reconstructs a high-fidelity textured mesh.
  • HyperHuman: a latent structural diffusion model and a structure-guided refiner for high-resolution human generation.
  • GenPercept: a finetuned UNet for many downstream image understanding tasks.
  • Metric3D: a discriminative metric depth and surface normal estimator.

🔗 Citation

@article{fu2024geowizard,
  title={GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image},
  author={Fu, Xiao and Yin, Wei and Hu, Mu and Wang, Kaixuan and Ma, Yuexin and Tan, Ping and Shen, Shaojie and Lin, Dahua and Long, Xiaoxiao},
  journal={arXiv preprint arXiv:2403.12013},
  year={2024}
}
