eschernet's People

Contributors

kxhit

eschernet's Issues

Memory requirements

Hi, I'd like to know how much GPU memory EscherNet requires. Is an A100 needed, or will an RTX 3090 with 24 GB be enough?

Generate target images from a single top-view reference image

Hi, thanks for your amazing work.

I am researching how to generate an entire object from a single top-view image. I've tried many models as the SDS prior, including zero-123 XL and stable zero-123. However, none of them can faithfully generate the 3D object.

I have also tried your model to generate 100 target views from 1 reference view. However, for buildings the outputs are not very good, especially for low-elevation views, as shown in the attached figures. Do you think this is because the model was not trained with top-view images?

Another question: do you think it is possible or suitable to use EscherNet as an SDS prior?

Again, thanks for your amazing work!

[Attached figures: two input/output image pairs]

Why Use Camera Distance as a Dividing Form?

First of all, I want to express my gratitude for your outstanding research. I thoroughly enjoyed reading your paper.

However, I have a question regarding the encoding used in your work. I noticed that the encoding equation for [azimuth, elevation, orientation] differs from that of [camera distance]. Specifically, I am unsure why the camera distance requires a dividing form. Could you help me understand the reasoning behind this choice? I feel like I might be missing an important detail.

Thank you very much for your assistance.

[Attached image]

Gradio Demo

Hi, great work!

Do you plan to release the code for running inference on real-world objects?

file not found

train_eschernet.py can't be found in the folder shown in readme.md.

Result on Franka dataset

Dear authors,
first of all, thank you for the amazing work.
I am trying to use the model on the Franka16 dataset, but it's not clear to me how to do it.
First, the annotated elevations in the dataset do not seem to be correct: I see images annotated with elevation 0° whose actual elevation is greater than 0°.
Second, I would like to generate new views on a circular trajectory around the object with a fixed elevation.
I changed the code for the output poses, reusing the same lines of code that handle the input views. However, the elevation of the produced images seems wrong to me.
This happens with the 6DoF model, while the 4DoF model does not work at all.
Could you please give me some suggestions on how to provide the angle information correctly?

Best regards,
Giuseppe
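
For reference, the fixed-elevation circular trajectory described in this issue can be sketched as below. This is only an illustration under stated assumptions, not the pose code from the EscherNet repository: the function name `orbit_poses`, the z-up world frame, the azimuth/elevation definitions, and the OpenCV-style camera axes are all choices made for the example. A mismatch between such conventions and the ones a dataset or model expects is one common reason for elevations that look wrong.

```python
import numpy as np

def orbit_poses(n_views, elevation_deg, radius):
    """Return (n_views, 4, 4) camera-to-world matrices on a circle around the origin.

    Assumed conventions (hypothetical, not taken from the EscherNet code):
    world z-axis is up, azimuth is measured in the x-y plane, elevation is the
    angle above that plane, and the camera uses OpenCV axes
    (x right, y down, z forward) while looking at the origin.
    Elevations of exactly +/-90 degrees are degenerate and not handled.
    """
    elev = np.deg2rad(elevation_deg)
    azimuths = np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False)
    up = np.array([0.0, 0.0, 1.0])
    poses = np.zeros((n_views, 4, 4))
    for i, azim in enumerate(azimuths):
        # Camera centre in spherical coordinates.
        position = radius * np.array([
            np.cos(elev) * np.cos(azim),
            np.cos(elev) * np.sin(azim),
            np.sin(elev),
        ])
        forward = -position / np.linalg.norm(position)  # camera z: toward the origin
        right = np.cross(forward, up)                   # camera x
        right /= np.linalg.norm(right)
        down = np.cross(forward, right)                 # camera y
        poses[i, :3, 0] = right
        poses[i, :3, 1] = down
        poses[i, :3, 2] = forward
        poses[i, :3, 3] = position
        poses[i, 3, 3] = 1.0
    return poses

poses = orbit_poses(n_views=8, elevation_deg=30.0, radius=2.0)
# Sanity check: recover the elevation from the camera centres.
centres = poses[:, :3, 3]
recovered_elev = np.rad2deg(np.arcsin(centres[:, 2] / np.linalg.norm(centres, axis=1)))
print(recovered_elev)
```

Recovering the elevation from the camera centres, as in the last lines, is a quick way to check whether annotated angles agree with the poses actually fed to the model.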

Dust3r pose processing

Dear authors,
Thank you for the great work! I just checked the online demo that you released a few days ago. I have some questions about using Dust3r to compute the initial input poses for EscherNet:

  1. Do you modify the output computed by Dust3r to obtain an orthogonal canonical frame? If yes, could you provide more details about the modifications?

  2. For reproducibility, could you point to the line in Dust3r's code from which you take and save the computed poses that are assigned to each input image for EscherNet?

Thanks in advance

Question about intrinsics for 3D reconstruction

Hi! Really nice work.

I'm using EscherNet 6DoF, and in my dataset I need to use different intrinsics for different images. I guess that's not an issue for the NeuS renderer.

My question, however, is about both the intrinsics and the range used by Objaverse:

downscale = 512 / 256.
self.fx = 560. / downscale
self.fy = 560. / downscale
self.intrinsic = torch.tensor([[self.fx, 0, 128., 0, self.fy, 128., 0, 0, 1.]], dtype=torch.float64).view(3, 3)

self.K = np.array([[280.,0.,128.],[0.,280.,128.],[0.,0.,1.]], dtype=np.float32)

Do I need to use those specific Objaverse intrinsics or is there a workaround?

All the best,

Alberto
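
The two snippets quoted in this issue agree with each other: halving a 512-pixel image to 256 pixels halves the focal length from 560 to 280 and puts the principal point at 128. A minimal sketch of that arithmetic follows, in case it helps when bringing other intrinsics to the same resolution. The helper name `scale_intrinsics` and the 512-pixel principal point of 256 are assumptions for illustration (only the 256-pixel values appear above), and whether EscherNet or the NeuS renderer accepts intrinsics other than these is not answered here.

```python
import numpy as np

def scale_intrinsics(K, src_size, dst_size):
    """Rescale a 3x3 pinhole intrinsic matrix from src_size to dst_size.

    Sizes are (width, height) in pixels. This simple scaling ignores
    half-pixel centre conventions, which some pipelines account for.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    K_scaled = np.asarray(K, dtype=np.float64).copy()
    K_scaled[0, :] *= sx  # scales fx, skew and cx
    K_scaled[1, :] *= sy  # scales fy and cy
    return K_scaled

# Assumed full-resolution intrinsics: focal length 560 as in the snippet above,
# principal point at the centre of a 512 x 512 image (an assumption).
K_512 = np.array([[560.0, 0.0, 256.0],
                  [0.0, 560.0, 256.0],
                  [0.0, 0.0, 1.0]])
K_256 = scale_intrinsics(K_512, src_size=(512, 512), dst_size=(256, 256))
print(K_256)
```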

Training details

Hello! Congratulations on the great work.
I have one question about the training process. In Section 3.1 you say "It builds upon an existing 2D diffusion model, inheriting its strong web-scale prior through large-scale training". However, the rest of the paper does not make clear whether the overall architecture is trained from scratch on the Objaverse dataset (rendered as in Zero123) or fine-tuned starting from pre-trained Stable Diffusion modules. Could you please clarify?
Thanks in advance

Novel view synthesis from a single image

Dear Authors
First, thank you for your interesting and amazing work.

I would like to do novel view synthesis using my own single RGBA image. In this case, should the data_type be Text2Img?
I also wonder whether it would work with other data_type values if I provide only one image as input.

Thanks for the great work!!!
