kxhit / eschernet
[CVPR2024 Oral] EscherNet: A Generative Model for Scalable View Synthesis
Home Page: https://kxhit.github.io/EscherNet
License: Other
Hi, I'd like to know how much memory is required for EscherNet. Would an A100 be necessary, or would an RTX 3090 with 24 GB be enough?
When is this coming?
Hi, thanks for your amazing work.
I am conducting research on using a single top-view image to generate the entire object. I've tried many models, including Zero123-XL and Stable Zero123, as the SDS prior. However, none of them can faithfully generate the 3D object.
I have also tried your model to generate 100 target views from 1 reference view. However, for buildings the results are not really good, especially for low-elevation views, as shown in the attached figure. Do you think this is because the model was not trained with top-view images?
Another question: do you think it is possible, or suitable, to use EscherNet as an SDS prior?
Again, thanks for your amazing work!
First of all, I want to express my gratitude for your outstanding research. I thoroughly enjoyed reading your paper.
However, I have a question regarding the encoding used in your work. I noticed that the encoding equation for [azimuth, elevation, orientation] differs from that of [camera distance]. Specifically, I am unsure why the camera distance requires a dividing form. Could you help me understand the reasoning behind this choice? I feel like I might be missing an important detail.
Thank you very much for your assistance.
Hi, great work!
Do you plan to release the code for running inference on real-world objects?
train_eschernet.py, referenced in README.md, can't be found in the repository.
Dear authors,
first of all, thank you for the amazing work.
I am trying to use the model on Frianka16, but it's not clear to me how to do it.
First, the annotated elevations in the dataset seem incorrect: I see images labeled with elevation 0° whose actual elevation is greater than 0°.
Second, I would like to generate new views on a circular trajectory around the object at a fixed elevation.
I changed the code for the output poses, reusing the same lines of code used for the input views. However, the elevation of the produced images looks wrong to me.
This happens with the 6DoF model, while the 4DoF model does not work at all.
Could you please give me some suggestions on how to correctly provide the angle information?
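For reference, this is roughly how I build the target poses myself (my own sketch, assuming camera-to-world matrices, a z-up world, and a look-at convention; the function names are mine, not from the EscherNet code):

```python
import numpy as np

def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Camera-to-world rotation whose -z axis looks from cam_pos at target."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    # Columns are the camera's x (right), y (up), z (backward) axes in world.
    return np.stack([right, true_up, -forward], axis=1)

def circular_trajectory(n_views, elevation_deg, radius):
    """4x4 camera-to-world poses on a circle at fixed elevation."""
    poses = []
    elev = np.deg2rad(elevation_deg)
    for az in np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False):
        pos = radius * np.array([np.cos(elev) * np.cos(az),
                                 np.cos(elev) * np.sin(az),
                                 np.sin(elev)])
        T = np.eye(4)
        T[:3, :3] = look_at(pos)
        T[:3, 3] = pos
        poses.append(T)
    return poses
```

Is this the convention the model expects, or does it use a different axis/handedness convention for the output poses?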
Best regards,
Giuseppe
Amazing work, congrats!
Dear authors,
Thank you for the great work! I just checked the online demo that you released a few days ago. I have some questions regarding the use of Dust3r to compute initial input poses for EscherNet:
Do you apply any modifications to the output computed by Dust3r to obtain an orthogonal canonical frame? If yes, could you provide more details about them?
For reproducibility, could you point to the line in Dust3r's code from which you take and save the computed poses assigned to each input image for EscherNet?
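To clarify what I mean by "orthogonal": when I post-process estimated poses myself, I project each rotation onto SO(3) via SVD. A minimal sketch of that step (my own code, not from the Dust3r or EscherNet repositories):

```python
import numpy as np

def nearest_rotation(M):
    """Project a 3x3 matrix onto SO(3), nearest in Frobenius norm.

    Rotations recovered from point maps can drift from exact
    orthogonality. The SVD projection U @ diag(1, 1, det(U @ Vt)) @ Vt
    restores a proper rotation: orthonormal columns and determinant +1.
    """
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt
```

Do you apply something like this to the Dust3r poses, or are they already exactly orthogonal as produced?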
Thanks in advance
Hi! Really nice work.
I'm using EscherNet 6DoF, and in my dataset I would need to use different intrinsics for different images; I guess that's not an issue for the NeuS renderer.
My question, however, is about both the intrinsics and the range used for Objaverse:
See EscherNet/3drecon/renderer/renderer.py, lines 82 to 85 and line 493 (at commit 569240f).
Do I need to use those specific Objaverse intrinsics or is there a workaround?
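For context, my per-image intrinsics differ mainly in field of view; this is how I would build each K (my own sketch, assuming a square-pixel pinhole model with a centered principal point, not the repository's code):

```python
import numpy as np

def intrinsics_from_fov(fov_deg, width, height):
    """Pinhole K for a square-pixel camera with the given horizontal FOV.

    fx follows from half the image width and half the FOV:
    fx = (width / 2) / tan(fov / 2). The principal point is assumed to
    sit at the image center.
    """
    fx = 0.5 * width / np.tan(0.5 * np.deg2rad(fov_deg))
    return np.array([[fx, 0.0, width / 2.0],
                     [0.0, fx, height / 2.0],
                     [0.0, 0.0, 1.0]])
```

Would center-cropping/resizing my images so that each effective K matches the Objaverse one be a valid workaround, or does the pipeline hard-code those intrinsics elsewhere?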
All the best,
Alberto
Hello! Congratulations for the great work.
I have one question about the training process. In Section 3.1 you say "It builds upon an existing 2D diffusion model, inheriting its strong web-scale prior through large-scale training". However, in the rest of the paper it is unclear whether the overall architecture is trained from scratch on the Objaverse dataset (rendered as Zero123 does), or fine-tuned starting from pre-trained Stable Diffusion modules. Could you please clarify?
Thanks in advance
Dear Authors
First, thank you for your interesting and amazing work.
I would like to perform the novel view synthesis task using my own single RGBA image; in this case, should the data_type be Text2Img?
If I provide only one image as input, I wonder whether it will also work with other data_type values.
Thanks for the great work!!!