Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

Text-guided image generation has progressed rapidly in recent years, inspiring major breakthroughs in text-guided shape generation. Recently, it has been shown that using score distillation, one can successfully text-guide a NeRF model to generate a 3D object. We adapt the score distillation to the publicly available, and computationally efficient, Latent Diffusion Models, which apply the entire diffusion process in a compact latent space of a pretrained autoencoder. As NeRFs operate in image space, a naïve solution for guiding them with latent score distillation would require encoding to the latent space at each guidance step. Instead, we propose to bring the NeRF to the latent space, resulting in a Latent-NeRF. Analyzing our Latent-NeRF, we show that while Text-to-3D models can generate impressive results, they are inherently unconstrained and may lack the ability to guide or enforce a specific 3D structure. To assist and direct the 3D generation, we propose to guide our Latent-NeRF using a Sketch-Shape: an abstract geometry that defines the coarse structure of the desired object. Then, we present means to integrate such a constraint directly into a Latent-NeRF. This unique combination of text and shape guidance allows for increased control over the generation process. We also show that latent score distillation can be successfully applied directly on 3D meshes. This allows for generating high-quality textures on a given geometry. Our experiments validate the power of our different forms of guidance and the efficiency of using latent rendering.
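
For reference, the score distillation gradient mentioned above is commonly written in the following form. This is a sketch of the usual DreamFusion-style formulation applied to a rendered latent map, not an equation taken from this README. The symbols are notation chosen here: θ are the NeRF parameters, z is the latent map rendered by the NeRF, ε_φ is the noise predicted by the diffusion model, y is the text prompt, and w(t) is a timestep-dependent weight.

```latex
z_t = \sqrt{\bar{\alpha}_t}\, z + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial z}{\partial \theta}
    \right]
```

Because z is already a latent map, the noise is added to the rendering directly and no autoencoder encoding step appears in the gradient.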

Description 📜

Official Implementation for "Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures".

TL;DR - We explore different ways of introducing shape-guidance for Text-to-3D and present three models: a purely text-guided Latent-NeRF, Latent-NeRF with soft shape guidance for more exact control over the generated shape, and Latent-Paint for texture generation for explicit shapes.

Recent Updates 📰

  • 27.11.2022 - Code release

  • 14.11.2022 - Created initial repo

Latent-Paint 🎨

In the Latent-Paint application, a texture is generated for an explicit mesh directly on its texture map using stable-diffusion as a prior.

Here the geometry is used as a hard constraint where the generation process is tied to the given mesh and its parameterization.

Below we can see how the generated texture evolves over the course of the optimization.

To create such results, run the train_latent_paint script. Parameters are handled using pyrallis and can be passed from a config file or on the command line.

python -m scripts.train_latent_paint --config_path demo_configs/latent_paint/goldfish.yaml

Or alternatively

python -m scripts.train_latent_paint --log.exp_name 2022_11_22_goldfish --guide.text "A goldfish"  --guide.shape_path /nfs/private/gal/meshes/blub.obj
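
The dotted flag names above (`--log.exp_name`, `--guide.text`, `--guide.shape_path`) suggest a config made of nested sections. The sketch below illustrates that mapping with plain dataclasses and a tiny hand-written override function. It is a stdlib-only illustration of the idea, not pyrallis itself and not the repository's config classes; `LogConfig`, `GuideConfig`, `TrainConfig`, and `apply_overrides` are hypothetical names, and values are kept as strings without type conversion.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class LogConfig:
    exp_name: str = "default"


@dataclass
class GuideConfig:
    text: str = ""
    shape_path: Optional[str] = None


@dataclass
class TrainConfig:  # hypothetical stand-in, not the repository's config class
    log: LogConfig = field(default_factory=LogConfig)
    guide: GuideConfig = field(default_factory=GuideConfig)


def apply_overrides(cfg: TrainConfig, argv: List[str]) -> TrainConfig:
    """Apply `--section.option value` pairs to a nested dataclass in place."""
    if len(argv) % 2:
        raise ValueError("expected '--name value' pairs")
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"not a flag: {flag}")
        *parents, leaf = flag[2:].split(".")
        node = cfg
        for name in parents:  # walk down to the section that owns the option
            node = getattr(node, name)
        if leaf not in {f.name for f in fields(node)}:
            raise KeyError(f"unknown option: {flag}")
        setattr(node, leaf, value)
    return cfg


cfg = apply_overrides(
    TrainConfig(),
    ["--log.exp_name", "2022_11_22_goldfish", "--guide.text", "A goldfish"],
)
print(cfg.log.exp_name, "|", cfg.guide.text)
```

Options that are not overridden keep their defaults, which is why a config file and a few command-line flags can be combined.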

Sketch-Guided Latent-NeRF 🧸

Here we use a simple coarse geometry which we call a SketchShape to guide the generation process.

A SketchShape presents a soft constraint which guides the occupancy of a learned NeRF model but isn't constrained to its exact geometry.

A SketchShape can come in many forms; here are some extruded ones.

To create such results, run the train_latent_nerf script. Parameters are handled using pyrallis and can be passed from a config file or on the command line.

python -m scripts.train_latent_nerf --config_path demo_configs/latent_nerf/lego_man.yaml

Or alternatively

python -m scripts.train_latent_nerf --log.exp_name '2022_11_25_lego_man' --guide.text 'a lego man' --guide.shape_path shapes/teddy.obj --render.nerf_type latent

Unconstrained Latent-NeRF 🏰

Here we apply text-to-3D generation without any shape constraint, similarly to DreamFusion and stable-dreamfusion.

We directly train the NeRF in latent space, so no encoding into the latent space is required during training.

To create such results, run the train_latent_nerf script. Parameters are handled using pyrallis and can be passed from a config file or on the command line.

python -m scripts.train_latent_nerf --config_path demo_configs/latent_nerf/sand_castle.yaml

Or alternatively

python -m scripts.train_latent_nerf --log.exp_name 'sand_castle' --guide.text 'a highly detailed sand castle' --render.nerf_type latent
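
To give a feel for why rendering directly in latent space is cheaper, the arithmetic below compares the number of rays per view for an image-space render and a latent-space render. It assumes the Stable Diffusion v1 autoencoder, which downsamples each spatial dimension by 8 and uses 4 latent channels. The 512x512 image resolution is an illustrative assumption, and `latent_shape` is a hypothetical helper, not a function from this repository.

```python
from typing import Tuple


def latent_shape(height: int, width: int,
                 downsample: int = 8, channels: int = 4) -> Tuple[int, int, int]:
    """Shape (C, H, W) of the latent map for an image of the given size.

    The defaults assume the Stable Diffusion v1 autoencoder.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be a multiple of the downsample factor")
    return channels, height // downsample, width // downsample


image_h, image_w = 512, 512             # assumed image-space resolution
c, h, w = latent_shape(image_h, image_w)

image_rays = image_h * image_w          # one ray per RGB pixel
latent_rays = h * w                     # one ray per latent "pixel"

print((c, h, w))                        # (4, 64, 64)
print(image_rays // latent_rays)        # 64
```

Under these assumptions a latent render casts 64 times fewer rays per view, and its output can be passed to the diffusion model without an encoding step.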

Textual Inversion 🐈

As our Latent-NeRF is supervised by Stable-Diffusion, we can also use Textual Inversion tokens as part of the input text prompt. This allows conditioning the object generation on specific objects and styles, defined only by input images.

For Textual-Inversion results, set guide.concept_name to a concept from the 🤗 concept library, for example --guide.concept_name=cat-toy, and then use the corresponding token in your --guide.text.
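
As a sketch of how the two flags fit together, the helper below assembles a train_latent_nerf command and checks that the prompt actually contains the concept's token. The helper name `build_latent_nerf_command`, the experiment name, the prompt, and the token spelling `<cat-toy>` are assumptions made for illustration; check the concept's page for the token it actually defines.

```python
import shlex
from typing import List, Optional


def build_latent_nerf_command(exp_name: str, text: str,
                              concept_name: Optional[str] = None,
                              concept_token: Optional[str] = None) -> List[str]:
    """Build an argv list for scripts.train_latent_nerf (hypothetical helper)."""
    cmd = [
        "python", "-m", "scripts.train_latent_nerf",
        "--log.exp_name", exp_name,
        "--guide.text", text,
        "--render.nerf_type", "latent",
    ]
    if concept_name is not None:
        # The token only has an effect if it appears in the prompt.
        if concept_token is not None and concept_token not in text:
            raise ValueError(f"prompt does not contain the token {concept_token!r}")
        cmd.append(f"--guide.concept_name={concept_name}")
    return cmd


cmd = build_latent_nerf_command(
    exp_name="cat_toy_demo",                 # assumed experiment name
    text="a <cat-toy> sitting on a shelf",   # assumed prompt and token
    concept_name="cat-toy",
    concept_token="<cat-toy>",
)
print(shlex.join(cmd))  # shell-quoted command, ready to paste into a terminal
```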

Getting Started

Installation 💾

Install the common dependencies from the requirements.txt file

pip install -r requirements.txt

For Latent-NeRF with shape-guidance, additionally install igl

conda install -c conda-forge igl

For Latent-Paint, additionally install kaolin

pip install git+https://github.com/NVIDIAGameWorks/kaolin

Note that you also need a 🤗 token for StableDiffusion. First accept the conditions for the model you want to use; the default is CompVis/stable-diffusion-v1-4. Then add a TOKEN file containing your access token to the root folder of this project, or use the huggingface-cli login command.
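
A minimal sketch of how a script could pick up the token from the TOKEN file described above. `read_token` is a hypothetical helper written for illustration, not the repository's loader; it returns None when no usable file exists, in which case a caller could rely on the credentials stored by huggingface-cli login.

```python
from pathlib import Path
from typing import Optional


def read_token(project_root: str = ".") -> Optional[str]:
    """Return the access token stored in <project_root>/TOKEN, if any."""
    token_file = Path(project_root) / "TOKEN"
    if not token_file.is_file():
        return None
    # Strip the trailing newline that most editors add to the file.
    token = token_file.read_text(encoding="utf-8").strip()
    return token or None


token = read_token(".")
print("token found" if token else "no TOKEN file, falling back to CLI login")
```

Keep the TOKEN file out of version control, since it contains a personal credential.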

Training 🏋️

Scripts for training are available in the scripts/ folder; see the commands above or the configs in demo_configs/ for concrete examples.

Meshes for shape-guidance are available under shapes/

Additional Tips and Tricks 🪄

  • Check out vis/train to see the actual renderings used during the optimization. You might want to adjust guide.mesh_scale if the object looks too small or too large.

  • For Latent-NeRF with shape-guidance, try changing guide.proximal_surface and optim.lambda_shape to control the strictness of the guidance.
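
To build intuition for how these two options could affect strictness, here is a rough sketch of a soft shape term. It assumes the term is a binary cross-entropy between the NeRF occupancy and the SketchShape occupancy at a sample point, down-weighted near the shape's surface by a Gaussian-style falloff whose width is guide.proximal_surface, and scaled overall by optim.lambda_shape. The function names and the exact formula are assumptions made for illustration, not the repository's implementation.

```python
import math


def surface_weight(distance: float, proximal_surface: float) -> float:
    """Weight in [0, 1): 0 on the surface, approaching 1 far away from it."""
    if proximal_surface <= 0:
        raise ValueError("proximal_surface must be positive")
    return 1.0 - math.exp(-(distance ** 2) / (2.0 * proximal_surface ** 2))


def weighted_shape_loss(nerf_occupancy: float, shape_occupancy: float,
                        distance: float, proximal_surface: float,
                        lambda_shape: float) -> float:
    """Cross-entropy between occupancies, relaxed near the surface."""
    eps = 1e-7
    p = min(max(nerf_occupancy, eps), 1.0 - eps)  # avoid log(0)
    bce = -(shape_occupancy * math.log(p)
            + (1.0 - shape_occupancy) * math.log(1.0 - p))
    return lambda_shape * surface_weight(distance, proximal_surface) * bce


# The same disagreement (NeRF says occupied, shape says empty) at the same
# distance from the surface, under a narrow and a wide lenient band.
for sigma in (0.1, 0.5):
    print(sigma, weighted_shape_loss(0.9, 0.0, 0.2, sigma, lambda_shape=1.0))
```

In this sketch a larger proximal_surface widens the band around the surface where the NeRF may deviate from the SketchShape, while a larger lambda_shape makes any remaining disagreement more costly.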

Repository structure

Path                      Description
                          Repository root folder
├ demo_configs            Configs for running specific experiments
├ scripts                 The training scripts
├ shapes                  Various shapes to use for shape-guidance
├ src                     The actual code for training and evaluation
│  ├ latent_nerf          Code for Latent-NeRF training
│  │  ├ configs           Config structure for training
│  │  ├ models            NeRF models
│  │  ├ raymarching       The CUDA ray marching modules
│  │  ├ training          The Trainer class and related code
│  ├ latent_paint         Code for Latent-Paint training
│  │  ├ configs           Config structure for training
│  │  ├ models            Textured-Mesh models
│  │  ├ training          The Trainer class and related code

Acknowledgments

The Latent-NeRF code is heavily based on the stable-dreamfusion project, and the Latent-Paint code borrows from text2mesh.

Citation

If you use this code for your research, please cite our paper, Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures:

@article{metzer2022latent,
  title={Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures},
  author={Metzer, Gal and Richardson, Elad and Patashnik, Or and Giryes, Raja and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2211.07600},
  year={2022}
}
