
latent-nerf's Introduction

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

Text-guided image generation has progressed rapidly in recent years, inspiring major breakthroughs in text-guided shape generation. Recently, it has been shown that using score distillation, one can successfully text-guide a NeRF model to generate a 3D object. We adapt the score distillation to the publicly available, and computationally efficient, Latent Diffusion Models, which apply the entire diffusion process in a compact latent space of a pretrained autoencoder. As NeRFs operate in image space, a naïve solution for guiding them with latent score distillation would require encoding to the latent space at each guidance step. Instead, we propose to bring the NeRF to the latent space, resulting in a Latent-NeRF. Analyzing our Latent-NeRF, we show that while Text-to-3D models can generate impressive results, they are inherently unconstrained and may lack the ability to guide or enforce a specific 3D structure. To assist and direct the 3D generation, we propose to guide our Latent-NeRF using a Sketch-Shape: an abstract geometry that defines the coarse structure of the desired object. Then, we present means to integrate such a constraint directly into a Latent-NeRF. This unique combination of text and shape guidance allows for increased control over the generation process. We also show that latent score distillation can be successfully applied directly on 3D meshes. This allows for generating high-quality textures on a given geometry. Our experiments validate the power of our different forms of guidance and the efficiency of using latent rendering.

Description 📜

Official Implementation for "Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures".

TL;DR - We explore different ways of introducing shape-guidance for Text-to-3D and present three models: a purely text-guided Latent-NeRF, Latent-NeRF with soft shape guidance for more exact control over the generated shape, and Latent-Paint for texture generation for explicit shapes.

Recent Updates 📰

  • 27.11.2022 - Code release

  • 14.11.2022 - Created initial repo

Latent-Paint 🎨

In the Latent-Paint application, a texture is generated for an explicit mesh directly on its texture map, using Stable Diffusion as a prior.

Here the geometry is used as a hard constraint where the generation process is tied to the given mesh and its parameterization.

Below we can see how the generated texture progresses over the course of optimization.

To create such results, run the train_latent_paint script. Parameters are handled using pyrallis and can be passed from a config file or the command line.

 python -m scripts.train_latent_paint --config_path demo_configs/latent_paint/goldfish.yaml

Or alternatively

python -m scripts.train_latent_paint --log.exp_name 2022_11_22_goldfish --guide.text "A goldfish"  --guide.shape_path /nfs/private/gal/meshes/blub.obj
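
Config files mirror the command-line flags as nested keys. As an illustration, a goldfish config along the lines of the flags above might look like this (the values here are illustrative, not necessarily the shipped demo config):

log:
  exp_name: goldfish
guide:
  text: "A goldfish"
  shape_path: shapes/blub.obj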

Sketch-Guided Latent-NeRF 🧸

Here we use a simple coarse geometry, which we call a SketchShape, to guide the generation process.

A SketchShape imposes a soft constraint: it guides the occupancy of the learned NeRF model without restricting it to the exact input geometry.

A SketchShape can come in many forms, here are some extruded ones.

To create such results, run the train_latent_nerf script. Parameters are handled using pyrallis and can be passed from a config file or the command line.

 python -m scripts.train_latent_nerf --config_path demo_configs/latent_nerf/lego_man.yaml

Or alternatively

python -m scripts.train_latent_nerf --log.exp_name '2022_11_25_lego_man' --guide.text 'a lego man' --guide.shape_path shapes/teddy.obj --render.nerf_type latent

Unconstrained Latent-NeRF 🏰

Here we apply text-to-3D generation without any shape constraint, similarly to DreamFusion and stable-dreamfusion.

We directly train the NeRF in latent space, so no encoding into the latent space is required during training.
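
Concretely, the NeRF renders a 64x64x4 feature image that is treated directly as a Stable Diffusion latent, and the score-distillation gradient is applied to it with no VAE encoding in the loop. Below is a minimal sketch of one such step, assuming the diffusers library; classifier-free guidance and the timestep weighting are omitted for brevity, and this illustrates the idea rather than the repository's exact implementation.

# Minimal latent score-distillation (SDS) sketch, assuming diffusers.
# `latents` come straight from the NeRF render, so no VAE encoding is needed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
unet, scheduler = pipe.unet, pipe.scheduler

def sds_loss(latents, text_embeddings):
    # latents: [1, 4, 64, 64] rendered by the latent NeRF
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    noisy_latents = scheduler.add_noise(latents, noise, t)
    with torch.no_grad():
        noise_pred = unet(noisy_latents, t,
                          encoder_hidden_states=text_embeddings).sample
    # The SDS gradient is (noise_pred - noise); this surrogate loss
    # backpropagates exactly that gradient into the NeRF parameters.
    grad = noise_pred - noise
    return (grad.detach() * latents).sum()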

To create such results, run the train_latent_nerf script. Parameters are handled using pyrallis and can be passed from a config file or the command line.

 python -m scripts.train_latent_nerf --config_path demo_configs/latent_nerf/sand_castle.yaml

Or alternatively

python -m scripts.train_latent_nerf --log.exp_name 'sand_castle' --guide.text 'a highly detailed sand castle' --render.nerf_type latent
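
Since the renders are 4-channel latents, they are decoded back to RGB with Stable Diffusion's VAE for viewing. A minimal sketch, again assuming the diffusers library (0.18215 is Stable Diffusion v1's latent scaling factor):

# Minimal sketch: decode a [1, 4, 64, 64] latent render into a 512x512 RGB
# image with Stable Diffusion's VAE decoder (assuming diffusers).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4",
                                    subfolder="vae")

@torch.no_grad()
def latents_to_rgb(latents):
    image = vae.decode(latents / 0.18215).sample  # [1, 3, 512, 512] in [-1, 1]
    return (image / 2 + 0.5).clamp(0, 1)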

Textual Inversion 🐈

As our Latent-NeRF is supervised by Stable-Diffusion, we can also use Textual Inversion tokens as part of the input text prompt. This allows conditioning the object generation on specific objects and styles, defined only by input images.

For Textual-Inversion results, use guide.concept_name with a concept from the 🤗 concept library. For example, --guide.concept_name=cat-toy; then simply use the corresponding token in your --guide.text.
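
For instance, a hypothetical run, assuming the concept's placeholder token is <cat-toy> (check the concept page for the exact token string):

python -m scripts.train_latent_nerf --log.exp_name 'cat_toy' --guide.text 'a <cat-toy>' --guide.concept_name cat-toy --render.nerf_type latent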

Getting Started

Installation 💾

Install the common dependencies from the requirements.txt file

pip install -r requirements.txt

For Latent-NeRF with shape-guidance, additionally install igl

conda install -c conda-forge igl

For Latent-Paint, additionally install kaolin

 pip install git+https://github.com/NVIDIAGameWorks/kaolin

Note that you also need a 🤗 token for Stable Diffusion. First accept the conditions for the model you want to use; the default one is CompVis/stable-diffusion-v1-4. Then either add a TOKEN file containing your access token to the root folder of this project, or use the huggingface-cli login command.
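
For example, either of the following (the token value shown is a placeholder):

huggingface-cli login

or

echo hf_YourAccessToken > TOKEN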

Training 🏋️

Scripts for training are available in the scripts/ folder; see above, or the demo_configs/ folder, for concrete examples.

Meshes for shape-guidance are available under shapes/

Additional Tips and Tricks 🪄

  • Check out the vis/train folder to see the actual renderings used during optimization. You might want to play around with guide.mesh_scale if the object looks too small or too large.

  • For Latent-NeRF with shape-guidance, try changing guide.proximal_surface and optim.lambda_shape to control the strictness of the guidance. A rough sketch of this kind of loss is shown below.
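
The sketch below illustrates one way such a soft constraint can be computed, assuming libigl for the inside/outside and distance queries. The function name, the exact weighting, and the use of binary cross-entropy are illustrative assumptions, not the repository's exact code; proximal_surface controls the band around the sketch surface where the constraint is relaxed.

# Hypothetical sketch of soft shape-guidance: push the NeRF's occupancy
# towards the sketch mesh's inside/outside occupancy, relaxing the
# constraint for points near the surface.
import igl
import numpy as np
import torch
import torch.nn.functional as F

def shape_guidance_loss(points, nerf_alpha, verts, faces, proximal_surface=0.3):
    # points: [N, 3] sampled locations; nerf_alpha: [N] occupancy from the NeRF
    pts = points.detach().cpu().numpy()
    # winding number is ~1 inside the sketch mesh and ~0 outside
    inside = igl.winding_number(verts, faces, pts) > 0.5
    target = torch.from_numpy(inside.astype(np.float32)).to(points.device)
    # squared distance to the surface; points near the surface are penalized less
    sq_dist, _, _ = igl.point_mesh_squared_distance(pts, verts, faces)
    sq_dist = torch.from_numpy(sq_dist).float().to(points.device)
    weight = 1.0 - torch.exp(-sq_dist / (2 * proximal_surface ** 2))
    bce = F.binary_cross_entropy(nerf_alpha.clamp(1e-4, 1 - 1e-4), target,
                                 reduction='none')
    return (weight * bce).mean()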

Repository structure

Path                     Description
(repository root)        Repository root folder
├ demo_configs           Configs for running specific experiments
├ scripts                The training scripts
├ shapes                 Various shapes to use for shape-guidance
└ src                    The actual code for training and evaluation
  ├ latent_nerf          Code for Latent-NeRF training
  │ ├ configs            Config structure for training
  │ ├ models             NeRF models
  │ ├ raymarching        The CUDA ray marching modules
  │ └ training           The Trainer class and related code
  └ latent_paint         Code for Latent-Paint training
    ├ configs            Config structure for training
    ├ models             Textured-Mesh models
    └ training           The Trainer class and related code

Acknowledgments

The Latent-NeRF code is heavily based on the stable-dreamfusion project, and the Latent-Paint code borrows from text2mesh.

Citation

If you use this code for your research, please cite our paper, Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures:

@article{metzer2022latent,
  title={Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures},
  author={Metzer, Gal and Richardson, Elad and Patashnik, Or and Giryes, Raja and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2211.07600},
  year={2022}
}

latent-nerf's People

Contributors

eladrich, galmetzer


latent-nerf's Issues

High resolution problems.

It seems that encoding a high-resolution image (over 3000 x 4000) still goes OOM. This is not during training; I was just testing the limits of the encoder.

Is the setting of learning rate sensitive?

Hi, I increased the number of epochs from 5000 to 10000 and adjusted the learning rate from 1e-3 to 5e-4, and got quite good results. However, this learning rate does not seem to work well for every object. I am a bit confused about how to set the learning rate and number of epochs. Can somebody give me some advice? Thanks a lot!

Orientation loss

Thanks for your great research.
Compared with the original version of DreamFusion (or Stable-DreamFusion), it seems that the orientation loss has been omitted in Latent-NeRF. This normal-based loss is designed to improve the 3D geometry. Did you leave it out for a particular reason?

TypeError: grid_encode_forward(): incompatible function arguments.

I tried to run the command for the sand castle as per the README:
python -m scripts.train_latent_nerf --config_path demo_configs/latent_nerf/sand_castle.yaml
I have installed gridencoder from the stable-dreamfusion repo, but the run throws a TypeError at the grid_encode_forward call, and I am not sure how to resolve it. Below is the trace output:
/usr/lib/python3.8/runpy.py:192 in _run_module_as_main
    return _run_code(code, main_globals, None, "__main__", mod_spec)
/usr/lib/python3.8/runpy.py:85 in _run_code
    exec(code, run_globals)
/home/vghorpad/stable-diff/latent-nerf/scripts/train_latent_nerf.py:17 in <module>
    main()
/home/vghorpad/stable-diff/venv_sdfusion20/lib/python3.8/site-packages/pyrallis/argparsing.py:158 in wrapper_inner
    response = fn(cfg, *args, **kwargs)
/home/vghorpad/stable-diff/latent-nerf/scripts/train_latent_nerf.py:14 in main
    trainer.train()
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/training/trainer.py:124 in train
    self.evaluate(self.dataloaders['val'], self.eval_renders_path)
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/training/trainer.py:173 in evaluate
    preds, preds_depth, preds_normals = self.eval_render(data)
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/training/trainer.py:261 in eval_render
    outputs = self.nerf.render(rays_o, rays_d, staged=True, perturb=perturb, light_d..., ambient_ratio=ambient_ratio, shading=shading, force_a...
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/models/renderer.py:410 in render
    results = _run(rays_o, rays_d, **kwargs)
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/models/renderer.py:282 in run_cuda
    sigmas, rgbs, normals = self(xyzs, dirs, light_d, ratio=ambient_ratio, s...
/home/vghorpad/stable-diff/venv_sdfusion20/lib/python3.8/site-packages/torch/nn/modules/module.py:1190 in _call_impl
    return forward_call(*input, **kwargs)
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/models/network_grid.py:108 in forward
    sigma, color = self.common_forward(x)
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/models/network_grid.py:62 in common_forward
    h = self.encoder(x, bound=self.bound)
/home/vghorpad/stable-diff/venv_sdfusion20/lib/python3.8/site-packages/torch/nn/modules/module.py:1190 in _call_impl
    return forward_call(*input, **kwargs)
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/models/encoders/gridencoder/grid.py:149 in forward
    outputs = grid_encode(inputs, self.embeddings, self.offsets, self.per_level_scal...
/home/vghorpad/stable-diff/venv_sdfusion20/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py:97 in decorate_fwd
    return fwd(*args, **kwargs)
/home/vghorpad/stable-diff/latent-nerf/src/latent_nerf/models/encoders/gridencoder/grid.py:49 in forward
    _backend.grid_encode_forward(inputs, embeddings, offsets, outputs, B, D, C, L, S...

TypeError: grid_encode_forward(): incompatible function arguments. The following argument types are supported:
    1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: int, arg5: int, arg6: int, arg7: int, arg8: float, arg9: int, arg10: Optional[at::Tensor], arg11: int, arg12: bool, arg13: int) -> None

Invoked with:
    tensor([[0.5000, 0.5000, 0.5000], ..., [0.5000, 0.5000, 0.5000]], device='cuda:0'),
    tensor([[-7.7486e-07, 5.3644e-05], ..., [4.2677e-05, 9.2626e-05]], device='cuda:0', dtype=torch.float16),
    tensor([0, 4920, 18744, 51512, 136696, ..., 6119864], device='cuda:0', dtype=torch.int32),
    tensor([[[0., 0.], [0., 0.], ...]], device='cuda:0', dtype=torch.float16),
    16512, 3, 2, 16, 0.4666666666666666, 16, None, 1, False

Get strange result

I used the default codebase and command for training:

python -m scripts.train_latent_nerf --log.exp_name 'sand_castle' --guide.text 'a highly detailed sand castle' --render.nerf_type latent

And I got this strange result (see the attached 5001_rgb.mp4). Does anyone have any advice?

5001_rgb.mp4

How to finetune the model from latent type?

Dear eladrich,
Thanks for your great repo! I want to finetune Latent-NeRF as described in your paper, by changing the nerf type from "latent" to "latent_tune" and setting the optim ckpt path. However, it comes up with a model shape mismatch error, shown in the attached image.
Is there something wrong with my steps?

Some doubts about using rgb and rgb refinement

In Latent-Paint mode, when I use rgb training (i.e. "texture-rgb-mesh"), I get some strange results.

1. In latent mode ("texture-mesh"), the result seems reasonable (first attached image).

2. In rgb mode ("texture-rgb-mesh"), the result is weird (second attached image).

Question about shading

Dear eladrich,
Thanks for your great repo! I tried to add a start_shading_iter, but the result of adding "lambertian" shading is rather strange. What might cause this problem? I also find that when extending training from 5000 to 10000 epochs, it sometimes produces a result that is entirely black. Has anybody faced the same problem before?

rabbit
cactus

Question: Light color in latent space?

Thanks for the great research.

It seems that diffuse reflectance (a.k.a. dot-product shading) should work with any number of channels, but I'm wondering what the light color and ambient light color should be in latent space. DreamFusion uses light color [.9, .9, .9] and ambient light color [.1, .1, .1] in RGB space. How does this translate to latent space?

Conda environment

Is there a specific conda environment that can be provided? I am getting errors on a p3.2xlarge AWS EC2 instance, which has a V100.

ModuleNotFoundError: No module named '_gridencoder' ninja build stopped

When running the unconstrained Latent-NeRF text-to-3D demo command (below), I get a runtime error during the tiled-grid encoding.

run command: python3 -m scripts.train_latent_nerf --config_path demo_configs/latent_nerf/sand_castle.yaml

Error:

import _gridencoder as _backend
ModuleNotFoundError: No module named '_gridencoder'
During handling of the above exception, another exception occurred:

21 errors detected in the compilation of "/tmp/tmpxft_000039fb_00000000-6_gridencoder.cpp1.ii".
ninja: build stopped: subcommand failed.

The "teddy" sketch shape cannot be loaded properly

Hi, thanks for your great work! When I tried training the sketch-guided Latent-NeRF myself, I found that "teddy.obj" cannot be loaded properly for unclear reasons, and the result I got with demo_configs/lego_man.yaml looks as if it was trained from scratch, with no relation to the sketch shape. I then tried loading the "teddy.obj" file with the trimesh library and found that it is loaded as a Scene object instead of a Trimesh object.

I have fixed this issue by converting the Scene into a Trimesh and exporting the result into a new obj file, following this link. But I still hope you can check the "teddy.obj" file in case someone else runs into the same issue.

Thanks for your interesting work again!

Reproducing the german shepherd example

Hello, I'm trying to reproduce the German shepherd example from the paper using the animal.obj file in the shapes folder, but my result is far from the quality presented in the paper. I'm just modifying the lego man demo config to use animal.obj; is there anything else I need to change to reproduce it?

[Question] Export to Mesh

What is the easiest way to convert a NeRF generated using this project into a mesh (.obj, for example)?

command to run Textual Inversion

Hi,

Thank you for your nice work and for open-sourcing it.

Could you give me an example command to run Textual Inversion? I ran it with this command: python -m scripts.train_latent_nerf --log.exp_name 'textual inversion backpack' --guide.text 'a backpack that looks like *' --render.nerf_type latent --guide.concept_name=cat-toy, but could not get a good result. Is there a problem with my command?

Hope to receive your reply. Thanks.

The results look like yellow clouds

Hi,
Thanks for sharing your great work! I find that for a lot of prompts the results look like yellow clouds without rich colors, for example "a lion" (see attached image).
Do you have any idea on how to solve this problem? Thanks for your time!

Training Cost

Can you share the training cost of your awesome work? Thanks!
