
realfusion's People

Contributors

lukemelas


realfusion's Issues

'tuple' object has no attribute 'get_input_embeddings'

Description

When I run the command shown in the examples:
python3 main.py --O --image_path $DATA_DIR/rgba.png --learned_embeds_path $DATA_DIR/learned_embeds.bin --text "A high-resolution DSLR image of a $TOKEN" --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"
it always fails with the following output:
'lr': 0.001,
'lr_warmup': False,
'max_ray_batch': 4096,
'max_steps': 512,
'min_lr': 1e-06,
'min_near': 0.1,
'negative': '',
'noise_real_camera': 0.001,
'noise_real_camera_annealing': True,
'num_rays': 4096,
'num_steps': 64,
'optim': 'adamw',
'pose_angle': 75,
'pretrained_model_image_size': 512,
'pretrained_model_name_or_path': 'runwayml/stable-diffusion-v1-5',
'radius_range': (1.0, 1.5),
'radius_rot': 1.8,
'real_every': 1,
'real_iters': 0,
'replace_synthetic_camera_every': 10,
'replace_synthetic_camera_noise': 0.02,
'run_name': 'default',
'save_mesh': False,
'save_test_name': 'df_test',
'seed': 101,
'suppress_face': None,
'test': False,
'test_on_real_data': False,
'text': 'A high-resolution DSLR image of a cake_2',
'uniform_sphere_rate': 0.5,
'update_extra_interval': 16,
'upsample_steps': 32,
'wandb': False,
'warm_iters': 2000,
'workspace': 'outputs/default/2023-05-16--12-57-00--seed-101'}
Grid encoder level 0 has resolution 16 and params 4920
Grid encoder level 1 has resolution 22 and params 12168
Grid encoder level 2 has resolution 30 and params 29792
Grid encoder level 3 has resolution 40 and params 65536
Grid encoder level 4 has resolution 55 and params 65536
Grid encoder level 5 has resolution 74 and params 65536
Grid encoder level 6 has resolution 100 and params 65536
Grid encoder level 7 has resolution 135 and params 65536
Grid encoder level 8 has resolution 183 and params 65536
Grid encoder level 9 has resolution 248 and params 65536
Grid encoder level 10 has resolution 336 and params 65536
Grid encoder level 11 has resolution 455 and params 65536
Grid encoder level 12 has resolution 617 and params 65536
Grid encoder level 13 has resolution 836 and params 65536
Grid encoder level 14 has resolution 1134 and params 65536
Grid encoder level 15 has resolution 1536 and params 65536
NeRFNetwork(
(encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 1536 per_level_scale=1.3557 params=(898848, 2) gridtype=tiled align_corners=False interpolation=linear
(sigma_net): MLP(
(net): ModuleList(
(0): Linear(in_features=32, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
(2): Linear(in_features=64, out_features=4, bias=True)
)
)
(encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
(bg_net): MLP(
(net): ModuleList(
(0): Linear(in_features=39, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=3, bias=True)
)
)
)
/home/hhn/.local/lib/python3.8/site-packages/diffusers/configuration_utils.py:135: FutureWarning: Accessing config attribute unet directly via 'StableDiffusionModel' object attribute is deprecated. Please access 'unet' over 'StableDiffusionModel's config object instead, e.g. 'scheduler.config.unet'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
/home/hhn/.local/lib/python3.8/site-packages/diffusers/configuration_utils.py:135: FutureWarning: Accessing config attribute text_encoder directly via 'StableDiffusionModel' object attribute is deprecated. Please access 'text_encoder' over 'StableDiffusionModel's config object instead, e.g. 'scheduler.config.text_encoder'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/hhn/realfusion/main.py:164 in <module> │
│ │
│ 161 │
│ 162 │
│ 163 if __name__ == '__main__': │
│ ❱ 164 │ main() │
│ 165 │
│ │
│ /home/hhn/realfusion/main.py:103 in main │
│ │
│ 100 │ │ stable_diffusion_model = StableDiffusionModel.from_pretrained(opt.pretrained_mod │
│ 101 │ │ # import pdb;pdb.set_trace() │
│ 102 │ │ if opt.learned_embeds_path is not None: # add textual inversion tokens to model │
│ ❱ 103 │ │ │ add_tokens_to_model_from_path( │
│ 104 │ │ │ │ opt.learned_embeds_path, stable_diffusion_model.text_encoder, stable_dif │
│ 105 │ │ │ ) │
│ 106 │ │ guidance = StableDiffusion(stable_diffusion_model=stable_diffusion_model, device │
│ │
│ /home/hhn/realfusion/sd/utils.py:40 in add_tokens_to_model_from_path │
│ │
│ 37 │ │ tokenizer: CLIPTokenizer, override_token: Optional[Union[str, dict]] = None) -> │
│ 38 │ r"""Loads tokens from a file and adds them to the tokenizer and text encoder of a mo │
│ 39 │ learned_embeds: Mapping[str, Tensor] = torch.load(learned_embeds_path, map_location= │
│ ❱ 40 │ add_tokens_to_model(learned_embeds, text_encoder, tokenizer, override_token) │
│ 41 │
│ │
│ /home/hhn/realfusion/sd/utils.py:15 in add_tokens_to_model │
│ │
│ 12 │ # Loop over learned embeddings │
│ 13 │ new_tokens = [] │
│ 14 │ for token, embedding in learned_embeds.items(): │
│ ❱ 15 │ │ embedding = embedding.to(text_encoder.get_input_embeddings().weight.dtype) │
│ 16 │ │ if override_token is not None: │
│ 17 │ │ │ token = override_token if isinstance(override_token, str) else override_toke │
│ 18 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'tuple' object has no attribute 'get_input_embeddings'

Steps to Reproduce

As the examples show, the command is:
export TOKEN="cake_2" # set this according to your textual inversion placeholder_token or use the trick below
export DATA_DIR=$PWD/examples/natural-images/cake_2

python main.py --O \
--image_path $DATA_DIR/rgba.png \
--learned_embeds_path $DATA_DIR/learned_embeds.bin \
--text "A high-resolution DSLR image of a $TOKEN" \
--pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"

Expected Behavior

I expected the command to run. Maybe I am missing a key step?

Environment

Ubuntu 18.04, torch 2.0.0, CUDA 12.0

Textual Inversion code giving error

Description

Hi,
@lukemelas, great work. Wanted something like this for a while. Your model's accuracy is better than earlier versions of 2D to 3D models.

I am running all my code on Google Colab (free version). I am following the README, but I encountered the following error at the textual inversion step. I edited a few lines to try to make it run, but in vain. @lukemelas or anyone else, could you kindly help me set up the code?

I am uploading 2 screenshots for reference.

Thank you
Screenshot (32)
Screenshot (33)

Steps to Reproduce

.

Expected Behavior

I expected the code to run as described in the README.

Environment

Google Colab, Python 3.10

Import Error: cannot import name 'narrow_tensor_by_index' from 'torch.distributed._shard._utils'

Description

Thanks for your excellent work. I get an ImportError when I run main.py. Details of the error are as follows:
Traceback (most recent call last):
File "/root/autodl-tmp/realfusion-main/main.py", line 12, in <module>
from nerf.trainer import Trainer
File "/root/autodl-tmp/realfusion-main/nerf/trainer.py", line 26, in <module>
from sd.sd import StableDiffusion
File "/root/autodl-tmp/realfusion-main/sd/__init__.py", line 1, in <module>
from .sd import StableDiffusion
File "/root/autodl-tmp/realfusion-main/sd/sd.py", line 4, in <module>
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/__init__.py", line 3, in <module>
from .configuration_utils import ConfigMixin
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/configuration_utils.py", line 34, in <module>
from .utils import (
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/utils/__init__.py", line 21, in <module>
from .accelerate_utils import apply_forward_hook
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 24, in <module>
import accelerate
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/__init__.py", line 3, in <module>
from .accelerator import Accelerator
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/accelerator.py", line 35, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/utils/__init__.py", line 152, in <module>
from .fsdp_utils import load_fsdp_model, load_fsdp_optimizer, save_fsdp_model, save_fsdp_optimizer
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/utils/fsdp_utils.py", line 25, in <module>
import torch.distributed.checkpoint as dist_cp
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/checkpoint/__init__.py", line 7, in <module>
from .state_dict_loader import load_state_dict
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/checkpoint/state_dict_loader.py", line 10, in <module>
from .default_planner import DefaultLoadPlanner
File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/checkpoint/default_planner.py", line 13, in <module>
from torch.distributed._shard._utils import narrow_tensor_by_index
ImportError: cannot import name 'narrow_tensor_by_index' from 'torch.distributed._shard._utils' (/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/_shard/_utils.py)

I also found that _utils.py does not define a function named narrow_tensor_by_index, but it does define narrow_tensor(tensor: torch.Tensor, metadata: ShardMetadata).

Steps to Reproduce

I ran main.py as follows:
python main.py --O --image_path examples/natural-images/bird_2/rgba.png --learned_embeds_path examples/natural-images/bird_2/learned_embeds.bin --text "A high-resolution DSLR image of a bird" --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"

Expected Behavior

I would like to know whether I have installed the wrong version of PyTorch or whether something else is wrong.

Environment

Ubuntu 18.04, PyTorch 1.12.1, CUDA 11.3
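
An ImportError raised from inside accelerate and torch, as in the traceback above, may point to a mismatch between the installed versions of those packages, although the thread does not confirm the cause. A minimal sketch for collecting the versions to attach to a report like this one follows. It uses only the standard library; the function name `installed_versions` and the package list are illustrative assumptions, not part of the repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list of packages relevant to this traceback.
    report = installed_versions(["torch", "accelerate", "diffusers", "transformers"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into the Environment section makes it easier for others to spot an incompatible combination.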

Saving mesh not working

Description

Hi @lukemelas, thanks for open-sourcing your great work! While reproducing your examples, cat_statue specifically, I noticed that the --save_mesh option does not work as expected at test time. Here's the output:

[INFO] Trainer: df | 2023-04-10_14-25-15 | cuda | fp16 | outputs/default/2023-04-10--13-49-56--seed-101/
[INFO] num parameters: 1_806_983
[INFO] num parameters w/ grad: 1_806_983
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is outputs/default/2023-04-10--13-49-56--seed-101/checkpoints/df.pth
[INFO] loaded model.
[INFO] load at epoch 50, global step 5000
==> Start Test, save results to outputs/default/2023-04-10--13-49-56--seed-101/results
100% 100/100 [00:05<00:00, 18.63it/s]rgb
opacity
depth
/home/tongwang/workspace/realfusion/nerf/trainer.py:590: RuntimeWarning: invalid value encountered in cast
preds_np = (preds_tensor.detach().cpu().numpy() * 255).astype(np.uint8)
normals
textureless
grid
==> Finished Test.
100% 100/100 [00:06<00:00, 15.85it/s]
==> Saving mesh to outputs/default/2023-04-10--13-49-56--seed-101/mesh
==> Finished saving mesh.

Although the log says "==> Saving mesh", no mesh is actually saved. Could you please look into this issue? Thanks in advance.
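
As a side note on the RuntimeWarning in the log above: NumPy typically emits "invalid value encountered in cast" when an array containing NaN or infinite values is cast to an integer type such as np.uint8. Whether this is related to the missing mesh is not established in this thread. A minimal sketch of sanitising predictions before the cast follows; `to_uint8` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def to_uint8(preds):
    """Map float predictions in [0, 1] to uint8, replacing NaN/inf first."""
    # Replace values that have no integer representation.
    cleaned = np.nan_to_num(preds, nan=0.0, posinf=1.0, neginf=0.0)
    # Clamp to the valid range so the scaled values fit in 0..255.
    cleaned = np.clip(cleaned, 0.0, 1.0)
    return (cleaned * 255).astype(np.uint8)


frame = to_uint8(np.array([0.0, np.nan, 1.0, 2.0, -1.0]))
```

Mapping NaN to 0 is an arbitrary choice made for this sketch; a different fill value may suit depth or normal maps better.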

Steps to Reproduce

python main.py --workspace $model_path --O --test --save_mesh

Expected Behavior

A textured mesh is saved.

Environment

Ubuntu 20.04, torch 1.13+cu116

Loss description

Thanks for your great effort.

I have some questions about the loss.

  1. Is the loss image on page 6 the reconstruction loss in the reference view (the input image)?

  2. Is loss(rec, mask) different from loss(mask)?

  3. loss(rec, mask) is the L2 distance between O and M. O is computed from the neural field and is a real number; is it then compared directly with the binary (0/1) mask M?

Thank you.
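
On question 3, an L2 loss between a real-valued opacity O and a binary mask M is well defined, since each pixel simply contributes (O - M)^2. The sketch below illustrates that arithmetic only. It is not the repository's implementation, and the name `mask_l2_loss` and the sample values are assumptions made for the example.

```python
def mask_l2_loss(opacity, mask):
    """Mean squared error between soft opacities in [0, 1] and a 0/1 mask."""
    if len(opacity) != len(mask):
        raise ValueError("opacity and mask must have the same number of pixels")
    # Each pixel is penalised by its squared distance from the mask value.
    return sum((o - m) ** 2 for o, m in zip(opacity, mask)) / len(opacity)


# Three pixels: two nearly correct, one uncertain foreground pixel.
loss = mask_l2_loss([0.9, 0.1, 0.5], [1, 0, 1])
```

The loss is zero only when the rendered opacity matches the mask exactly, so it pushes soft opacities towards 0 or 1.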

janus problem

Thank you for your work!
I found a Janus problem in the result of the "teddy bear" example: the textual inversion appears to be overfitted to the front view, so the correct rear view is not produced.
Does this also happen in the official results?

df_ep0100_rgb.mp4

My command is as follows:

export MODEL_NAME="/home/litaiqing/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/aa9ba505e1973ae5cd05f5aedd345178f52f8e6a"
export DATA_DIR="/media/ssd_1/litaiqing/realfusion-main/examples/natural-images/teddy_bear_1"
export OUTPUT_DIR="/media/ssd_1/litaiqing/realfusion-main/examples/natural-images/teddy_bear_1"

CUDA_VISIBLE_DEVICES=7 python textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="_teddy_bear_" \
  --initializer_token="teddy " \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir=$OUTPUT_DIR \
  --use_augmentations

export DATA_DIR=/media/ssd_1/litaiqing/realfusion-main/examples/natural-images/teddy_bear_1

CUDA_VISIBLE_DEVICES=7 python main.py --O \
    --image_path $DATA_DIR/rgba.png \
    --learned_embeds_path $DATA_DIR/learned_embeds.bin \
    --text "a  _teddy_bear_" \
    --pretrained_model_name_or_path "/home/litaiqing/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/aa9ba505e1973ae5cd05f5aedd345178f52f8e6a"

how to train on custom dataset?

I have a .vtp 3D mesh model and 2D rendered images from different views. Is it possible to train RealFusion on such data?
Thank you :-)

TypeError AnnealedValue = list[float]

Description

Running python main.py --0 fails with a TypeError:

  File "main.py", line 9, in <module>
    from nerf.provider_image import NeRFDataset as ImageOnlyNeRFDataset
  File "/data/ruihan/projects/realfusion/nerf/provider_image.py", line 13, in <module>
    from .options import Options
  File "/data/ruihan/projects/realfusion/nerf/options.py", line 13, in <module>
    AnnealedValue = list[float]
TypeError: 'type' object is not subscriptable

Steps to Reproduce

python main.py --0

Expected Behavior

Run the code.

Environment

Ubuntu 20.04, cudatoolkit 11.3.1, pytorch 1.11.0, transformers 4.28.1, diffusers 0.15.1
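
Subscripting the built-in `list` type, as in `list[float]`, is supported from Python 3.9 onwards; on earlier interpreters it raises exactly this TypeError. The report does not state the Python version, so an older interpreter is an assumption here. A minimal sketch of a backwards-compatible alias using `typing.List` follows; `first_value` is a hypothetical helper added only to show the alias in use.

```python
from typing import List

# Equivalent alias that also works on Python 3.7 and 3.8.
AnnealedValue = List[float]


def first_value(schedule: AnnealedValue) -> float:
    """Hypothetical helper showing the alias used in an annotation."""
    return schedule[0]
```

Upgrading the interpreter to 3.9 or later is the other option and avoids patching the code.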

Code release

Hi authors, thank you very much for your great work! It is very appealing to me. When will you release your code?

Inquiry about evaluation and dataset

I am very interested in your research and have some questions about your paper. Firstly, it seems that the evaluation part is missing in the code. Are you planning to release the related code? Secondly, were the quantitative results reported in the paper measured on images that include the background? Also, could you please provide information on the shading method used in the experiment? Thirdly, could you provide information on which 21 images were used for performance measurement in each of the 7 categories mentioned in the paper? If I have missed that part in the code, it would be very helpful if you could let me know where to refer to.

Thanks in advance

No modules named mcubes

Description

When we run the script python main.py --0, we get the error No module named 'mcubes'. We then tried pip install mcubes, but no matching distribution was found for mcubes. We also searched PyPI but could not find this package. How can we install it?

Steps to Reproduce

Run the script python main.py --0 and observe the error No module named 'mcubes'.

Expected Behavior

The mcubes package can be installed.

Environment

Ubuntu 18.04, CUDA 10.2
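
A package's name on PyPI can differ from the module name used in `import`. The `mcubes` module appears to be distributed on PyPI under the name PyMCubes, which would explain why pip finds no match for `mcubes`; treat that name as an assumption to verify. A minimal sketch for checking which required modules are importable in the current environment follows; `missing_modules` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list; extend it with the modules main.py imports.
    for name in missing_modules(["mcubes", "torch", "numpy"]):
        print(f"missing module: {name}")
```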

Package version

Hi @lukemelas, thanks for releasing your great work!

Could you please release the version of the packages you are using as well (e.g. direct export of your python environment)? I am trying out your code but there are some random issues. For example, stable_diffusion_model.text_encoder now gives a tuple of strings instead of the clip text model (looks like a version issue).
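
The symptom described above, a tuple where a text model is expected, matches the AttributeError reported in the first issue on this page. A defensive check before the object is used makes that failure easier to diagnose. The sketch below assumes only that a usable text encoder exposes `get_input_embeddings()`, as the call site in that traceback expects; `require_text_encoder` is a hypothetical helper, not part of the repository.

```python
def require_text_encoder(candidate):
    """Return candidate if it looks like a text encoder, else raise TypeError."""
    if not callable(getattr(candidate, "get_input_embeddings", None)):
        raise TypeError(
            "expected a text encoder with get_input_embeddings(), got "
            f"{type(candidate).__name__}: {candidate!r}. "
            "This may indicate an incompatible library version."
        )
    return candidate
```

Calling this on `stable_diffusion_model.text_encoder` right after loading would surface the version problem at the point where it originates rather than deep inside the token-loading code.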

lovely_tensors

Could you please help me out here? What is lovely_tensors? I've never heard of it before. What is it for?
