lukemelas / realfusion
Official code for "RealFusion: 360° Reconstruction of Any Object from a Single Image" (CVPR 2023)
License: Apache License 2.0
When I run the command from the examples:
python3 main.py --O --image_path $DATA_DIR/rgba.png --learned_embeds_path $DATA_DIR/learned_embeds.bin --text "A high-resolution DSLR image of a $TOKEN" --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"
it always fails with the following output:
'lr': 0.001,
'lr_warmup': False,
'max_ray_batch': 4096,
'max_steps': 512,
'min_lr': 1e-06,
'min_near': 0.1,
'negative': '',
'noise_real_camera': 0.001,
'noise_real_camera_annealing': True,
'num_rays': 4096,
'num_steps': 64,
'optim': 'adamw',
'pose_angle': 75,
'pretrained_model_image_size': 512,
'pretrained_model_name_or_path': 'runwayml/stable-diffusion-v1-5',
'radius_range': (1.0, 1.5),
'radius_rot': 1.8,
'real_every': 1,
'real_iters': 0,
'replace_synthetic_camera_every': 10,
'replace_synthetic_camera_noise': 0.02,
'run_name': 'default',
'save_mesh': False,
'save_test_name': 'df_test',
'seed': 101,
'suppress_face': None,
'test': False,
'test_on_real_data': False,
'text': 'A high-resolution DSLR image of a cake_2',
'uniform_sphere_rate': 0.5,
'update_extra_interval': 16,
'upsample_steps': 32,
'wandb': False,
'warm_iters': 2000,
'workspace': 'outputs/default/2023-05-16--12-57-00--seed-101'}
Grid encoder level 0 has resolution 16 and params 4920
Grid encoder level 1 has resolution 22 and params 12168
Grid encoder level 2 has resolution 30 and params 29792
Grid encoder level 3 has resolution 40 and params 65536
Grid encoder level 4 has resolution 55 and params 65536
Grid encoder level 5 has resolution 74 and params 65536
Grid encoder level 6 has resolution 100 and params 65536
Grid encoder level 7 has resolution 135 and params 65536
Grid encoder level 8 has resolution 183 and params 65536
Grid encoder level 9 has resolution 248 and params 65536
Grid encoder level 10 has resolution 336 and params 65536
Grid encoder level 11 has resolution 455 and params 65536
Grid encoder level 12 has resolution 617 and params 65536
Grid encoder level 13 has resolution 836 and params 65536
Grid encoder level 14 has resolution 1134 and params 65536
Grid encoder level 15 has resolution 1536 and params 65536
NeRFNetwork(
(encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 1536 per_level_scale=1.3557 params=(898848, 2) gridtype=tiled align_corners=False interpolation=linear
(sigma_net): MLP(
(net): ModuleList(
(0): Linear(in_features=32, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
(2): Linear(in_features=64, out_features=4, bias=True)
)
)
(encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
(bg_net): MLP(
(net): ModuleList(
(0): Linear(in_features=39, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=3, bias=True)
)
)
)
/home/hhn/.local/lib/python3.8/site-packages/diffusers/configuration_utils.py:135: FutureWarning: Accessing config attribute unet
directly via 'StableDiffusionModel' object attribute is deprecated. Please access 'unet' over 'StableDiffusionModel's config object instead, e.g. 'scheduler.config.unet'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
/home/hhn/.local/lib/python3.8/site-packages/diffusers/configuration_utils.py:135: FutureWarning: Accessing config attribute text_encoder
directly via 'StableDiffusionModel' object attribute is deprecated. Please access 'text_encoder' over 'StableDiffusionModel's config object instead, e.g. 'scheduler.config.text_encoder'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/hhn/realfusion/main.py:164 in │
│ │
│ 161 │
│ 162 │
│ 163 if __name__ == '__main__': │
│ ❱ 164 │ main() │
│ 165 │
│ │
│ /home/hhn/realfusion/main.py:103 in main │
│ │
│ 100 │ │ stable_diffusion_model = StableDiffusionModel.from_pretrained(opt.pretrained_mod │
│ 101 │ │ # import pdb;pdb.set_trace() │
│ 102 │ │ if opt.learned_embeds_path is not None: # add textual inversion tokens to model │
│ ❱ 103 │ │ │ add_tokens_to_model_from_path( │
│ 104 │ │ │ │ opt.learned_embeds_path, stable_diffusion_model.text_encoder, stable_dif │
│ 105 │ │ │ ) │
│ 106 │ │ guidance = StableDiffusion(stable_diffusion_model=stable_diffusion_model, device │
│ │
│ /home/hhn/realfusion/sd/utils.py:40 in add_tokens_to_model_from_path │
│ │
│ 37 │ │ tokenizer: CLIPTokenizer, override_token: Optional[Union[str, dict]] = None) -> │
│ 38 │ r"""Loads tokens from a file and adds them to the tokenizer and text encoder of a mo │
│ 39 │ learned_embeds: Mapping[str, Tensor] = torch.load(learned_embeds_path, map_location= │
│ ❱ 40 │ add_tokens_to_model(learned_embeds, text_encoder, tokenizer, override_token) │
│ 41 │
│ │
│ /home/hhn/realfusion/sd/utils.py:15 in add_tokens_to_model │
│ │
│ 12 │ # Loop over learned embeddings │
│ 13 │ new_tokens = [] │
│ 14 │ for token, embedding in learned_embeds.items(): │
│ ❱ 15 │ │ embedding = embedding.to(text_encoder.get_input_embeddings().weight.dtype) │
│ 16 │ │ if override_token is not None: │
│ 17 │ │ │ token = override_token if isinstance(override_token, str) else override_toke │
│ 18 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'tuple' object has no attribute 'get_input_embeddings'
Following the README example, the command is:
export TOKEN="cake_2" # set this according to your textual inversion placeholder_token or use the trick below
export DATA_DIR=$PWD/examples/natural-images/cake_2
python main.py --O \
  --image_path $DATA_DIR/rgba.png \
  --learned_embeds_path $DATA_DIR/learned_embeds.bin \
  --text "A high-resolution DSLR image of a $TOKEN" \
  --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"
Maybe I missed some key step?
Ubuntu 18.04, torch 2.0.0, CUDA 12.0
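For anyone hitting the same error: this looks like a diffusers version mismatch, where attribute access such as stable_diffusion_model.text_encoder returns a config tuple instead of the CLIP text model (a later report on this page describes the same symptom). A minimal workaround sketch, assuming that is the cause (my code, not the authors'; model_name mirrors the command above):

from transformers import CLIPTextModel, CLIPTokenizer

# Load the tokenizer and text encoder directly from the checkpoint instead of
# reading them back off the pipeline/config object.
model_name = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder")

Alternatively, pinning diffusers to the version the authors developed against should avoid the issue entirely.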
Hi,
@lukemelas, great work. I've wanted something like this for a while. Your model's accuracy is better than earlier 2D-to-3D models.
I am running all my code on Google Colab (free version). I am following the README; however, I encountered the following error at the textual inversion step. I had to edit a few lines to make it run, but to no avail. @lukemelas, or anyone, could you kindly help me out in setting up the code?
I am uploading 2 screenshots for reference.
I expected the given code to run as per the README.
Google Colab, Python 3.10
ModuleNotFoundError: No module named '_gridencoder'
This error appears; how do I resolve it?
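If the repository follows the stable-dreamfusion-style layout it builds on, _gridencoder is a compiled CUDA extension rather than a PyPI package; it typically has to be built before the first run, e.g. with pip install ./gridencoder from the repository root (a suggestion based on similar codebases; check the README's installation/extension-build step).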
AttributeError: 'tuple' object has no attribute 'get_input_embeddings'
Thanks for your excellent work, but I hit an ImportError when I run main.py. Details of the error are as follows:
Traceback (most recent call last):
  File "/root/autodl-tmp/realfusion-main/main.py", line 12, in <module>
    from nerf.trainer import Trainer
  File "/root/autodl-tmp/realfusion-main/nerf/trainer.py", line 26, in <module>
    from sd.sd import StableDiffusion
  File "/root/autodl-tmp/realfusion-main/sd/__init__.py", line 1, in <module>
    from .sd import StableDiffusion
  File "/root/autodl-tmp/realfusion-main/sd/sd.py", line 4, in <module>
    from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/__init__.py", line 3, in <module>
    from .configuration_utils import ConfigMixin
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/configuration_utils.py", line 34, in <module>
    from .utils import (
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/utils/__init__.py", line 21, in <module>
    from .accelerate_utils import apply_forward_hook
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 24, in <module>
    import accelerate
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/utils/__init__.py", line 152, in <module>
    from .fsdp_utils import load_fsdp_model, load_fsdp_optimizer, save_fsdp_model, save_fsdp_optimizer
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/accelerate/utils/fsdp_utils.py", line 25, in <module>
    import torch.distributed.checkpoint as dist_cp
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/checkpoint/__init__.py", line 7, in <module>
    from .state_dict_loader import load_state_dict
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/checkpoint/state_dict_loader.py", line 10, in <module>
    from .default_planner import DefaultLoadPlanner
  File "/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/checkpoint/default_planner.py", line 13, in <module>
    from torch.distributed._shard._utils import narrow_tensor_by_index
ImportError: cannot import name 'narrow_tensor_by_index' from 'torch.distributed._shard._utils' (/root/miniconda3/envs/realfusion/lib/python3.9/site-packages/torch/distributed/_shard/_utils.py)
And I find that _utils.py doesn't have a function named narrow_tensor_by_index, but it does have: def narrow_tensor(tensor: torch.Tensor, metadata: ShardMetadata).
I run main.py as: python main.py --O --image_path examples/natural-images/bird_2/rgba.png --learned_embeds_path examples/natural-images/bird_2/learned_embeds.bin --text "A high-resolution DSLR image of a bird" --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"
I want to know whether I have installed the wrong version of PyTorch or something else.
Ubuntu 18.04, PyTorch 1.12.1, CUDA 11.3
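A likely explanation (an assumption, not confirmed by the authors): narrow_tensor_by_index only exists in newer PyTorch releases, so the installed accelerate is newer than what PyTorch 1.12.1 supports. Upgrading PyTorch (e.g. to 2.0) or downgrading accelerate to a release contemporary with torch 1.12 should make the import chain consistent.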
Hi @lukemelas, thanks for open-sourcing your great work! Upon reproducing your examples, cat_statue specifically, I noticed that the --save_mesh option does not work as expected at test time. Here's the output:
[INFO] Trainer: df | 2023-04-10_14-25-15 | cuda | fp16 | outputs/default/2023-04-10--13-49-56--seed-101/
[INFO] num parameters: 1_806_983
[INFO] num parameters w/ grad: 1_806_983
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is outputs/default/2023-04-10--13-49-56--seed-101/checkpoints/df.pth
[INFO] loaded model.
[INFO] load at epoch 50, global step 5000
==> Start Test, save results to outputs/default/2023-04-10--13-49-56--seed-101/results
100% 100/100 [00:05<00:00, 18.63it/s]rgb
opacity
depth
/home/tongwang/workspace/realfusion/nerf/trainer.py:590: RuntimeWarning: invalid value encountered in cast
preds_np = (preds_tensor.detach().cpu().numpy() * 255).astype(np.uint8)
normals
textureless
grid
==> Finished Test.
100% 100/100 [00:06<00:00, 15.85it/s]
==> Saving mesh to outputs/default/2023-04-10--13-49-56--seed-101/mesh
==> Finished saving mesh.
Although the log says "==> Saving mesh", it did not actually save the mesh. Could you please look into this issue? Thanks in advance.
python main.py --workspace $model_path --O --test --save_mesh
Expected behavior: save a textured mesh.
Ubuntu 20.04, torch 1.13+cu116
How can I get a model to train?
Could you tell me which .py file I should use?
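(From the commands quoted elsewhere on this page: training is launched with python main.py --O --image_path ... --learned_embeds_path ..., after first running textual_inversion.py to produce learned_embeds.bin.)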
I want to replicate your work. Can this method be executed on an RTX 3090?
Thanks for your great effort.
I have some questions on the loss.
Is the loss image on page 6 the reconstruction loss in the reference view (the input image)?
Is loss(rec, mask) different from loss(mask)?
For loss(rec, mask), which is the L2 between O and M: O is computed from the neural field and is real-valued, and it is then compared against the binary {0, 1} mask M?
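To make the question concrete, here is a minimal sketch of what I mean (my own illustration, not code from this repo):

import torch.nn.functional as F

def reconstruction_losses(rgb_pred, rgb_gt, opacity, mask):
    # Image term on the reference view: L2 between rendered and input RGB.
    loss_rgb = F.mse_loss(rgb_pred, rgb_gt)
    # Mask term: L2 between the real-valued rendered opacity O in [0, 1]
    # and the binary {0, 1} mask M.
    loss_mask = F.mse_loss(opacity, mask.float())
    return loss_rgb, loss_mask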
Thank you.
Thank you for your work!
I found a Janus problem in the result of the "teddy bear" example: the textual inversion over-fitted to the front view, so the model fails to produce a correct rear view, which leads to the Janus problem.
Does this phenomenon also appear in the official results?
My command is as follows:
export MODEL_NAME="/home/litaiqing/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/aa9ba505e1973ae5cd05f5aedd345178f52f8e6a"
export DATA_DIR="/media/ssd_1/litaiqing/realfusion-main/examples/natural-images/teddy_bear_1"
export OUTPUT_DIR="/media/ssd_1/litaiqing/realfusion-main/examples/natural-images/teddy_bear_1"
CUDA_VISIBLE_DEVICES=7 python textual_inversion.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="_teddy_bear_" \
--initializer_token="teddy " \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir=$OUTPUT_DIR \
--use_augmentations
export DATA_DIR=/media/ssd_1/litaiqing/realfusion-main/examples/natural-images/teddy_bear_1
CUDA_VISIBLE_DEVICES=7 python main.py --O \
--image_path $DATA_DIR/rgba.png \
--learned_embeds_path $DATA_DIR/learned_embeds.bin \
--text "a _teddy_bear_" \
--pretrained_model_name_or_path "/home/litaiqing/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/aa9ba505e1973ae5cd05f5aedd345178f52f8e6a"
Could you kindly provide the original high-resolution images? Currently only the high-resolution RGBA images are provided; the others are all resized to 512x512.
I have a .vtp 3D mesh model and 2D rendered images from different views. Is it possible to train RealFusion on such data?
Thank you :-)
I don't know whether this code can run on a server without a GPU; I only have a CPU server. Thanks!
Running python main.py --0 raises a TypeError:
File "main.py", line 9, in <module>
from nerf.provider_image import NeRFDataset as ImageOnlyNeRFDataset
File "/data/ruihan/projects/realfusion/nerf/provider_image.py", line 13, in <module>
from .options import Options
File "/data/ruihan/projects/realfusion/nerf/options.py", line 13, in <module>
AnnealedValue = list[float]
TypeError: 'type' object is not subscriptable
### Steps to Reproduce
python main.py --0
### Expected Behavior
Run the code.
### Environment
Ubuntu 20.04, cudatoolkit 11.3.1, pytorch 1.11.0, transformers 4.28.1, diffusers 0.15.1
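A likely cause: built-in generics such as list[float] are only subscriptable on Python 3.9+, so on an older interpreter nerf/options.py fails at import time. Assuming that is the problem, a one-line patch (a sketch, not an official fix):

from typing import List  # add near the top of nerf/options.py

AnnealedValue = List[float]  # instead of: AnnealedValue = list[float]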
Hi authors, thank you very much for your great work! It is pretty appealing for me. When will you release your code?
I am very interested in your research and have some questions about your paper.
1. It seems that the evaluation part is missing from the code. Are you planning to release the related code?
2. Were the quantitative results reported in the paper measured on images that include the background? Also, could you provide information on the shading method used in the experiments?
3. Could you provide information on which 21 images were used for performance measurement in each of the 7 categories mentioned in the paper?
If I have missed that part in the code, it would be very helpful if you could let me know where to refer to.
Thanks in advance
When we run the script python main.py --0, we get the error No module named 'mcubes'. We then tried pip install mcubes to install the package, but there is no matching distribution found for mcubes. We also searched for the package on PyPI but could not find it. How can we install this package?
Run the script python main.py --0 and get the error No module named 'mcubes'.
Expected to be able to install the mcubes package.
Ubuntu 18.04, CUDA 10.2
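For reference: the module imported as mcubes is published on PyPI under the name PyMCubes, so pip install PyMCubes (rather than pip install mcubes) should resolve the error.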
Hi @lukemelas, thanks for releasing your great work!
Could you please also release the versions of the packages you are using (e.g. a direct export of your Python environment)? I am trying out your code, but there are some random issues. For example, stable_diffusion_model.text_encoder now gives a tuple of strings instead of the CLIP text model (looks like a version issue).
Could you please help me out here? What is lovely_tensors? I've never heard of it before; what is it for?
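For reference: lovely_tensors is a small PyPI package (pip install lovely-tensors) that replaces the default tensor repr with a compact summary of shape, dtype, and statistics; presumably the repo uses it only for nicer debug printing. A minimal usage sketch:

import torch
import lovely_tensors as lt

lt.monkey_patch()  # replace torch.Tensor's repr globally
print(torch.randn(3, 256, 256))  # prints a one-line shape/dtype/stats summary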