
lerf's Introduction

LERF: Language Embedded Radiance Fields

This is the official implementation for LERF.

Installation

LERF follows the integration guidelines described here for custom methods within Nerfstudio.

0. Install Nerfstudio dependencies

Follow these instructions up to and including "tinycudann" to install dependencies and create an environment.

1. Clone this repo

git clone https://github.com/kerrj/lerf

2. Install this repo as a python package

Navigate to this folder and run python -m pip install -e .

3. Run ns-install-cli
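
Taken together, the install looks like this (a recap of the steps above, assuming a conda environment with the Nerfstudio dependencies, including tiny-cuda-nn, already active):

    git clone https://github.com/kerrj/lerf
    cd lerf
    python -m pip install -e .
    ns-install-cli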

Checking the install

Run ns-train -h: you should see a list of "subcommands" with lerf, lerf-big, and lerf-lite included among them.

Using LERF

Now that LERF is installed you can play with it!

  • Launch training with ns-train lerf --data <data_folder>. This specifies the data folder to use. For more details, see the Nerfstudio documentation.
  • Connect to the viewer by forwarding the viewer port (we use VSCode to do this), and click the link to viewer.nerf.studio provided in the output of the train script; see the sketch after this list.
  • Within the viewer, you can type text into the textbox, then select the relevancy_0 output type to visualize relevancy maps.
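
A minimal sketch of this workflow on a remote machine, assuming Nerfstudio's default viewer port of 7007 (check the train script's output for the actual port; data/my_scene is a placeholder path):

    # on the remote machine
    ns-train lerf --data data/my_scene

    # on your local machine: forward the viewer port over SSH
    # (VSCode's "Ports" panel does the equivalent automatically)
    ssh -L 7007:localhost:7007 user@remote-host

    # then open the viewer.nerf.studio link printed by ns-train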

Relevancy Map Normalization

By default, the viewer shows raw relevancy scaled with the turbo colormap. As values lower than 0.5 correspond to irrelevant regions, we recommend setting the range parameter to (-1.0, 1.0). To match the visualization from the paper, check the Normalize tick-box, which stretches the values to use the full colormap.
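
Illustratively, the Normalize option amounts to a min-max stretch before the colormap is applied; a sketch in Python (not the viewer's actual code):

    import numpy as np
    from matplotlib import cm

    def colorize_relevancy(relevancy: np.ndarray, normalize: bool) -> np.ndarray:
        """Map an HxW relevancy image to RGB using the turbo colormap."""
        if normalize:
            # stretch values so the full colormap range is used
            lo, hi = relevancy.min(), relevancy.max()
            relevancy = (relevancy - lo) / (hi - lo + 1e-8)
        return cm.turbo(np.clip(relevancy, 0.0, 1.0))[..., :3]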

The images below show the RGB, raw, centered, and normalized output views for the query "Lily".

Resolution

The Nerfstudio viewer dynamically changes resolution to achieve a desired training throughput.

To increase resolution, pause training. Rendering at high resolution (512px or above) can take a second or two, so we recommend rendering at 256px.

lerf-big and lerf-lite

If your GPU is struggling on memory, we provide a lerf-lite implementation that reduces the LERF network capacity and number of samples along rays. If you find you still need to reduce memory footprint, the most impactful parameters for memory are num_lerf_samples, hashgrid levels, and hashgrid size.
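
For example, num_lerf_samples can likely be overridden from Nerfstudio's CLI; a hedged sketch (the flag path mirrors the config field name and may differ between versions):

    ns-train lerf-lite --data data/my_scene \
        --pipeline.model.num_lerf_samples 12

Check ns-train lerf-lite -h for the exact flag names; the hashgrid parameters live in the LERF config as well.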

lerf-big provides a larger model that uses ViT-L/14 instead of ViT-B/16, for GPUs with more memory.

Extending LERF

Be mindful that code for visualization will change as more features are integrated into Nerfstudio, so if you fork this repo and build off of it, check back regularly for extra changes.

Issues

Please open GitHub issues for any installation/usage problems you run into. We've tried to support as broad a range of GPUs as possible with lerf-lite, but it might be necessary to provide even more low-footprint versions. Thank you!

Known TODOs

  • Integrate into ns-render commands to render videos from the command line with custom prompts

Using custom image encoders

We've designed the code to modularly accept any image encoder that implements the interface in BaseImageEncoder (image_encoder.py). An example of different encoder implementations can be seen in clip_encoder.py vs openclip_encoder.py, which implement OpenAI's CLIP and OpenCLIP respectively.
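
As a hedged sketch of what a new encoder might look like (the method names below are assumptions for illustration; consult image_encoder.py for the actual BaseImageEncoder interface):

    import torch

    class MyEncoder:  # in practice, subclass BaseImageEncoder from image_encoder.py
        """Toy encoder sketch; replace the bodies with a real vision-language model."""

        @property
        def embedding_dim(self) -> int:
            # dimensionality of the embeddings this encoder produces
            return 512

        def encode_image(self, images: torch.Tensor) -> torch.Tensor:
            # images: (batch, 3, H, W) -> (batch, embedding_dim), unit-normalized
            feats = torch.randn(images.shape[0], self.embedding_dim)
            return feats / feats.norm(dim=-1, keepdim=True)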

Code structure

(TODO expand this section) The main file to look at for editing and building off LERF is lerf.py, which extends the Nerfacto model from Nerfstudio and adds an additional language field, losses, and visualization. The CLIP and DINO pre-processing is carried out by pyramid_interpolator.py and dino_dataloader.py.

Bibtex

If you find this useful, please cite the paper!

@inproceedings{lerf2023,
 author = {Kerr, Justin and Kim, Chung Min and Goldberg, Ken and Kanazawa, Angjoo and Tancik, Matthew},
 title = {LERF: Language Embedded Radiance Fields},
 booktitle = {International Conference on Computer Vision (ICCV)},
 year = {2023},
} 

lerf's People

Contributors

barak-itkin, chungmin99, jkulhanek, kerrj, maturk, satvik1701, shengjie-lin, tancik, yd-yin


lerf's Issues

Render stuck at low resolution for a long time

My viewport gets stuck at a 30x52 resolution for a long time (around 10 minutes?), even after training is finished, while vanilla Nerfacto can render at 512 resolution in a few seconds. I know that my GTX 1080 Ti on Windows is subpar for the task at hand, but is LERF supposed to be this much slower than Nerfacto?

Rendering images for evaluation

Hi

Thank you for open-sourcing your great work. I was wondering if there currently exists functionality to render images and activation maps for evaluation. If so, could you provide the evaluation code?

Thank you

How to get VSCode working with NerfStudio

What do you mean by "Connect to the viewer by forwarding the viewer port (we use VSCode to do this), and click the link to viewer.nerf.studio provided in the output of the train script"? How do you visualize in VSCode? Is the experience different in VSCode vs. the browser?

load func in the class PyramidEmbeddingDataloader

Hi,

I noticed that the load function does not assign the cache value to self.data_dict in the class PyramidEmbeddingDataloader. Is it a bug? If not, why?

    def load(self):
        # don't create anything, PatchEmbeddingDataloader will create itself
        cache_info_path = self.cache_path.with_suffix(".info")

        # check if cache exists
        if not cache_info_path.exists():
            raise FileNotFoundError

        # if config is different, remove all cached content
        with open(cache_info_path, "r") as f:
            cfg = json.loads(f.read())
        if cfg != self.cfg:
            for f in os.listdir(self.cache_path):
                os.remove(os.path.join(self.cache_path, f))
            raise ValueError("Config mismatch")

        raise FileNotFoundError  # trigger create

Thank you for your time!
Best,
Chris

uniform scales

Hi, thanks for the interesting work! How should the value of scale be set when using _uniform_scales? Or how do I set just one scale to train LERF?

    def __call__(self, img_points, scale=None):     
        if scale is None:
            return self._random_scales(img_points)
        else:
            return self._uniform_scales(img_points, scale)    

    def _uniform_scales(self, img_points, scale):
        # import pdb; pdb.set_trace()
        scale_bin = torch.floor(
            (scale - self.tile_sizes[0]) / (self.tile_sizes[-1] - self.tile_sizes[0]) * (self.tile_sizes.shape[0] - 1)
        ).to(torch.int64)
        scale_weight = (scale - self.tile_sizes[scale_bin]) / (
            self.tile_sizes[scale_bin + 1] - self.tile_sizes[scale_bin]
        )
        interp_lst = torch.stack([interp(img_points) for interp in self.data_dict.values()])
        point_inds = torch.arange(img_points.shape[0])
        interp = torch.lerp(
            interp_lst[scale_bin, point_inds],
            interp_lst[scale_bin + 1, point_inds],
            torch.Tensor([scale_weight]).half().to(self.device)[..., None],
        )
        return interp / interp.norm(dim=-1, keepdim=True), scale

Connecting to the nerf viewer

Thank you for the amazing work. I am a bit new to Nerfstudio. Can you please guide me on how to do the following?
"Connect to the viewer by forwarding the viewer port (we use VSCode to do this), and click the link to viewer.nerf.studio provided in the output of the train script"

Scene rendering isn't working on Nerfstudio

Hi, thanks for the wonderful project.

I have followed LERF's detailed installation instructions on the latest version of Nerfstudio.
Take the "bookstore" dataset for example: training does proceed with multi-view images, but scene rendering is not working. I don't know if the visualization phase has not yet been fully adapted when integrating into Nerfstudio. With the same dataset, Nerfacto does render the full scene.

Correct me if I'm wrong about something.
Again, thanks for the significant work.

No textbox in viewer

Hi Justin,

Thanks for the great work and the code release for LERF!

I tried to follow the instructions in this repo and have successfully trained a lerf-lite model on my custom data. However, even though I built the custom viewer as you suggested and used yarn start to launch it and connect to my local training port, when I open the viewer it looks like a regular Nerfstudio viewer: I cannot find the text box shown in the demo, even after selecting the relevancy_0 field. The following is a screenshot (the resolution is pretty low; I'll use a better machine to render later).


Can you kindly suggest where I might have gone wrong here?

Best,
JD

LERF fails with brandenburg-gate

This works (Nerfacto):

    ns-train nerfacto --data data\phototourism\brandenburg-gate phototourism-data

This fails (LERF):

    ns-train lerf --data data\phototourism\brandenburg-gate phototourism-data

with

    RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 685 but got size 1028 for tensor number 1 in the list.

Minimize relevancy score instead of maximize

Hi, thanks for the excellent work! I have a question regarding your implementation:

    best_id = softmax[..., 0].argmin(dim=1)  # rays x 2

And in your paper Sec.3.5 (Relevancy Score), you stated:

Intuitively, this score represents how much closer the rendered embedding is towards the query embedding compared to the canonical embeddings.

My understanding of your inline equation and code is:
you try to pick the $\phi^i_{canon}$ that is closest to $\phi_{lang}$ relative to $\phi_{quer}$, because minimization over $i$ amounts to maximizing the similarity between $\phi^i_{canon}$ and $\phi_{lang}$.
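
For reference, the relevancy score from Sec. 3.5 has the form (reconstructed here from the paper's description; the notation is approximate):

$$\text{score} = \min_i \frac{\exp(\phi_{lang} \cdot \phi_{quer})}{\exp(\phi_{lang} \cdot \phi^i_{canon}) + \exp(\phi_{lang} \cdot \phi_{quer})}$$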

My question is:
why is this minimization instead of maximization? I think we are looking for the $\phi_{lang}$ that best matches $\phi_{quer}$ instead of $\phi^i_{canon}$, right? Is it because we want the embedding to fit both $\phi_{quer}$ and $\phi^i_{canon}$ at the same time? From my experiments, I do see that results get worse if I change min to max, but could you explain a little bit more, please?

Much thanks!

Ground-truth labels for localization evaluation

Dear authors of LERF, thanks for sharing this phenomenal work!
In section 4.3 of your paper, you "label bounding boxes for 72 objects across 5 scenes". I checked the dataset link, and it seems that the ground-truth labels are not included. I wonder if you could share a labeled version of the dataset so that people can reproduce the numbers in the paper.
Thanks!

ns-install-cli creates _ns-export with an error message

I followed the instructions for installing LERF with Nerfstudio in a conda env: steps 0, 1, 2, and 3 (run ns-install-cli). Everything went well. But when I try to activate the conda env, I get this message:


    Warning:: command not found
    Unable: command not found
    libio_e57.so:: command not found
    -bash: ~/anaconda3/envs/lerf_nerfstudio/lib/python3.8/site-packages/scripts/completions/bash/_ns-export: line 6: syntax error near unexpected token `('
    -bash: ~/anaconda3/envs/lerf_nerfstudio/lib/python3.8/site-packages/scripts/completions/bash/_ns-export: line 6: `Cannot load library ~/anaconda3/envs/lerf_nerfstudio/lib/python3.8/site-packages/pymeshlab/lib/plugins/libio_e57.so: (/lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: ffi_type_pointer, version LIBFFI_BASE_7.0)'


This is due to _ns-export not being generated properly. But I did not see any error message in step 3 (ns-install-cli --mode install).

Any suggestions? Should I just ignore it?
I can just comment out those warning-message lines for now.

How to load the torch models from the internet when the network connection is bad?

Thank you for your great work!

When trying to extract DINO features, the model is loaded from GitHub through this code:

    if 'dino' in model_type:
        model = torch.hub.load('facebookresearch/dino:main', model_type)
    else:  # model from timm -- load weights from timm to dino model (enables working on arbitrary size images).
        temp_model = timm.create_model(model_type, pretrained=True)
        model_type_dict = {
            'vit_small_patch16_224': 'dino_vits16',
            'vit_small_patch8_224': 'dino_vits8',
            'vit_base_patch16_224': 'dino_vitb16',
            'vit_base_patch8_224': 'dino_vitb8'
        }
        model = torch.hub.load('facebookresearch/dino:main', model_type_dict[model_type])
        temp_state_dict = temp_model.state_dict()
        del temp_state_dict['head.weight']
        del temp_state_dict['head.bias']
        model.load_state_dict(temp_state_dict)
    return model

but I cannot find a way to do that without network access. Do you have any alternative methods?
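
Not part of the original thread, but one possible workaround sketch, assuming you can copy files from a machine that does have network access: torch.hub can load a local clone of the dino repo, and it reuses checkpoints that already sit in its cache directory instead of downloading them.

    import torch

    # point torch.hub at a directory you populated by hand; checkpoint files placed in
    # <hub_dir>/checkpoints/ are reused rather than downloaded
    torch.hub.set_dir("/path/to/offline/hub")

    # load hubconf.py from a local clone of facebookresearch/dino instead of GitHub
    model = torch.hub.load("/path/to/local/dino", "dino_vits16", source="local")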

Warning from patch_embedding_dataloader.py results in slow performance

While training and exporting I see these warning messages.

Warning:

    /home/ubuntu/lerf/lerf/data/utils/patch_embedding_dataloader.py:81: UserWarning: torch.searchsorted(): input value tensor is non-contiguous, this will lower the performance due to extra data copy when converting non-contiguous tensor to contiguous, please use contiguous input value tensor if possible. This message will only appear once per program. (Triggered internally at ../aten/src/ATen/native/BucketizationUtils.h:33.)
      x_ind = torch.searchsorted(self.center_x, img_points_x, side="left") - 1

Command used:

    ns-export pointcloud --load-config /home/karun/lerf/outputs/TowerData/lerf/2023-05-09_130121/config.yml --output-dir exports/Tower9BH/ --num-points 1000000 --remove-outliers True --estimate-normals False --use-bounding-box True --bounding-box-min -1 -1 -1 --bounding-box-max 1 1 1

Problem with Nerfstudio training

    outputs\ramen\dino.info
    Traceback (most recent call last):
      File "D:\MoCheng\lerf\lerf\data\utils\feature_dataloader.py", line 55, in try_load
        self.load()
      File "D:\MoCheng\lerf\lerf\data\utils\feature_dataloader.py", line 38, in load
        raise FileNotFoundError
    FileNotFoundError

and the last line is

    urllib.error.HTTPError: HTTP Error 403: rate limit exceeded

ns-install-cli error

Hello!
When I try to run step 3, ns-install-cli, it shows the following error:

    $HOME is not set. Exiting.    install.py:353

I'm not sure how to fix this error; I followed the installation process up to step 3 but am stuck here. Can you please help me with this?
Thanks!

RAM to run LERF -- Problem -- Any thoughts?

Hi all,

I am attempting to train a LERF model on a custom dataset containing approximately 2000 images. When I run the 'ns-train' command, my 32GB of RAM becomes fully utilized, resulting in the termination of the process. Previously, I have successfully trained standard NeRF models using NeRF Studio on the same dataset without any issues. Therefore, I am wondering whether 32GB of RAM is insufficient for training LERF models, or if I might be missing a specific command necessary for LERF training. I have also tried reducing the number of images in my dataset, but this approach did not resolve the issue. The command I am using for training is:

    ns-train lerf --output-dir --data data/colmap/

Thank you so much for your help.

Error when executing ns-install-cli

Hi there. Nice job!
But when I try to install LERF, at step 3 (ns-install-cli) I receive this error:

    subprocess.CalledProcessError: Command '['ns-train', '--tyro-print-completion', 'zsh']' returned non-zero exit status 1.

This does not allow me to continue to the next step, ns-train -h, and if I run it anyway I receive another error:

    AssertionError: pipeline.datamanager.dataparser was provided a default value of type <class 'nerfstudio.data.dataparsers.nerfstudio_dataparser.NerfstudioDataParserConfig'> but no matching subcommand was found. A type may be missing in the Union type declaration for pipeline.datamanager.dataparser, which is currently set to typing.Union[typing_extensions.Annotated[nerfstudio.data.dataparsers.nerfstudio_dataparser.NerfstudioDataParserConfig, _SubcommandConfiguration(name='nerfstudio-data', default=NerfstudioDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.nerfstudio_dataparser.Nerfstudio'>, data=PosixPath('.'), scale_factor=1.0, downscale_factor=None, scene_scale=1.0, orientation_method='up', center_method='poses', auto_scale_poses=True, train_split_fraction=0.9, depth_unit_scale_factor=0.001), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.minimal_dataparser.MinimalDataParserConfig, _SubcommandConfiguration(name='minimal-parser', default=MinimalDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.minimal_dataparser.MinimalDataParser'>, data=PosixPath('/home/nikhil/nerfstudio-main/tests/data/lego_test/minimal_parser')), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.arkitscenes_dataparser.ARKitScenesDataParserConfig, _SubcommandConfiguration(name='arkit-data', default=ARKitScenesDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.arkitscenes_dataparser.ARKitScenes'>, data=PosixPath('data/ARKitScenes/3dod/Validation/41069021'), scale_factor=1.0, scene_scale=1.0, center_method='poses', auto_scale_poses=True, train_split_fraction=0.9, depth_unit_scale_factor=0.001), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.blender_dataparser.BlenderDataParserConfig, _SubcommandConfiguration(name='blender-data', default=BlenderDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.blender_dataparser.Blender'>, data=PosixPath('data/blender/lego'), scale_factor=1.0, alpha_color='white'), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.instant_ngp_dataparser.InstantNGPDataParserConfig, _SubcommandConfiguration(name='instant-ngp-data', default=InstantNGPDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.instant_ngp_dataparser.InstantNGP'>, data=PosixPath('data/ours/posterv2'), scene_scale=0.3333), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.nuscenes_dataparser.NuScenesDataParserConfig, _SubcommandConfiguration(name='nuscenes-data', default=NuScenesDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.nuscenes_dataparser.NuScenes'>, data=PosixPath('scene-0103'), data_dir=PosixPath('/mnt/local/NuScenes'), version='v1.0-mini', cameras=('FRONT',), mask_dir=None, train_split_fraction=0.9, verbose=False), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.dnerf_dataparser.DNeRFDataParserConfig, _SubcommandConfiguration(name='dnerf-data', default=DNeRFDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.dnerf_dataparser.DNeRF'>, data=PosixPath('data/dnerf/lego'), scale_factor=1.0, alpha_color='white'), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.phototourism_dataparser.PhototourismDataParserConfig, _SubcommandConfiguration(name='phototourism-data', default=PhototourismDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.phototourism_dataparser.Phototourism'>, data=PosixPath('data/phototourism/brandenburg-gate'), scale_factor=3.0, alpha_color='white', train_split_fraction=0.9, scene_scale=1.0, orientation_method='up', center_method='poses', auto_scale_poses=True), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.dycheck_dataparser.DycheckDataParserConfig, _SubcommandConfiguration(name='dycheck-data', default=DycheckDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.dycheck_dataparser.Dycheck'>, data=PosixPath('data/iphone/mochi-high-five'), scale_factor=5.0, alpha_color='white', downscale_factor=1, scene_box_bound=1.5), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.scannet_dataparser.ScanNetDataParserConfig, _SubcommandConfiguration(name='scannet-data', default=ScanNetDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.scannet_dataparser.ScanNet'>, data=PosixPath('data/scannet/scene0423_02'), scale_factor=1.0, scene_scale=1.0, center_method='poses', auto_scale_poses=True, train_split_fraction=0.9, depth_unit_scale_factor=0.001), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudioDataParserConfig, _SubcommandConfiguration(name='sdfstudio-data', default=SDFStudioDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudio'>, data=PosixPath('data/DTU/scan65'), include_mono_prior=False, include_foreground_mask=False, downscale_factor=1, scene_scale=2.0, skip_every_for_val_split=1, auto_orient=True), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.nerfosr_dataparser.NeRFOSRDataParserConfig, _SubcommandConfiguration(name='nerfosr-data', default=NeRFOSRDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.nerfosr_dataparser.NeRFOSR'>, data=PosixPath('data/NeRF-OSR/Data'), scene='stjacob', scene_scale=1.0, scale_factor=1.0, use_masks=False, orientation_method='vertical', center_method='focus', auto_scale_poses=True), description='', prefix_name=False)], typing_extensions.Annotated[nerfstudio.data.dataparsers.sitcoms3d_dataparser.Sitcoms3DDataParserConfig, _SubcommandConfiguration(name='sitcoms3d-data', default=Sitcoms3DDataParserConfig(_target=<class 'nerfstudio.data.dataparsers.sitcoms3d_dataparser.Sitcoms3D'>, data=PosixPath('data/sitcoms3d/TBBT-big_living_room'), include_semantics=True, downscale_factor=4, scene_scale=2.0), description='', prefix_name=False)]].

Do you know what happens in this case?

NOTE: Before installing lerf, I could see ns-train -h with a list of "subcommands".

I look forward to any information you can give me to fix the problem.
Thank you

Confusion about position encoding interpolation

Thanks for your great work!
I would like to know why the code below needs to subtract patch_size:

    # compute number of tokens taking stride into account
    w0 = 1 + (w - patch_size) // stride_hw[1]
    h0 = 1 + (h - patch_size) // stride_hw[0]

"multiphrase" output renderer

When I follow the instructions, I only get the relevancy_0 output renderer. However, the controls shown in the demo video are totally different from what I get. Also, I see that the demo video uses a multiphrase output renderer. How am I supposed to get the same thing running on my side?

low data reload speed

It seems that the DINO/CLIP features are not stored locally, so every time I try to load a trained LERF model, it regenerates the DINO/CLIP features. Is it possible to save the features and speed up the data parsing process? Many thanks!

AttributeError: 'LERFDataManager' object has no attribute 'fixed_indices_eval_dataloader'

    (alg) k8s@zjm-sz8tney2-qlxvm:~/nerfstudio$ ns-eval --load-config /opt/image-synthesis/algorithm_manage/model_zoo/test_video/lerf/2023-07-13_112928/config.yml --output-path /opt/image-synthesis/eval_output/test_video/lerf.json
Loading latest checkpoint from load_dir
✅ Done loading checkpoint from
/opt/image-synthesis/algorithm_manage/model_zoo/test_video/lerf/2023-07-13_112928/nerfstudio_models/step-000028000.ckpt
    Traceback (most recent call last):
      File "/home/ubuntu/miniconda3/envs/alg/bin/ns-eval", line 8, in <module>
        sys.exit(entrypoint())
      File "/home/ubuntu/miniconda3/envs/alg/lib/python3.10/site-packages/nerfstudio/scripts/eval.py", line 61, in entrypoint
        tyro.cli(ComputePSNR).main()
      File "/home/ubuntu/miniconda3/envs/alg/lib/python3.10/site-packages/nerfstudio/scripts/eval.py", line 44, in main
        metrics_dict = pipeline.get_average_eval_image_metrics()
      File "/home/ubuntu/miniconda3/envs/alg/lib/python3.10/site-packages/nerfstudio/utils/profiler.py", line 127, in inner
        out = func(*args, **kwargs)
      File "/home/ubuntu/miniconda3/envs/alg/lib/python3.10/site-packages/nerfstudio/pipelines/base_pipeline.py", line 351, in get_average_eval_image_metrics
        num_images = len(self.datamanager.fixed_indices_eval_dataloader)
      File "/home/ubuntu/miniconda3/envs/alg/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
        raise AttributeError("'{}' object has no attribute '{}'".format(
            type(self).__name__, name))
    AttributeError: 'LERFDataManager' object has no attribute 'fixed_indices_eval_dataloader'

Why does this happen?

Bug in patch sampling

Hello, for patch sampling, I think using torch.searchsorted(...) - 1 leads to bugs.

For example, consider

    x_center = tensor([  0.,  87., 174., 261., 348., 435., 522., 609.])

and some sample points

    x = tensor([ 87, 609, 348, 522, 174, 261,   0, 348, 609, 609,  87, 435, 174,   0, 522, 435, 522, 522])

Using torch.searchsorted(...) - 1 leads to

    tensor([ 0,  6,  3,  5,  1,  2, -1,  3,  6,  6,  0,  4,  1, -1,  5,  4,  5,  5])

Instead, you can just do

    x_ind = torch.floor((x - (self.center_x[0])) / self.stride).long()

which produces

    tensor([1, 7, 4, 6, 2, 3, 0, 4, 7, 7, 1, 5, 2, 0, 6, 5, 6, 6])

What is the purpose of this code in _embed_clip_tiles method?

What is the use or purpose of the following lines, which apparently copy the last column and row of the embedding matrix and append them?

    clip_embeds = torch.concat((clip_embeds, clip_embeds[:, [-1], :]), dim=1)
    clip_embeds = torch.concat((clip_embeds, clip_embeds[[-1], :, :]), dim=0)

Link to the code: https://github.com/kerrj/lerf/blob/4a13f2074f3f3a162d9b3b9e6c4bbcc21499a7f2/lerf/data/utils/patch_embedding_dataloader.py#L114C9-L115C82

dino inference

Hi,

When reviewing your code for the inference step, I found that when you generate the raw_relevancy output, you do not include the output from the DINO network. So we are wondering whether the current results of the provided code correspond to "LERF w/o DINO" or "LERF w/ DINO" (Fig. 5 in your paper). Can you help us with this?

Best,
Any

Sparse point cloud for the dataset

Hi, thanks for releasing the dataset. In our work, we would like to acquire the COLMAP results, including the sparse points and poses, used when generating the given trajectory. Could you please release them as well? Thanks so much!

Missing scene "Table"

The paper uses a scene called "Table" for evaluation, but I don't see that scene in the dataset or videos linked from the project website. Is that intentional?

'ModuleNotFoundError: No module named 'nerfstudio.viewer.viewer_elements''

Hi,
Thank you for sharing the code.
I didn't have this problem before when using Nerfstudio, but after running step 2 ("Install this repo as a python package: navigate to this folder and run python -m pip install -e ."), I am getting this error: ModuleNotFoundError: No module named 'nerfstudio.viewer.viewer_elements', although I'm using the same Docker image as before.
Do you have any clue about this problem?
Thanks!

How to extract relevancy map in the code?

Hi, I was wondering if there is a way to extract the relevancy maps as a segmentation mask for a given viewing angle, not in the Nerfstudio GUI but in code, so that the mask can be used for further processing?

Any insights would be greatly appreciated

LERFPipeline.__init__() got an unexpected keyword argument 'grad_scaler'

Hi all,

I ran into a TypeError while trying to run ns-train lerf --data <data_folder>:

File "C:\Users\User\miniconda3\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\User\miniconda3\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Users\User\miniconda3\Scripts\ns-train.exe_main
.py", line 7, in
File "C:\Users\User\nerfstudio\nerfstudio\scripts\train.py", line 261, in entrypoint
main(
File "C:\Users\User\nerfstudio\nerfstudio\scripts\train.py", line 246, in main
launch(
File "C:\Users\User\nerfstudio\nerfstudio\scripts\train.py", line 189, in launch
main_func(local_rank=0, world_size=world_size, config=config)
File "C:\Users\User\nerfstudio\nerfstudio\scripts\train.py", line 99, in train_loop
trainer.setup()
File "C:\Users\User\nerfstudio\nerfstudio\engine\trainer.py", line 148, in setup
self.pipeline = self.config.pipeline.setup(
File "C:\Users\User\nerfstudio\nerfstudio\configs\base_config.py", line 57, in setup
return self._target(self, **kwargs)
TypeError: LERFPipeline.init() got an unexpected keyword argument 'grad_scaler'

Any idea what might be causing this? Thanks in advance!

Language loss

In the paper it is mentioned that you are maximizing the cosine similarity between the CLIP vector and the predicted vector, but in the code you seem to be using a Huber loss. Could you please clarify this, if possible?

Typo: definition of scale s(t)

Hello, @kerrj!

Thank you for sharing this excellent work!

While reading the paper, I noticed a typo.
In Section 3.1, the scale s(t) is defined as follows:
$s(t) = s_{img} \times f_{xy} / t$.

However, the correct definition of s(t) should be:
$s(t) = s_{img} \times t / f_{xy}$.

I kindly request a modification to reflect the accurate information. Thank you!

NaN in training

Hi, thanks for your excellent work! The project is really cool!

However, I've encountered NaN during training, and the scene in the viewer becomes completely black.

I followed the instructions to use the lerf-merge branch of Nerfstudio, and used the demo poster dataset. I got reasonable results with Nerfacto, but got NaN with LERF.

Do you have any idea what the problem is? Thanks a lot.


Running LeRF with Blender synthetic data

Hi,

I attempted to train a LERF model using Blender synthetic data, such as the Lego scene. However, it overfit to the training views and failed to converge on novel views. I then found a related issue, which says that Nerfacto doesn't work with Blender data in the default settings and gives a solution. I followed the arguments it used to train a Nerfacto model, and it converged. I then tried the same arguments to train a LERF model, and it rendered pure white.

To address this issue, I made some modifications to the LERF code, detaching the CLIP and DINO features. That also worked.

I guess the problem may be related to the feature training: it's possible that there needs to be some background density for the model to learn the features, while the RGB training may discourage background density.

Have you tried this training setting before, and is it possible to resolve the issue by adjusting certain training options?

Thank you very much!

Ground Truth Box

Hi!
Thank you for your interesting open-source work! Could you please provide the corresponding evaluation code and the ground-truth boxes for the evaluation in Section 4.3, Localization ("To evaluate how well LERF can localize text prompts in a scene we render novel views and label bounding boxes for 72 objects across 5 scenes")? Thank you again for your work!

Debugging a Nerfstudio project

@kerrj This is a great project and an awesome example of how to build a new method using Nerfstudio; thanks to you and the other authors. However, for me as a newbie it is unclear how to debug such a project. Maybe, as a LERF developer, you can share some instructions on how you did it. I use VSCode.

Video Rendering

Hi, thank you for the amazing work. Any update on "Integrate into ns-render commands to render videos from the command line with custom prompts", @kerrj?

Obtain volume-rendered semantic field at test time?

Hello,

Is there a way to obtain the per-image volume-rendered 2D feature fields at test time? It seems that outputs["clip"] = self.renderer_clip(...) outputs the per-ray features, which don't necessarily add up to a whole image, right? Is there any part of the LERF codebase where I have access to the whole HxWxEmb_size tensor for each frame, or do I need to dig into the Nerfstudio code for this?

Thanks,

Benet
