linshan-bin / occnerf
Code of "OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments".
License: Apache License 2.0
Hi,
Thanks for your great work! I ran into some problems when setting up the PyTorch environment.
My setup: Ubuntu 22.04, CUDA 11.3, 4× RTX 4090 (40-series GPUs need at least CUDA 11.3).
My training settings: v1.0-mini dataset, contracted_coord = False, auxiliary_frame = False, render_h = 45, render_w = 80, input_channel = 16, training depth only.
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
This is from your repo, but it installs the CPU build of torch, and torch.cuda.is_available() returns False.
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
and
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
These two are from the official PyTorch website. They install the GPU build, but both fail with the same error.
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
This is from the official PyTorch website. It trains successfully with contracted_coord = True and only a few warnings, but it raises a CUDA error with contracted_coord = False, maybe because the build is cu111:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
and conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
Both are from the official PyTorch website. Training seems to succeed, but there are lots of CUDA warnings. Are these torch and CUDA versions suitable?
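For anyone debugging a similar setup, a quick way to check which build actually got installed and whether it matches the GPU (a generic sanity check, not specific to this repo):

```python
import torch

print(torch.__version__)          # e.g. '1.11.0+cu113'; a '+cpu' suffix means the CPU build
print(torch.version.cuda)         # CUDA version the wheel was compiled against, or None
print(torch.cuda.is_available())  # False for a CPU build or a broken driver setup
if torch.cuda.is_available():
    # RTX 4090 reports compute capability (8, 9); wheels that ship kernels only up to
    # sm_86 fall back to JIT-compiling PTX, which can surface as warnings or errors.
    print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
```

If torch.version.cuda prints a version older than what the card natively supports, the warnings above are likely PTX JIT fallback rather than a broken install.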
How should I set the config to train the model with semantics on 4090s (24 GB each)? A little accuracy can be sacrificed.
Thanks for your help!
Thanks for your great work! I have run the training code for depth estimation and found the following two problems:
Could you please give me some advice about these problems? I ran the nusc-depth training code on 4 GPUs with the same settings as the released code (auxiliary_frame=True and use_fp16=True).
Thank you for your excellent work. I would like to confirm that you do not use auxiliary_frame and the auxiliary-frame depth loss during semantic training.
Hi, thank you for your great work.
I have a question.
Does the model need to be trained on every single scene, or is it generalizable?
How long does it take to train the full model (on 8 A100s)?
Hi, thanks for your great work!
Because of equipment limitations, I can only train with an 8 GB GPU (NVIDIA RTX 4060 laptop) and the v1.0-mini dataset. Metrics are not important; I just want to run it on my own computer.
I changed some hyperparameters and generated train.txt and val.txt for v1.0-mini (the first and last frames of each scene are excluded). It seems to train successfully:
input_channel: 64 -> 4
con_channel: 16 -> 1
encoder: 101 -> 50
render_h: 180 -> 45
render_w: 320 -> 80
Do you have any advice on training with the v1.0-mini dataset?
I generated the train.txt, val.txt, and depth maps for the v1.0-mini dataset with tools/export_gt_depth_nusc.py (a rough sketch of the split generation is below).
Do I need to change the ground-truth occupancy labels in ./data/nuscenes/gts, the 2D semantic labels in ./data/nuscenes/nuscenes_semantic, and the checkpoint in ./ckpts?
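For reference, here is roughly how the mini split files could be built with the nuScenes devkit, excluding each scene's first and last frame as described above. This is my own sketch, not the repo's tools/export_gt_depth_nusc.py; I'm assuming one sample token per line and the devkit's official mini_train/mini_val scene lists, so the repo's actual split format may differ:

```python
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.splits import mini_train, mini_val  # official mini scene names

nusc = NuScenes(version='v1.0-mini', dataroot='./data/nuscenes', verbose=False)

def sample_tokens(scene_rec):
    # Walk the linked list of samples for one scene.
    tokens, tok = [], scene_rec['first_sample_token']
    while tok:
        tokens.append(tok)
        tok = nusc.get('sample', tok)['next']
    return tokens[1:-1]  # exclude the first and last frame of each scene

for split_name, scene_names in [('train', mini_train), ('val', mini_val)]:
    lines = []
    for scene_rec in nusc.scene:
        if scene_rec['name'] in scene_names:
            lines += sample_tokens(scene_rec)
    with open(f'{split_name}.txt', 'w') as f:
        f.write('\n'.join(lines) + '\n')
```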
Thanks for your great work. I'd like to ask about the Grounded-SAM preprocessing code, which I could not find; is it perhaps meant to be a submodule?
ind_norm_sem = ind_norm_sem.flip((-1,))
In my opinion, the xyz coordinates have already been converted to the world coordinate frame, so if we flip xyz, the function will sample the grid along the wrong axes (x along the z-axis, z along the x-axis).
Is there anything wrong with my reasoning?
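For what it's worth, F.grid_sample with 5-D input expects the grid's last dimension in (x, y, z) order, where x indexes the last (W) spatial dim and z indexes the first (D). So if the voxel features are stored as (N, C, X, Y, Z), an (x, y, z) coordinate has to be flipped to (z, y, x) before sampling; the flip is about tensor memory layout, not the world coordinate frame. A minimal toy check (my own example, not the repo's code):

```python
import torch
import torch.nn.functional as F

# Toy volume stored as (N, C, X, Y, Z); grid_sample treats the spatial dims as
# (D, H, W), so here D = X, H = Y, W = Z.
vol = torch.zeros(1, 1, 4, 4, 4)
vol[0, 0, 3, 0, 0] = 1.0  # one occupied cell at (x=3, y=0, z=0)

def norm(i, size=4):
    # Voxel index -> normalized [-1, 1] coordinate (align_corners=True convention).
    return 2.0 * i / (size - 1) - 1.0

# Query point in (x, y, z) order, shape (N, D_out, H_out, W_out, 3) = (1, 1, 1, 1, 3).
pt_xyz = torch.tensor([[[[[norm(3), norm(0), norm(0)]]]]])

# grid[..., 0] indexes W, grid[..., 1] indexes H, grid[..., 2] indexes D, so the
# unflipped (x, y, z) point is misread as (z=3, y=0, x=0) and misses the cell.
wrong = F.grid_sample(vol, pt_xyz, align_corners=True)
right = F.grid_sample(vol, pt_xyz.flip(-1), align_corners=True)  # (z, y, x) order
print(wrong.item(), right.item())  # -> 0.0 1.0
```

So whether the flip is correct depends only on the (X, Y, Z) vs. (Z, Y, X) layout of the voxel feature tensor, not on which frame the points are expressed in.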
Hello author, thank you for your excellent work. I was wondering how to visualize the occupancy predictions?
Hello, could you release the training code for SemanticKITTI?
Hello, thank you for your great work. I was wondering whether this self-supervised method can recover absolute (metric) depth.
Hello author, I am from Wuhan University of Technology. I would like to discuss with you some technical details of your algorithm and its applications in other fields. Could you please provide an email address you commonly use, so that we can communicate? Thank you.
Hello author, I'm not very familiar with NeRF volume rendering. Could you explain why probabilities are accumulated by summation here to render depth? What is the corresponding mathematical formula for this process, and what is its physical meaning? In my opinion, cumulative multiplication would make more sense.
```python
def get_density(self, rays_o, rays_d, Voxel_feat, is_train, inputs):
    dtype = torch.float16 if self.opt.use_fp16 else torch.float32
    device = rays_o.device
    rays_o, rays_d, Voxel_feat = rays_o.to(dtype), rays_d.to(dtype), Voxel_feat.to(dtype)
    reg_loss = {}
    eps_time = time.time()

    with torch.no_grad():
        rays_o_i = rays_o[0, ...].flatten(0, 2)  # H x W x 3
        rays_d_i = rays_d[0, ...].flatten(0, 2)  # H x W x 3
        rays_pts, mask_outbbox, z_vals, rays_pts_depth = self.sample_ray(rays_o_i, rays_d_i, is_train=is_train)
        dists = rays_pts_depth[..., 1:] - rays_pts_depth[..., :-1]  # [num pixels, num points - 1]
        dists = torch.cat([dists, 1e4 * torch.ones_like(dists[..., :1])], dim=-1)  # [num pixels, num points]

    sample_ret = self.grid_sampler(rays_pts, Voxel_feat, avail_mask=~mask_outbbox)
    if self.use_semantic:
        if self.opt.semantic_sample_ratio < 1.0:
            geo_feats, mask, semantic, mask_sem, group_num, group_size = sample_ret
        else:
            geo_feats, mask, semantic = sample_ret
    else:
        geo_feats, mask = sample_ret

    if self.opt.render_type == 'prob':
        weights = torch.zeros_like(rays_pts[..., 0])
        weights[:, -1] = 1
        geo_feats = torch.sigmoid(geo_feats)
        if self.opt.last_free:
            geo_feats = 1.0 - geo_feats  # the last channel is the probability of being free
        weights[mask] = geo_feats

        # accumulate
        weights = weights.cumsum(dim=1).clamp(max=1)
        alphainv_fin = weights[..., -1]
        weights = weights.diff(dim=1, prepend=torch.zeros(rays_pts.shape[:1]).unsqueeze(1).to(device=device, dtype=dtype))
        depth = (weights * z_vals).sum(-1)
        rgb_marched = 0
```
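For comparison (this is my own reading of the snippet above, not an official derivation): standard NeRF composites with a cumulative product of transmittance, while this 'prob' render type takes the increments of a clamped cumulative sum of per-sample occupancy probabilities:

```latex
% Standard NeRF: weight of sample i via cumulative multiplication
w_i = T_i \, \alpha_i, \qquad T_i = \prod_{j<i} \bigl(1 - \alpha_j\bigr)

% The 'prob' renderer above (cumsum -> clamp -> diff in the code):
S_i = \min\!\Bigl( \sum_{j \le i} p_j ,\; 1 \Bigr), \qquad
w_i = S_i - S_{i-1}, \qquad
\hat{d} = \sum_i w_i \, z_i
```

Both schemes give per-ray weights that sum to at most 1; the clamped sum pushes all remaining weight onto the samples where the accumulated occupancy first reaches 1, which reads physically as the ray terminating at the first occupied region rather than attenuating multiplicatively.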
Thanks for this great work. I'm wondering about its zero-shot ability.
Thanks for sharing this nice work!
Sorry, I can't run the code directly in my environment. One question I have is what the input to the photometric loss looks like.
In loss = L1_loss(pred, target), are pred and target 3-channel RGB images?
Do the values range from 0 to 255 (original image), or are they normalized (by mean/std)? I don't see any image normalization in the code.
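For context, self-supervised depth pipelines in this family typically follow the monodepth2-style photometric loss on RGB in [0, 1] (raw pixels divided by 255, with no mean/std normalization). A minimal sketch under that assumption; the 0.85/0.15 SSIM/L1 mix is the common convention, not confirmed from this repo:

```python
import torch
import torch.nn.functional as F

def ssim_dissim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    # Single-scale SSIM dissimilarity over 3x3 windows; x, y: (B, 3, H, W) in [0, 1].
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return ((1 - num / den) / 2).clamp(0, 1)

def photometric_loss(pred, target, alpha=0.85):
    # pred, target: (B, 3, H, W) RGB in [0, 1]; returns a per-pixel (B, 1, H, W) map.
    l1 = (pred - target).abs().mean(1, keepdim=True)
    return alpha * ssim_dissim(pred, target).mean(1, keepdim=True) + (1 - alpha) * l1
```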
mask_mems = (torch.abs(feat_mems) > 0).float()
feat_mem = basic.reduce_masked_mean(feat_mems, mask_mems, dim=1) # B, C, Z, Y, X
feat_mem = feat_mem.permute(0, 1, 4, 3, 2) # [0, ...].unsqueeze(0) # ZYX -> XYZ