linshan-bin / occnerf
Code of "OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments".
License: Apache License 2.0
Hi,
Thanks for your great work! I ran into some problems when setting up the PyTorch environment.
My setup: Ubuntu 22.04, CUDA 11.3, 4× RTX 4090 (40-series GPUs need at least CUDA 11.3).
My training settings: v1.0-mini dataset, contracted_coord = False, auxiliary_frame = False, render_h = 45, render_w = 80, input_channel = 16, training depth only.
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
This is from your repo, but it installs the CPU build of torch, and torch.cuda.is_available() returns False.
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
and
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
These two are from the official PyTorch website. They install the GPU build, but both fail with the same error.
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
This is from the official PyTorch website. It trains successfully with contracted_coord = True and only a few warnings, but it raises a CUDA error with contracted_coord = False, maybe because the build is cu111:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
and conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
Both are from the official PyTorch website. Training seems to succeed, but there are lots of CUDA warnings. Are these torch and CUDA versions suitable?
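For anyone debugging a similar setup, a quick way to check which build actually got installed and whether it matches the GPU (a generic sanity check, not specific to this repo):

```python
import torch

print(torch.__version__)          # e.g. '1.11.0+cu113'; a '+cpu' suffix means the CPU build
print(torch.version.cuda)         # CUDA version the wheel was compiled against, or None
print(torch.cuda.is_available())  # False for a CPU build or a broken driver setup
if torch.cuda.is_available():
    # RTX 4090 reports compute capability (8, 9); wheels that ship kernels only up to
    # sm_86 fall back to JIT-compiling PTX, which can surface as warnings or errors.
    print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
```

If torch.version.cuda prints a version older than what the card natively supports, the warnings above are likely PTX JIT fallback rather than a broken install.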
How should I set the config to train the model with semantics on 4090s (24 GB each)? A little accuracy can be sacrificed.
Thanks for your help!
Thanks for your great work! I have run the training code for depth estimation and found the following two problems:
Could you please give me some advice about these problems? I ran the nusc-depth training code on 4 GPUs with the same settings as the released code (auxiliary_frame=True and use_fp16=True).
Thank you for your excellent work. I would like to confirm that you do not use auxiliary_frame and the auxiliary-frame depth loss during semantic training.
Hi, thank you for your great work.
I have a question.
Does the model need to be trained on every single scene, or is it generalizable?
How long does it take to train the full model (on 8 A100s)?
Hi, thanks for your great work!
Because of equipment limitations, I can only train with an 8 GB GPU (NVIDIA RTX 4060 laptop) and the v1.0-mini dataset. Metrics are not important; I just want to run it on my own computer.
I changed some hyperparameters and generated train.txt and val.txt for v1.0-mini (the first and last frames of each scene are excluded). It seems to train successfully:
input_channel: 64 -> 4
con_channel: 16 -> 1
encoder: 101 -> 50
render_h: 180 -> 45
render_w: 320 -> 80
Do you have any advice on training with the v1.0-mini dataset?
I generated the train.txt, val.txt, and depth maps for the v1.0-mini dataset with tools/export_gt_depth_nusc.py (a rough sketch of the split generation is below).
Do I need to change the ground-truth occupancy labels in ./data/nuscenes/gts, the 2D semantic labels in ./data/nuscenes/nuscenes_semantic, and the checkpoint in ./ckpts?
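For reference, here is roughly how the mini split files could be built with the nuScenes devkit, excluding each scene's first and last frame as described above. This is my own sketch, not the repo's tools/export_gt_depth_nusc.py; I'm assuming one sample token per line and the devkit's official mini_train/mini_val scene lists, so the repo's actual split format may differ:

```python
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.splits import mini_train, mini_val  # official mini scene names

nusc = NuScenes(version='v1.0-mini', dataroot='./data/nuscenes', verbose=False)

def sample_tokens(scene_rec):
    # Walk the linked list of samples for one scene.
    tokens, tok = [], scene_rec['first_sample_token']
    while tok:
        tokens.append(tok)
        tok = nusc.get('sample', tok)['next']
    return tokens[1:-1]  # exclude the first and last frame of each scene

for split_name, scene_names in [('train', mini_train), ('val', mini_val)]:
    lines = []
    for scene_rec in nusc.scene:
        if scene_rec['name'] in scene_names:
            lines += sample_tokens(scene_rec)
    with open(f'{split_name}.txt', 'w') as f:
        f.write('\n'.join(lines) + '\n')
```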
Thanks for your great work. I'd like to ask about the Grounded-SAM preprocessing code, which I could not find; is it perhaps meant to be a submodule?
ind_norm_sem = ind_norm_sem.flip((-1,))
In my opinion, the xyz coordinates have already been converted to the world coordinate frame, so if we flip xyz, the function will sample the grid along the wrong axes (x along the z-axis, z along the x-axis).
Is there anything wrong with my reasoning?
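For what it's worth, F.grid_sample with 5-D input expects the grid's last dimension in (x, y, z) order, where x indexes the last (W) spatial dim and z indexes the first (D). So if the voxel features are stored as (N, C, X, Y, Z), an (x, y, z) coordinate has to be flipped to (z, y, x) before sampling; the flip is about tensor memory layout, not the world coordinate frame. A minimal toy check (my own example, not the repo's code):

```python
import torch
import torch.nn.functional as F

# Toy volume stored as (N, C, X, Y, Z); grid_sample treats the spatial dims as
# (D, H, W), so here D = X, H = Y, W = Z.
vol = torch.zeros(1, 1, 4, 4, 4)
vol[0, 0, 3, 0, 0] = 1.0  # one occupied cell at (x=3, y=0, z=0)

def norm(i, size=4):
    # Voxel index -> normalized [-1, 1] coordinate (align_corners=True convention).
    return 2.0 * i / (size - 1) - 1.0

# Query point in (x, y, z) order, shape (N, D_out, H_out, W_out, 3) = (1, 1, 1, 1, 3).
pt_xyz = torch.tensor([[[[[norm(3), norm(0), norm(0)]]]]])

# grid[..., 0] indexes W, grid[..., 1] indexes H, grid[..., 2] indexes D, so the
# unflipped (x, y, z) point is misread as (z=3, y=0, x=0) and misses the cell.
wrong = F.grid_sample(vol, pt_xyz, align_corners=True)
right = F.grid_sample(vol, pt_xyz.flip(-1), align_corners=True)  # (z, y, x) order
print(wrong.item(), right.item())  # -> 0.0 1.0
```

So whether the flip is correct depends only on the (X, Y, Z) vs. (Z, Y, X) layout of the voxel feature tensor, not on which frame the points are expressed in.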
Hello author, thank you for your excellent work. I was wondering how to visualize the occupancy predictions?
Hello, could you release the training code for SemanticKITTI?
Hello, thank you for your great work. I was wondering whether this self-supervised method can recover absolute (metric) depth.
Hello author, I am from Wuhan University of Technology. I would like to discuss with you some technical details of your algorithm and its applications in other fields. Could you please provide an email address you commonly use, so that we can communicate? Thank you.
Hello author, I'm not very familiar with NeRF volume rendering. Could you explain why probabilities are accumulated by summation here to render depth? What is the corresponding mathematical formula for this process, and what is its physical meaning? In my opinion, cumulative multiplication would make more sense.
```python
def get_density(self, rays_o, rays_d, Voxel_feat, is_train, inputs):
    dtype = torch.float16 if self.opt.use_fp16 else torch.float32
    device = rays_o.device
    rays_o, rays_d, Voxel_feat = rays_o.to(dtype), rays_d.to(dtype), Voxel_feat.to(dtype)
    reg_loss = {}
    eps_time = time.time()

    with torch.no_grad():
        rays_o_i = rays_o[0, ...].flatten(0, 2)  # H x W x 3
        rays_d_i = rays_d[0, ...].flatten(0, 2)  # H x W x 3
        rays_pts, mask_outbbox, z_vals, rays_pts_depth = self.sample_ray(rays_o_i, rays_d_i, is_train=is_train)
        dists = rays_pts_depth[..., 1:] - rays_pts_depth[..., :-1]  # [num pixels, num points - 1]
        dists = torch.cat([dists, 1e4 * torch.ones_like(dists[..., :1])], dim=-1)  # [num pixels, num points]

    sample_ret = self.grid_sampler(rays_pts, Voxel_feat, avail_mask=~mask_outbbox)
    if self.use_semantic:
        if self.opt.semantic_sample_ratio < 1.0:
            geo_feats, mask, semantic, mask_sem, group_num, group_size = sample_ret
        else:
            geo_feats, mask, semantic = sample_ret
    else:
        geo_feats, mask = sample_ret

    if self.opt.render_type == 'prob':
        weights = torch.zeros_like(rays_pts[..., 0])
        weights[:, -1] = 1
        geo_feats = torch.sigmoid(geo_feats)
        if self.opt.last_free:
            geo_feats = 1.0 - geo_feats  # the last channel is the probability of being free
        weights[mask] = geo_feats

        # accumulate
        weights = weights.cumsum(dim=1).clamp(max=1)
        alphainv_fin = weights[..., -1]
        weights = weights.diff(dim=1, prepend=torch.zeros(rays_pts.shape[:1]).unsqueeze(1).to(device=device, dtype=dtype))
        depth = (weights * z_vals).sum(-1)
        rgb_marched = 0
```
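For comparison (this is my own reading of the snippet above, not an official derivation): standard NeRF composites with a cumulative product of transmittance, while this 'prob' render type takes the increments of a clamped cumulative sum of per-sample occupancy probabilities:

```latex
% Standard NeRF: weight of sample i via cumulative multiplication
w_i = T_i \, \alpha_i, \qquad T_i = \prod_{j<i} \bigl(1 - \alpha_j\bigr)

% The 'prob' renderer above (cumsum -> clamp -> diff in the code):
S_i = \min\!\Bigl( \sum_{j \le i} p_j ,\; 1 \Bigr), \qquad
w_i = S_i - S_{i-1}, \qquad
\hat{d} = \sum_i w_i \, z_i
```

Both schemes give per-ray weights that sum to at most 1; the clamped sum pushes all remaining weight onto the samples where the accumulated occupancy first reaches 1, which reads physically as the ray terminating at the first occupied region rather than attenuating multiplicatively.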
Thanks for this great work. I'm wondering about its zero-shot ability.
Thanks for sharing this nice work!
Sorry, I can't run the code directly in my environment. One question I have is what the input to the photometric loss looks like.
In loss = L1_loss(pred, target), are pred and target 3-channel RGB images?
Do the values range from 0 to 255 (original image), or are they normalized (by mean/std)? I don't see any image normalization in the code.
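For context, self-supervised depth pipelines in this family typically follow the monodepth2-style photometric loss on RGB in [0, 1] (raw pixels divided by 255, with no mean/std normalization). A minimal sketch under that assumption; the 0.85/0.15 SSIM/L1 mix is the common convention, not confirmed from this repo:

```python
import torch
import torch.nn.functional as F

def ssim_dissim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    # Single-scale SSIM dissimilarity over 3x3 windows; x, y: (B, 3, H, W) in [0, 1].
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return ((1 - num / den) / 2).clamp(0, 1)

def photometric_loss(pred, target, alpha=0.85):
    # pred, target: (B, 3, H, W) RGB in [0, 1]; returns a per-pixel (B, 1, H, W) map.
    l1 = (pred - target).abs().mean(1, keepdim=True)
    return alpha * ssim_dissim(pred, target).mean(1, keepdim=True) + (1 - alpha) * l1
```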
mask_mems = (torch.abs(feat_mems) > 0).float()
feat_mem = basic.reduce_masked_mean(feat_mems, mask_mems, dim=1) # B, C, Z, Y, X
feat_mem = feat_mem.permute(0, 1, 4, 3, 2) # [0, ...].unsqueeze(0) # ZYX -> XYZ