jabb0 / fastflow3d
Implementation of the FastFlow3D architecture for scene flow estimation from LiDAR point clouds in PyTorch using PyTorch Lightning.
License: MIT License
Hi,
Thanks for your implementation! I have a question about how the metrics are calculated. In your code, the point-wise metrics are computed at each step, and PyTorch Lightning then automatically averages the per-step values to obtain the epoch metric. As I understand it, the point-wise metrics in the paper are computed over the entire epoch. Could this introduce a bias into the evaluation?
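To illustrate the concern: if each step logs the mean error over that batch's points and the epoch value is an average of those step values (unweighted, or weighted by something other than the number of points), the result can differ from a single point-wise mean over all points of the epoch whenever batches contain different numbers of points. The minimal sketch below uses made-up error values; the function names are illustrative and not taken from the repository.

```python
from typing import Sequence


def mean_of_batch_means(batches: Sequence[Sequence[float]]) -> float:
    """Average the per-batch means, giving every batch equal weight."""
    batch_means = [sum(b) / len(b) for b in batches]
    return sum(batch_means) / len(batch_means)


def pooled_pointwise_mean(batches: Sequence[Sequence[float]]) -> float:
    """Mean over every point of the epoch, regardless of which batch it came from."""
    total = sum(sum(b) for b in batches)
    count = sum(len(b) for b in batches)
    return total / count


def weighted_batch_mean(batches: Sequence[Sequence[float]]) -> float:
    """Per-batch means weighted by their point counts (equals the pooled mean)."""
    weighted = sum((sum(b) / len(b)) * len(b) for b in batches)
    return weighted / sum(len(b) for b in batches)


# Hypothetical per-point errors for two batches of different size.
errors = [[1.0, 1.0, 1.0], [4.0]]
print(mean_of_batch_means(errors))    # 2.5
print(pooled_pointwise_mean(errors))  # 1.75
print(weighted_batch_mean(errors))    # 1.75
```

Weighting each batch mean by its point count recovers the pooled value, so accumulating a running error sum and point count over the epoch is one way to obtain a true point-wise metric.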
Thanks for providing this FastFlow3D implementation. I'm using it with a custom dataset, and some of its samples trigger an index-out-of-bounds error; an example trace is pasted at the end.
I think the error occurs because the upper limit of the grid (x_max, y_max, z_max) is an exclusive boundary, so LiDAR points that fall exactly on that value end up out of bounds. For example, in a 1D grid from x_min=-2 to x_max=2 with a grid_size of 4, the grid cells would be:
index:  0              1              2             3
cell:   [-2.0, -1.0)   [-1.0, 0.0)    [0.0, 1.0)    [1.0, 2.0)
A point at x=2.0 (x=x_max) would fall into the cell with index 4, which is out of bounds.
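The 1D example can be checked numerically. This sketch assumes cell indices are computed by flooring (x - x_min) / cell_width; the helper names are hypothetical, and the repository's actual index computation may differ.

```python
import math
from typing import List, Sequence


def cell_index(x: float, x_min: float, x_max: float, grid_size: int) -> int:
    """Map a coordinate to a cell index, assuming equal-width half-open cells [lo, hi)."""
    cell_width = (x_max - x_min) / grid_size
    return math.floor((x - x_min) / cell_width)


def kept_indices(points: Sequence[float], x_min: float, x_max: float,
                 grid_size: int, inclusive_max: bool) -> List[int]:
    """1D analogue of the bounds mask: filter the points, then compute their cell indices."""
    kept = [x for x in points
            if x_min <= x and (x <= x_max if inclusive_max else x < x_max)]
    return [cell_index(x, x_min, x_max, grid_size) for x in kept]


points = [-2.0, -0.5, 1.999, 2.0]
print(kept_indices(points, -2.0, 2.0, 4, inclusive_max=True))   # [0, 1, 3, 4]: index 4 is out of bounds
print(kept_indices(points, -2.0, 2.0, 4, inclusive_max=False))  # [0, 1, 3]
```

With the inclusive filter, the point at x_max survives and maps to index grid_size; the strict filter drops it.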
The easiest workaround I see is to change remove_out_of_bounds_points in utils/pillars.py to exclude the *_max values, i.e. change <= to < for x_max, y_max, z_max. This seems to fix the error for me. Does this make sense?
diff --git a/utils/pillars.py b/utils/pillars.py
index 5714c8d..88f0125 100644
--- a/utils/pillars.py
+++ b/utils/pillars.py
@@ -4,9 +4,9 @@ import numpy as np
 def remove_out_of_bounds_points(pc, y, x_min, x_max, y_min, y_max, z_min, z_max):
     # Calculate the cell id that this entry falls into
     # Store the X, Y indices of the grid cells for each point cloud point
-    mask = (pc[:, 0] >= x_min) & (pc[:, 0] <= x_max) \
-        & (pc[:, 1] >= y_min) & (pc[:, 1] <= y_max) \
-        & (pc[:, 2] >= z_min) & (pc[:, 2] <= z_max)
+    mask = (pc[:, 0] >= x_min) & (pc[:, 0] < x_max) \
+        & (pc[:, 1] >= y_min) & (pc[:, 1] < y_max) \
+        & (pc[:, 2] >= z_min) & (pc[:, 2] < z_max)
     pc_valid = pc[mask]
     y_valid = None
     if y is not None:
[...]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [118834,0,0], thread: [124,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [118834,0,0], thread: [125,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [118834,0,0], thread: [126,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [118834,0,0], thread: [127,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
results = self._run_stage()
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
return self._run_train()
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
self.fit_loop.run()
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
batch_output = self.batch_loop.run(batch, batch_idx)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
result = self._run_optimization(
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1596, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/optim/adam.py", line 100, in step
loss = closure()
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
closure_result = closure()
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
self._result = self.closure(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
step_output = self._step_fn()
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/ddp.py", line 344, in training_step
return self.model(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 963, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/overrides/base.py", line 82, in forward
output = self.module.training_step(*inputs, **kwargs)
File "/workspace/FastFlow3D/models/BaseModel.py", line 167, in training_step
loss, metrics = self.general_step(batch, batch_idx, phase)
File "/workspace/FastFlow3D/models/BaseModel.py", line 119, in general_step
y_hat = self(x)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/FastFlow3D/models/FastFlow3DModelScatter.py", line 93, in forward
current_pillar_embeddings = self._pillar_feature_net(current_batch_pc_embedding, current_batch_grid)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/FastFlow3D/networks/pillarFeatureNetScatter.py", line 35, in forward
grid.scatter_add_(1, indices, x)
RuntimeError: CUDA error: device-side assert triggered
[W CUDAGuardImpl.h:113] Warning: CUDA warning: device-side assert triggered (function destroyEvent)
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1230 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f5becd167d2 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2319e (0x7f5becf8319e in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x22d (0x7f5becf84d3d in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x2ffc28 (0x7f5c40051c28 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f5beccff005 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #5: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x2e9 (0x7f5c2b9018d9 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::Reducer::~Reducer() + 0x205 (0x7f5c2b8f4015 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #7: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7f5c4052f8d2 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #8: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7f5c3ff3fbc6 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x7e0eef (0x7f5c40532eef in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x1f51e0 (0x7f5c3ff471e0 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x1f638e (0x7f5c3ff4838e in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #12: python() [0x5d0147]
frame #13: python() [0x5a9e9d]
frame #14: python() [0x5d0168]
frame #15: python() [0x5a6152]
frame #16: python() [0x4ef7f8]
<omitting python frames>
frame #22: __libc_start_main + 0xf3 (0x7f5c41db70b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
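The assertion in this trace comes from grid.scatter_add_(1, indices, x) in networks/pillarFeatureNetScatter.py: at least one entry of indices lies outside [0, grid.size(1)). While debugging, a host-side range check before the scatter turns the asynchronous device-side assert into a readable Python error. The sketch below assumes the indices are already flattened pillar ids and uses NumPy for the check; the function name is hypothetical and not part of the repository. Running with the environment variable CUDA_LAUNCH_BLOCKING=1 is another common way to make CUDA report the failing call synchronously.

```python
import numpy as np


def assert_valid_pillar_indices(indices, num_pillars: int) -> None:
    """Hypothetical pre-scatter check: every flattened pillar index must be in [0, num_pillars)."""
    indices = np.asarray(indices)
    bad = (indices < 0) | (indices >= num_pillars)
    if bad.any():
        first = int(np.flatnonzero(bad)[0])  # position in the flattened array
        raise IndexError(
            f"{int(bad.sum())} pillar index(es) out of range [0, {num_pillars}); "
            f"first offending value {int(indices.flat[first])} at flat position {first}"
        )


# Example: a 4 x 4 grid has 16 pillars, so index 16 is one past the end.
assert_valid_pillar_indices([0, 3, 15], num_pillars=16)  # passes silently
try:
    assert_valid_pillar_indices([0, 16], num_pillars=16)
except IndexError as err:
    print(err)
```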
Is the model you trained on Waymo available for download, or do we have to download the dataset, preprocess the data, and train for 3 days to reproduce your model?
Thanks for your excellent work and the detailed tutorial! I noticed that there may be a small bug in README.md: "offset_y" appears twice in Architecture > Scene Encoder > step 4, "Encode each point as 8D (pillarCenter_x, pillarCenter_y, pillarCenter_z, offset_x, offset_y, offset_y, feature_0, feature_1)". Presumably the second one should be offset_z.
Moreover, would you kindly release the trained checkpoint for the network? Sincerely looking forward to your reply!
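For reference, the 8D point encoding with the presumed offset_z in place of the duplicated offset_y can be sketched as follows. The cell layout (square pillars of side cell_size anchored at x_min, y_min) and the single shared pillar-center height z_center are assumptions for illustration, not taken from the repository; the function name is hypothetical.

```python
import numpy as np


def encode_points_8d(points, features, x_min, y_min, cell_size, z_center):
    """Hypothetical sketch of the 8D encoding:
    (pillarCenter_x, pillarCenter_y, pillarCenter_z, offset_x, offset_y, offset_z, feature_0, feature_1).
    Assumes square pillars of side cell_size and one shared pillar-center height z_center."""
    points = np.asarray(points, dtype=np.float64)      # (N, 3): x, y, z
    features = np.asarray(features, dtype=np.float64)  # (N, 2): two per-point features
    # Integer pillar (grid cell) index of each point along x and y.
    ix = np.floor((points[:, 0] - x_min) / cell_size)
    iy = np.floor((points[:, 1] - y_min) / cell_size)
    # Pillar centers: middle of each cell in x/y, fixed height in z.
    centers = np.stack([
        x_min + (ix + 0.5) * cell_size,
        y_min + (iy + 0.5) * cell_size,
        np.full(len(points), z_center),
    ], axis=1)
    offsets = points - centers  # offset_x, offset_y, offset_z
    return np.concatenate([centers, offsets, features], axis=1)  # (N, 8)


pts = [[0.3, -0.7, 1.2], [1.9, 1.9, -0.4]]
feats = [[0.5, 0.1], [0.2, 0.0]]
encoded = encode_points_8d(pts, feats, x_min=-2.0, y_min=-2.0, cell_size=1.0, z_center=0.0)
print(encoded.shape)  # (2, 8)
```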
Thank you again for this outstanding work! I have a question about the Waymo dataset version you used. The download path given in README.md (https://console.cloud.google.com/storage/browser/waymo_open_dataset_scene_flow) differs from every dataset version on the official Waymo Open Dataset website (the latest version is currently 1.4.0). What is the difference between the dataset used in this work and the datasets available for download on the official website? Sincerely looking forward to your reply!
Hello again! I encountered an error when trying to run train.py: it reports that accelerator='ddp' is an invalid accelerator name. The full error message is shown at the end of this issue.
My environment is:
CUDA 11.3
Python 3.10.8
PyTorch 1.12.1
PyTorch Lightning 1.8.3
I've also tried the following environment and still encounter the same problem:
CUDA 11.3
Python 3.8.13
PyTorch 1.10.0
PyTorch Lightning 1.7.7
Could you offer some suggestions? Thanks a lot, and looking forward to your reply!
~/FastFlow3D-main$ python train.py --accelerator='ddp' --batch_size=16 --gpus=4 --num_workers=16 --learning_rate=0.0001 --disable_ddp_unused_check=True
No weights and biases API key set. Using tensorboard instead!
Disabling unused parameter check for DDP
Traceback (most recent call last):
File "/home/fjy/FastFlow3D-main/train.py", line 286, in <module>
cli()
File "/home/fjy/FastFlow3D-main/train.py", line 263, in cli
trainer = pl.Trainer.from_argparse_args(args,
File "/home/fjy/anaconda3/envs/fastflow/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1917, in from_argparse_args
return from_argparse_args(cls, args, **kwargs)
File "/home/fjy/anaconda3/envs/fastflow/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 66, in from_argparse_args
return cls(**trainer_kwargs)
File "/home/fjy/anaconda3/envs/fastflow/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 340, in insert_env_defaults
return fn(self, **kwargs)
File "/home/fjy/anaconda3/envs/fastflow/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 408, in __init__
self._accelerator_connector = AcceleratorConnector(
File "/home/fjy/anaconda3/envs/fastflow/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 192, in __init__
self._check_config_and_set_final_flags(
File "/home/fjy/anaconda3/envs/fastflow/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 291, in _check_config_and_set_final_flags
raise ValueError(
ValueError: You selected an invalid accelerator name: `accelerator='ddp'`. Available names are: cpu, cuda, hpu, ipu, mps, tpu.
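A likely cause: in PyTorch Lightning 1.5, passing a distributed mode such as 'ddp' through accelerator was deprecated in favor of the separate strategy argument, and 1.7 removed it, so accelerator now names only the hardware. Under that reading, the old --accelerator='ddp' --gpus=4 corresponds roughly to accelerator='gpu', strategy='ddp', devices=4. The hypothetical helper below only rewrites a kwargs dict in that way; whether train.py forwards a strategy argument to pl.Trainer.from_argparse_args has not been verified here.

```python
from typing import Any, Dict

# Illustrative subset of strategy names that older Lightning versions accepted via `accelerator=`.
LEGACY_STRATEGY_NAMES = {"dp", "ddp", "ddp_spawn"}


def migrate_trainer_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: move a legacy strategy name out of `accelerator`
    into `strategy`, and `gpus=N` into `devices=N` on a GPU accelerator."""
    new_kwargs = dict(kwargs)
    if new_kwargs.get("accelerator") in LEGACY_STRATEGY_NAMES:
        new_kwargs["strategy"] = new_kwargs["accelerator"]
        new_kwargs["accelerator"] = "gpu"  # hardware type only
    gpus = new_kwargs.pop("gpus", None)
    if gpus is not None:
        new_kwargs["devices"] = gpus
        new_kwargs.setdefault("accelerator", "gpu")
    return new_kwargs


print(migrate_trainer_kwargs({"accelerator": "ddp", "gpus": 4}))
# {'accelerator': 'gpu', 'strategy': 'ddp', 'devices': 4}
# The result could then be passed on, e.g. pl.Trainer(**kwargs), on Lightning 1.7/1.8.
```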
Hello @Jabb0, thanks for your implementation!
I have some questions about performance. After training your model, do the test results reach the accuracy reported in the paper? Could you share your test results?