tsunghan-wu / redal

🍀 Official pytorch implementation of "ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation. Wu et al. ICCV 2021."

Python 74.79% CMake 0.13% C++ 24.99% Shell 0.09%

3d-vision active-learning computer-vision point-cloud pytorch-implementation semantic-segmentation

redal's Issues

Semantic KITTI training configuration

Hi authors, thanks for the interesting work. I am having trouble running the ReDAL benchmark on SemanticKITTI. I processed the data following the documented steps without errors, but training fails with the following error:

/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.

I ran training with:

CUDA_VISIBLE_DEVICES=0 python3 train_region_active.py -n semantic_kitti -d <dataset location> --active_method ReDAL --ReDAL_config_path active_selection/ReDAL_SemanticKITTI.config --num_classes 19

I also tried num_classes 20 and got the same error.

With some debugging, it appears that the maximum of targets = batch['targets'].F.long().cuda(non_blocking=True) is 255 at line 86 of base_agent.py, which exceeds the 19 classes of SemanticKITTI. I see the same label range [0, 255] after adding print statements in region_dataset.py. I am not sure whether this is the intended label range, but labels outside [0, num_classes) cannot be fed directly into torch.nn.CrossEntropyLoss(ignore_index=self.args.ignore_idx) at line 44 of base_agent.py unless they equal ignore_idx.
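One way to check the diagnosis above is to scan the label values before they reach the loss. The sketch below is plain Python and is not taken from the ReDAL code: `find_invalid_labels` is a hypothetical helper, the sample labels are illustrative, and treating 255 as the ignore value is an assumption based on the label range reported in this issue.

```python
from collections import Counter


def find_invalid_labels(labels, num_classes, ignore_index):
    """Count label values that are neither a valid class id nor the ignore value.

    Hypothetical helper for illustration; not part of the ReDAL code base.
    """
    invalid = Counter()
    for label in labels:
        if label == ignore_index:
            continue  # ignored points do not contribute to the loss
        if not 0 <= label < num_classes:
            invalid[label] += 1
    return dict(invalid)


# Illustrative labels: class ids in [0, 18] plus the value 255 seen in the issue.
sample = [0, 3, 18, 255, 255, 7]

# If the ignore value passed to the loss is 255, no label is out of range.
print(find_invalid_labels(sample, num_classes=19, ignore_index=255))

# If the ignore value is something else (0 here, as an assumed example),
# every 255 would trip the `t >= 0 && t < n_classes` assertion.
print(find_invalid_labels(sample, num_classes=19, ignore_index=0))
```

If the second case matches what training sees, the value stored in self.args.ignore_idx would be a reasonable thing to inspect next.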

The miou result in S3DIS

Hi, I ran ReDAL on S3DIS, but no matter how many times I try, I cannot reproduce the results in the paper. My best run reaches only 38.2 mIoU at 5% labels, 43.2 at 7%, 45.5 at 9%, and 48.9 at 11%, much lower than the paper reports. The results also seem unstable: they fluctuate by about 2% to 3% between runs.

I ran on two RTX 3090 GPUs (24 GB each) with the same experiment settings as the paper (the default settings in the code). Others in issue #5 said they could reproduce the results on two 24 GB GPUs (V5000), so I do not doubt the authenticity of the reported results.

My guess is that the superpoint division is not stable, which leads to different experimental results, but I have not been able to obtain superpoints that achieve the performance reported in the paper. Do you have any other suggestions for reproducing the model performance?

VCCS Efficiency for Large-scale datasets; nuScenes dependency

About config file

Hi, thanks for open-sourcing this brilliant work.

But it seems the ReDAL config file is missing. Could you provide it?

Many thanks~

Question about the S3DIS Validation Set

Thank you for your great work!
In your code, I noticed that you divide the S3DIS dataset into a train set and a val set (Area 5, as other projects do). But when a new active learning iteration begins, you initialize the parameters from the checkpoint with the best mIoU on the val set. I wonder whether this causes information leakage and makes the model overfit to the val set.
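One common way to address this kind of concern is to select checkpoints on a small subset held out from the training areas, so that Area 5 is only used for the final evaluation. The sketch below illustrates that idea under stated assumptions. It is not how the ReDAL code works: `split_train_holdout` is a hypothetical helper and the scene names are placeholders.

```python
import random


def split_train_holdout(scene_ids, holdout_fraction=0.1, seed=0):
    """Split training scenes into a fitting set and a small held-out set.

    Hypothetical helper: the held-out scenes would be used only for
    checkpoint selection, leaving Area 5 untouched until final evaluation.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be in (0, 1)")
    ids = sorted(scene_ids)    # sort first so the split is reproducible
    rng = random.Random(seed)  # local RNG, does not touch global random state
    rng.shuffle(ids)
    n_holdout = max(1, int(len(ids) * holdout_fraction))
    return ids[n_holdout:], ids[:n_holdout]


# Scene names are placeholders, not the real S3DIS file names.
scenes = [f"Area_{a}_room_{r}" for a in (1, 2, 3, 4, 6) for r in range(4)]
fit_ids, holdout_ids = split_train_holdout(scenes, holdout_fraction=0.2)
print(len(fit_ids), len(holdout_ids))
```

The trade-off is that the held-out scenes are no longer available for training or for active selection, which may matter at small label budgets.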

Question about memory

Hello, first, thank you for your excellent work. I would like to ask a few questions.

When I run your program on S3DIS, memory usage increases sharply after each active learning iteration, eventually causing a CUDA out-of-memory error. Do you see the same behavior when training? I train the network on four 1080 Ti GPUs with the batch size set to 4.
Even when the second iteration runs successfully, a single epoch is much slower. Is this normal?

Distributed training lowers performance

Hello, first of all thanks for sharing your codebase!
We've been testing it for a while and it's working well for us.
Unfortunately, we've noticed that turning on distributed training degrades performance significantly on our setup.
Running fully supervised on the S3DIS dataset with spvcnn as the model, we get ~62% validation mIoU.
With the same hyperparameters and distributed_training on 4 GPUs, training is much faster, but we only get ~50%.
After tweaking some hyperparameters and increasing the number of training epochs, the best we got was ~56% (with batch size 2 and lr 0.005).

Did you use distributed training yourselves, and did you notice similar performance drops?
Or are there other parameters that need to be adjusted when using distributed training?

Thanks in advance!
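One thing that often changes when distributed training is switched on is the effective batch size, and a common heuristic is to scale the learning rate linearly with it. The sketch below shows that arithmetic. It assumes the batch size is applied per GPU, which is typical for data-parallel training but is not confirmed for this code base. `scale_learning_rate` is a hypothetical helper, and the numbers in the example are placeholders, not the repository's defaults.

```python
def scale_learning_rate(base_lr, base_batch_size, per_gpu_batch_size, num_gpus):
    """Linear scaling heuristic: keep lr / effective_batch_size constant.

    Returns (scaled_lr, effective_batch_size).
    """
    if min(base_batch_size, per_gpu_batch_size, num_gpus) <= 0:
        raise ValueError("batch sizes and GPU count must be positive")
    effective_batch_size = per_gpu_batch_size * num_gpus
    scaled_lr = base_lr * effective_batch_size / base_batch_size
    return scaled_lr, effective_batch_size


# Placeholder values: a single-GPU baseline with batch size 4 and lr 0.001,
# moved to 4 GPUs with the same per-GPU batch size.
lr, effective = scale_learning_rate(
    base_lr=0.001, base_batch_size=4, per_gpu_batch_size=4, num_gpus=4
)
print(effective, lr)
```

This is only a starting point. Batch normalization statistics computed per GPU on small batches can also affect accuracy, so the learning rate may not be the only factor.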

Error 404 when cloning repository

Hi, I am getting the following error when performing git clone on the repository

git clone https://github.com/tsunghan-wu/ReDAL
Cloning into 'ReDAL'...
remote: Enumerating objects: 193, done.
remote: Counting objects: 100% (193/193), done.
remote: Compressing objects: 100% (137/137), done.
remote: Total 193 (delta 55), reused 182 (delta 50), pack-reused 0
Receiving objects: 100% (193/193), 29.03 MiB | 20.89 MiB/s, done.
Resolving deltas: 100% (55/55), done.
Checking out files: 100% (153/153), done.
Downloading data_preparation/region_division/region_label_data.json (732 KB)
Error downloading object: data_preparation/region_division/region_label_data.json (0046aba): Smudge error: Error downloading data_preparation/region_division/region_label_data.json (0046aba13aac0d39b798b7df19003f3c52e0aae5d0578b6ddc59f40905d12de1): [0046aba13aac0d39b798b7df19003f3c52e0aae5d0578b6ddc59f40905d12de1] Object does not exist on the server: [404] Object does not exist on the server

A manual git lfs pull results in:

~/code/ReDAL$ git lfs pull
Git LFS: (0 of 0 files, 4 skipped) 0 B / 0 B, 49.13 MB skipped
[8ab4c6ad5d9833f71c2244ce96772041c668546b5911826fe13a21158222874e] Object does not exist on the server: [404] Object does not exist on the server
[e78af34772365164a11efcd11d758bb4036df0e51c1aabbb24ca20a9326f72f8] Object does not exist on the server: [404] Object does not exist on the server
[0046aba13aac0d39b798b7df19003f3c52e0aae5d0578b6ddc59f40905d12de1] Object does not exist on the server: [404] Object does not exist on the server
[0f831139e7bd89c360b69462b6bbba3722ad56f27ac9a2dd455f329b9d5360b6] Object does not exist on the server: [404] Object does not exist on the server
error: failed to fetch some objects from 'https://github.com/tsunghan-wu/ReDAL.git/info/lfs'
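The log above indicates that the LFS objects are missing on the server, so the clone itself cannot be fixed from the client side. If the repository is cloned with smudging skipped (Git LFS honors the GIT_LFS_SKIP_SMUDGE=1 environment variable), the affected files are left as small text pointers. The sketch below shows one way to detect such pointer files. `is_lfs_pointer` is a hypothetical helper, and the size field written in the demo is a placeholder.

```python
import os
import tempfile

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an unresolved Git LFS pointer."""
    # Pointer files are tiny text files, so larger files cannot be pointers.
    if os.path.getsize(path) > 1024:
        return False
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


# Self-contained demo on a temporary file that mimics a pointer.
with tempfile.TemporaryDirectory() as tmp:
    pointer = os.path.join(tmp, "region_label_data.json")
    with open(pointer, "wb") as handle:
        handle.write(
            LFS_POINTER_PREFIX + b"\n"
            b"oid sha256:0046aba13aac0d39b798b7df19003f3c52e0aae5d0578b6ddc59f40905d12de1\n"
            b"size 12345\n"  # placeholder size, not the real object size
        )
    print(is_lfs_pointer(pointer))
```

Files flagged this way still need their real content from somewhere else, for example by regenerating them with the data preparation scripts if those can produce them.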

data preparation for semantic_kitti

Hi, thanks for your great work!

I'm getting errors when trying to prepare data for SemanticKITTI in Step 3 (Calculate point cloud properties). gen_surface_variation.py only contains operations for S3DIS. When I modify the dataset setting for semantic_kitti, I get the following error:

Traceback (most recent call last):
  File "gen_surface_variation.py", line 58, in <module>
    for fname in os.listdir(coords_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/data/semantic_kitti/dataset/sequences/00/coords'
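The traceback suggests that the script expects a coords folder inside each sequence directory, which an earlier preparation step presumably has to create. The sketch below lists which sequence folders lack it before the script is run. `missing_subdirs` is a hypothetical helper, and the directory layout in the demo is an assumption inferred from the path in the traceback.

```python
import os
import tempfile


def missing_subdirs(sequences_root, subdir="coords"):
    """List sequence folders under `sequences_root` that lack `subdir`."""
    missing = []
    for seq in sorted(os.listdir(sequences_root)):
        seq_dir = os.path.join(sequences_root, seq)
        # Skip stray files; only sequence directories are checked.
        if os.path.isdir(seq_dir) and not os.path.isdir(os.path.join(seq_dir, subdir)):
            missing.append(seq)
    return missing


# Demo on a throwaway tree: sequence 00 has no coords folder, sequence 01 does.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "00", "velodyne"))
    os.makedirs(os.path.join(root, "01", "coords"))
    print(missing_subdirs(root))
```

On a real dataset, the function would be pointed at the sequences directory, for example /data/semantic_kitti/dataset/sequences as in the traceback.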

Image segmentation

@tsunghan-mama Hi, thanks for sharing this wonderful code base. I have a few questions:

Does this source code support, or can it be used for, LiDAR point cloud data in automotive driving scenarios?
Does this source code also support 2D image segmentation?
Thanks in advance.
