vcip-rgbd / dformer

[ICLR 2024] DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Home Page: https://yinbow.github.io/Projects/DFormer/index.html

License: MIT License

Shell 0.32% Python 99.68%

nyu-depth-v2 rgbd-segmentation sun-rgbd iclr-2024 rgbd-representation-learning

dformer's Issues

How to visualize the results?

Thank you very much for your outstanding work. How can the results be visualized? I'm not very clear about this i

How to check FLOPs

Hello! Thank you for your hard work.

The paper includes calculations of FLOPs in a table, but upon reviewing the code, it's challenging to locate where FLOPs are measured.

Could you tell me where they are measured?
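
For readers who want a rough cross-check, here is a minimal sketch of how per-layer multiply-accumulate counts are usually derived by hand. It is not the repository's measurement code (another issue in this list refers to a benchmark.py script for that); the function names and the example layer are hypothetical.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Multiply-accumulates of one 2D convolution (bias ignored)."""
    k_h, k_w = kernel
    # Every output element sums over (c_in / groups) * k_h * k_w products.
    return c_out * h_out * w_out * (c_in // groups) * k_h * k_w


def linear_macs(in_features, out_features, tokens=1):
    """Multiply-accumulates of a linear layer applied to `tokens` positions."""
    return tokens * in_features * out_features


if __name__ == "__main__":
    # Hypothetical layer: 3x3 conv, 64 -> 128 channels, 120x160 output map.
    macs = conv2d_macs(64, 128, (3, 3), 120, 160)
    print(f"{macs / 1e9:.3f} GMACs")
```

Papers differ on whether "FLOPs" means MACs or 2 x MACs, so a hand count may be off by a factor of two from a reported table.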

Calculate the latency

Hello.

Thank you for releasing good research.

When I checked Appendix B (Details of DFormer), I found latency measurement results.

Could you possibly share the code used for measuring latency?
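
Until the authors share their script, a generic timing loop can be sketched as follows. This is not the code behind Appendix B; `measure_latency` and its parameters are hypothetical, and the warm-up and iteration counts are arbitrary defaults.

```python
import statistics
import time


def measure_latency(fn, warmup=10, iters=50, sync=None):
    """Time `fn` and return latency statistics in milliseconds.

    `sync` is an optional callable that blocks until asynchronous work is
    done. For a PyTorch model on GPU one would pass torch.cuda.synchronize.
    """
    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued kernels before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "iters": iters,
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; with a model this would be e.g. lambda: model(rgb, depth).
    print(measure_latency(lambda: sum(range(10000)), warmup=3, iters=20))
```

Without the synchronisation call, GPU timings mostly measure kernel launch overhead rather than the forward pass.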

Questions for using pretrained models for Evaluation

Hello!
Thanks for the nice work.
I have a question about using your pretrained network for evaluation.
I'm trying to run your model on my custom dataset,
so I slightly modified infer.sh and local_configs/template/DFormer_Large.py.
However, when I change the number of classes,

DFormer/local_configs/template/DFormer_Large.py

Line 34 in 2aa25e3

C.num_classes = N

I get an error, because the NYUv2 checkpoint was trained on the NYUv2 dataset, which has 40 classes.
Is there any way to use a custom dataset with a different number of classes?

Error Log:
RuntimeError: Error(s) in loading state_dict for EncoderDecoder:
size mismatch for decode_head.conv_seg.weight: copying a param with shape torch.Size([40, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([5, 512, 1, 1]).
size mismatch for decode_head.conv_seg.bias: copying a param with shape torch.Size([40]) from checkpoint, the shape in current model is torch.Size([5]).

My Settings:
In local_configs/template/DFormer_Large.py:

  C.dataset_name = "NYUDepthv2" # for test
  ...
  C.num_classes = 5
  ...
  C.pretrained_model = 'checkpoints/pretrained/DFormer_Large.pth.tar'
  ...

In infer.sh:

  GPUS=2
  NNODES=1
  NODE_RANK=${NODE_RANK:-0}
  PORT=${PORT:-29958}
  MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
  
  PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
  python -m torch.distributed.launch \
      --nnodes=$NNODES \
      --node_rank=$NODE_RANK \
      --master_addr=$MASTER_ADDR \
      --nproc_per_node=$GPUS \
      --master_port=$PORT  \
      utils/infer.py \
      --config=local_configs.template.DFormer_Large \
      --continue_fpath=checkpoints/trained/NYUv2_DFormer_Large.pth \
      --save_path "/data/result/" \
      --gpus=$GPUS

My own dataset has 5 classes.
Keeping 40 classes could be one workaround, but it produces many errors, so I would like to solve this properly.
I suspect that using a different number of classes requires some fine-tuning or something similar.
Also, do you have any idea which model would work well? I'm currently using NYUv2_DFormer_Large.pth, but I wonder whether it is the best choice.

Thanks:)
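
The size mismatch in the log comes only from the final classification layer (decode_head.conv_seg). One common workaround, sketched below, is to load every checkpoint tensor whose shape matches and leave the mismatched head randomly initialised for fine-tuning on the 5-class data. This is an assumption about a reasonable workflow, not the authors' recommendation; `filter_compatible` is a hypothetical helper, and the checkpoint may wrap its weights under an extra key.

```python
from types import SimpleNamespace


def filter_compatible(checkpoint_state, model_state):
    """Split checkpoint entries into shape-compatible ones and skipped names."""
    kept, skipped = {}, []
    for name, tensor in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(tensor.shape):
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped


if __name__ == "__main__":
    # Stand-ins for tensors; only the .shape attribute matters here.
    ckpt = {
        "backbone.w": SimpleNamespace(shape=(64, 3, 3, 3)),
        "decode_head.conv_seg.weight": SimpleNamespace(shape=(40, 512, 1, 1)),
        "decode_head.conv_seg.bias": SimpleNamespace(shape=(40,)),
    }
    model = {
        "backbone.w": SimpleNamespace(shape=(64, 3, 3, 3)),
        "decode_head.conv_seg.weight": SimpleNamespace(shape=(5, 512, 1, 1)),
        "decode_head.conv_seg.bias": SimpleNamespace(shape=(5,)),
    }
    kept, skipped = filter_compatible(ckpt, model)
    print("skipped:", skipped)
    # With PyTorch: model.load_state_dict(kept, strict=False), then fine-tune.
```

Because the new head starts from random weights, predictions are meaningless until the model has been fine-tuned on labelled data with the new classes.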

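Does this project have code that supports Windows?

Hello, I reproduced this model on my own dataset. When using infer.py I want to debug the concrete result of each step, but the script does not seem to support running on Windows. Do you have corresponding code? Thank you very much!

Requirements for custom datasets

Thank you for your outstanding work. I would like to build my own dataset to run your code. What requirements does the code place on the dataset?

AdaptiveAvgPool2d operator not supported when exporting to ONNX

Hello, in the attention module of DFormer.py there is a step self.pool = nn.AdaptiveAvgPool2d(output_size=(7,7)). Its purpose is to accept an input of any shape and return a (7,7) pooled result, but the AdaptiveAvgPool2d operator is not supported when exporting to ONNX. Does supporting dynamic input sizes serve a specific purpose, and how can this be solved if I want to export an ONNX model? I would appreciate your advice.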
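
One possible direction, assuming the export uses a fixed input resolution: when the feature-map height and width are exact multiples of the target size, adaptive average pooling is equivalent to an ordinary average pool whose kernel and stride equal the ratio. The sketch below only computes those parameters; `fixed_avgpool_params` is a hypothetical helper, and the feature-map size at this layer has to be checked for the chosen input.

```python
def fixed_avgpool_params(in_size, out_size=(7, 7)):
    """Kernel and stride of an AvgPool2d matching AdaptiveAvgPool2d(out_size).

    Only valid when each input dimension is an exact multiple of the
    corresponding output dimension.
    """
    in_h, in_w = in_size
    out_h, out_w = out_size
    if in_h % out_h or in_w % out_w:
        raise ValueError(
            f"input {in_size} is not a multiple of output {out_size}; "
            "a plain average pool would not give the same result"
        )
    kernel = (in_h // out_h, in_w // out_w)
    return kernel, kernel  # kernel_size, stride


if __name__ == "__main__":
    kernel, stride = fixed_avgpool_params((14, 21))
    print(kernel, stride)
    # With PyTorch: nn.AvgPool2d(kernel_size=kernel, stride=stride)
```

For sizes that are not exact multiples, the adaptive windows have uneven widths, so this substitution changes the output and would need resizing or padding of the input first.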

May I ask if you can publish the code for RGB-D SOD

Hello, your paper mentions the relevant work of RGB-D salient object detection. Can you provide the corresponding code?

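GPU out-of-memory problem

Hello, first of all thank you for sharing your code publicly.
As a beginner, I have a question: I am reproducing your code (using the NYU dataset, training on two 2080 Ti GPUs). The first epoch runs normally, but validation fails with "CUDA out of memory".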

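Loading the weight files

Hello, how do I load the weight files? All the code I have run so far loads weight files ending in .pth, but you provide them as .tar archives, and after extracting them there is no .pth weight file.

Looking forward to your reply.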
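
A note that may help here: a ".pth.tar" suffix is often just a naming convention for files written by torch.save, which are zip archives (PyTorch 1.6 and later) or raw pickle streams rather than tarballs meant to be extracted. Whether that holds for the DFormer checkpoints is an assumption; the hypothetical helper below only inspects the container format using the standard library.

```python
import tarfile
import zipfile


def checkpoint_container(path):
    """Report which container format the file at `path` uses."""
    if zipfile.is_zipfile(path):
        return "zip"    # typical of torch.save in PyTorch >= 1.6
    if tarfile.is_tarfile(path):
        return "tar"    # a genuine tar archive
    return "other"      # e.g. a raw pickle stream from older torch.save


# Usage (path taken from the config quoted in an earlier issue):
#   kind = checkpoint_container("checkpoints/pretrained/DFormer_Large.pth.tar")
#   If the file was written by torch.save, it is normally loaded directly:
#   state = torch.load(path, map_location="cpu")
```

If the file really was written by torch.save, extracting it by hand yields internal data files rather than a .pth file, which would match the symptom described above.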

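About benchmark.py

When benchmark.py computes the amount of computation, does it leave out matrix operations? After I changed the window sizes of the average pooling and the bilinear interpolation, why did the computation reported by benchmark.py not change?

Could you provide the pseudo depth maps?

When using raw depth maps, is this done by copying the single-channel depth image 3 times?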

Doubt about num_classes for NYU Depth v2 Dataset

Hello, thank you for sharing the code of this great work!!

I have a query about the NYU dataset.

The NYU dataset has 40 classes. So, in the final output layer of all of your decoders, num_classes is set to 40.

When I check the pixel values of the ground truths (0.png, 1.png, etc) inside the NYUDepthv2/Label/ directory, the pixel values range from 0 to 40. This indicates there are 41 classes.

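import cv2
import numpy as np

gt_path = "...../NYUDepthv2_DFormer/Label"

for idx in range(10):
    label_path = f"{gt_path}/{idx}.png"
    # Read the label map unchanged so class indices are not expanded to BGR.
    label = cv2.imread(label_path, cv2.IMREAD_UNCHANGED)
    unique_values = np.unique(label)
    print(f"Classes in label {idx}: {unique_values}")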

Classes in label 0: [ 0 1 2 3 5 7 12 22 24 26 38 39 40]
Classes in label 1: [ 0 1 3 12 22 24 26 34 38 39 40]
Classes in label 2: [ 0 1 5 7 8 26 29 38 40]
Classes in label 3: [ 0 1 3 5 14 26 40]
Classes in label 4: [ 0 1 3 5 7 8 12 15 22 26 30 34 38 39 40]
Classes in label 5: [ 0 1 2 5 7 22 29 38 39 40]
Classes in label 6: [ 0 1 2 5 15 22 38 39 40]
Classes in label 7: [ 0 1 2 5 8 9 26 37 38 39 40]
Classes in label 8: [ 0 1 2 5 7 8 11 15 22 26 29 38 39 40]
Classes in label 9: [ 0 1 2 3 8 15 22 26 38 39 40]

So can you kindly tell me how you have dealt with the extra class in the ground-truth labels?
In my code, this discrepancy causes an error in the cross-entropy loss function.

Thank you.
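
A common convention for 40-class NYU labels is that value 0 marks unlabeled pixels and values 1 to 40 are the classes. Under that assumption (not confirmed here as the repository's own handling), the labels can be shifted down by one and the unlabeled pixels mapped to an ignore index, as in this sketch. `shift_labels` and the ignore value 255 are illustrative choices.

```python
import numpy as np


def shift_labels(label, ignore_index=255):
    """Map 0 -> ignore_index and k -> k - 1 for k >= 1."""
    label = np.asarray(label).astype(np.int64)
    shifted = label - 1
    shifted[label == 0] = ignore_index  # unlabeled pixels are ignored by the loss
    return shifted


if __name__ == "__main__":
    demo = np.array([[0, 1, 2], [39, 40, 0]], dtype=np.uint8)
    print(shift_labels(demo))
    # With PyTorch the loss would then be built with the same ignore value,
    # e.g. torch.nn.CrossEntropyLoss(ignore_index=255).
```

After the shift the valid targets lie in 0 to 39, which matches a 40-channel output layer.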

Inference for SUN RGB-D dataset

When I executed 'infer.sh', I got a failure message from the evaluate_msf function in 'val_mm.py'.
It seems that inference on the SUN RGB-D dataset is not implemented (line 385).
Is there a reason for this?

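How do I run a custom dataset?

Thank you for your outstanding work. I tried to run your code on a dataset I made myself and got some errors (loss=nan). Here is some of the error output: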
27 23:43:55 Initing weights ...
27 23:44:00 begin trainning:
Epoch 1/500 Iter 156/156: lr=5.9615e-06 loss=nan total_loss=nan: [01:48<00:00, 1.44it/s]
27 23:45:48 WRN NaN or Inf found in input tensor.
Looking forward to your reply.
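
One possible cause of a NaN loss on a custom dataset, among several, is label values that fall outside the configured class range and are not the ignore value. The sketch below is a quick sanity check under that assumption; `invalid_label_values` is a hypothetical helper and 255 is an assumed ignore value.

```python
def invalid_label_values(values, num_classes, ignore_index=255):
    """Return sorted label values that are neither valid classes nor ignored."""
    bad = set()
    for v in values:
        v = int(v)
        if v != ignore_index and not (0 <= v < num_classes):
            bad.add(v)
    return sorted(bad)


if __name__ == "__main__":
    # Hypothetical unique values found in one label image of a 5-class dataset.
    found = [0, 1, 2, 4, 5, 255]
    print(invalid_label_values(found, num_classes=5))
```

An empty result does not rule out other causes, such as batches in which every pixel is ignored or invalid values in the depth input.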

ImageNet-1k pretrain training time

I read your paper with great interest. In particular, I found it interesting that an RGB-D pretrained model was built.

I saw that 8 NVIDIA 3090 GPUs were used when pretraining DFormer on ImageNet-1K. I would also like to run additional experiments on ImageNet-1K. Could you share the training time required for each model size?

Single GPU training

Thanks for your excellent work! I ran into an issue when training the model with a single 3090 GPU, and I cannot find its source.

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
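
This error usually means the script was started without a distributed launcher, so no process group exists. One workaround, sketched under the assumption that the training code can initialise its group from environment variables, is to provide the variables that PyTorch's env:// initialisation reads before calling init_process_group with a world size of 1. `single_process_env` is a hypothetical helper and the port is an arbitrary default.

```python
import os


def single_process_env(port=29500):
    """Set defaults for a one-process 'distributed' run and return them."""
    defaults = {
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": str(port),
        "RANK": "0",
        "WORLD_SIZE": "1",
        "LOCAL_RANK": "0",
    }
    for key, value in defaults.items():
        os.environ.setdefault(key, value)  # keep values set by a launcher
    return {key: os.environ[key] for key in defaults}


if __name__ == "__main__":
    print(single_process_env())
    # With PyTorch, afterwards:
    #   torch.distributed.init_process_group(backend="nccl", init_method="env://")
```

Launching the existing script through the distributed launcher with one process per node is the other usual option, since the launcher sets these variables itself.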

Evaluation with my own dataset

Hello.
Thanks for sharing your nice work!
I'm trying to run your model on my own dataset, but I got stuck on a few points.
Could you give any suggestions?

(1) I'm trying to run DFormer on RGB videos I captured myself, so I first generated depth images with a depth estimation model. However, your code also requires labels for metric evaluation, and I have no idea how to produce them. Is there an easy way to make label images?

(2) In this code,

DFormer/utils/val_mm.py

Line 105 in 2aa25e3

palette = [

you use a palette when saving the image.
Could you describe (i) what the palette is and what it is for, and (ii) how I can make a palette for my own dataset?

Thanks!
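
On question (2): in segmentation code a palette is generally a table that maps each class index to an RGB colour, so that a predicted index map can be saved as a colour image. For a custom dataset any set of distinct colours works. The sketch below generates one with the bit-shuffling scheme used for the PASCAL VOC colour map; it is not the palette in val_mm.py, and `voc_palette` is a hypothetical helper.

```python
def voc_palette(num_classes):
    """Return num_classes distinct (R, G, B) tuples, PASCAL VOC style."""
    palette = []
    for index in range(num_classes):
        r = g = b = 0
        c = index
        for j in range(8):
            # Spread the bits of the class index over the three channels.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        palette.append((r, g, b))
    return palette


if __name__ == "__main__":
    colours = voc_palette(5)
    print(colours)
    # Flattened form, as accepted by PIL's Image.putpalette for mode "P" images.
    flat = [channel for colour in colours for channel in colour]
    print(len(flat))
```

The number of entries should match the number of classes in the dataset, with one extra colour reserved if ignored pixels are to be drawn as well.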
