vcip-rgbd / dformer
[ICLR 2024] DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Home Page: https://yinbow.github.io/Projects/DFormer/index.html
License: MIT License
Thank you very much for your outstanding work. How can the results be visualized? I'm not very clear about this i
Hello! Thank you for your hard work.
The paper reports FLOPs in a table, but after reviewing the code I could not find where the FLOPs are measured.
Could you tell me where they are computed?
Hello.
Thank you for releasing good research.
When I checked Appendix B (Details of DFormer), I saw latency measurements.
Could you possibly share the code used for measuring latency?
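For readers with the same question, a generic timing harness can be sketched as follows. This is not the authors' measurement script; the function name `measure_latency` and its parameters are hypothetical. It warms up first, then times repeated calls. On a GPU the `sync` argument would typically be `torch.cuda.synchronize`, so that asynchronous kernels have finished before the clock stops.

```python
import statistics
import time


def measure_latency(fn, warmup=10, runs=50, sync=None):
    """Return (mean_ms, std_ms) over `runs` timed calls of fn().

    `sync` is an optional callable run before the clock stops,
    e.g. torch.cuda.synchronize for GPU models (assumption: the
    model runs asynchronously on a CUDA device).
    """
    # Warm-up calls are not timed (caches, lazy initialisation).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # pstdev also works when runs == 1.
    return statistics.mean(samples_ms), statistics.pstdev(samples_ms)
```

With a PyTorch model, `fn` could be a closure that runs one forward pass under `torch.no_grad()` on a fixed-size input.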
Hello!
Thanks for the nice work.
I have a question about using your pretrained network for evaluation.
I'm trying to run your model on my custom dataset, so I slightly changed the code, following infer.sh and local_configs/template/DFormer_Large.py.
However, when I change the number of classes, I get an error because the NYUv2 pretrained model was trained on the NYUv2 dataset, which has 40 classes.
Is there any way to use a custom dataset with a different number of classes?
Error Log:
RuntimeError: Error(s) in loading state_dict for EncoderDecoder:
size mismatch for decode_head.conv_seg.weight: copying a param with shape torch.Size([40, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([5, 512, 1, 1]).
size mismatch for decode_head.conv_seg.bias: copying a param with shape torch.Size([40]) from checkpoint, the shape in current model is torch.Size([5]).
My Settings:
In local_configs/template/DFormer_Large.py:
C.dataset_name = "NYUDepthv2" # for test
...
C.num_classes = 5
...
C.pretrained_model = 'checkpoints/pretrained/DFormer_Large.pth.tar'
...
In infer.sh:
GPUS=2
NNODES=1
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-29958}
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m torch.distributed.launch \
--nnodes=$NNODES \
--node_rank=$NODE_RANK \
--master_addr=$MASTER_ADDR \
--nproc_per_node=$GPUS \
--master_port=$PORT \
utils/infer.py \
--config=local_configs.template.DFormer_Large \
--continue_fpath=checkpoints/trained/NYUv2_DFormer_Large.pth \
--save_path "/data/result/" \
--gpus=$GPUS
My own dataset has 5 classes.
Keeping 40 classes could be one workaround, but it produces many errors, so I would like to solve this properly.
I suspect that using a different number of classes requires some fine-tuning or something similar.
Also, do you have any idea which model would work well? I'm currently using NYUv2_DFormer_Large.pth, but I wonder whether it is the best choice.
Thanks :)
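A common way around this kind of size mismatch is to drop the checkpoint entries whose shapes differ from the current model (here the `decode_head.conv_seg` weights) and load the rest non-strictly. The sketch below is not part of the DFormer code; the helper name `drop_mismatched` is hypothetical, and it works on any mapping whose values expose a `.shape`. The classification layer then starts from random initialisation, so fine-tuning on the 5-class data is still needed before the predictions are meaningful.

```python
def drop_mismatched(checkpoint_state, model_state):
    """Split checkpoint entries into loadable ones and dropped names.

    An entry is kept only if the model has a parameter of the same
    name and the same shape; everything else is reported in `dropped`.
    """
    kept, dropped = {}, []
    for name, tensor in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(tensor.shape) == tuple(target.shape):
            kept[name] = tensor
        else:
            dropped.append(name)
    return kept, dropped


# Hypothetical usage with PyTorch (assumes `checkpoint` is already the
# flat state dict; some checkpoints nest it under another key):
#   kept, dropped = drop_mismatched(checkpoint, model.state_dict())
#   model.load_state_dict(kept, strict=False)
#   print("re-initialised layers:", dropped)
```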
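Hello, I reproduced this model on my own dataset. When using infer.py, I want to debug the result of each step, but the script does not seem to support running on Windows. Do you have corresponding code? Thank you very much!
Thank you for your outstanding work. I want to build my own dataset to run your code. What requirements does the code place on the dataset?
Hello, in the attention module of DFormer.py there is a step self.pool = nn.AdaptiveAvgPool2d(output_size=(7,7)). Its purpose is to accept an input of any shape and return a (7,7) pooled result, but the AdaptiveAvgPool2d operator here is not supported when exporting an ONNX model. Does supporting dynamic input serve any specific purpose, and how can this be solved if I want to export an ONNX model? I would appreciate your advice.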
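One possible workaround, assuming the input resolution is fixed at export time: when the feature-map size is an exact multiple of the output size, adaptive average pooling gives the same result as ordinary average pooling with kernel size and stride both equal to input size divided by output size. The helper below (hypothetical name `fixed_pool_params`) only computes those parameters; whether swapping in `nn.AvgPool2d(kernel_size=k, stride=k)` suits DFormer's attention module is untested here.

```python
def fixed_pool_params(input_size, output_size=7):
    """Return (kernel_size, stride) for a fixed average pool that matches
    adaptive average pooling along one spatial dimension.

    The equivalence only holds when input_size is an exact multiple of
    output_size, so other sizes are rejected.
    """
    if input_size <= 0 or output_size <= 0:
        raise ValueError("sizes must be positive")
    if input_size % output_size != 0:
        raise ValueError(
            f"input size {input_size} is not a multiple of {output_size}; "
            "a fixed pool would not reproduce the adaptive result"
        )
    kernel = input_size // output_size
    return kernel, kernel


# Example: a 56-pixel-wide feature map pooled down to 7 bins.
kernel_size, stride = fixed_pool_params(56, 7)
```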
Hello, your paper mentions the relevant work of RGB-D salient object detection. Can you provide the corresponding code?
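Hello, first of all, thank you for sharing your code publicly.
As a beginner, I have a question: I am reproducing your code (using the NYU dataset, training on two 2080 Ti GPUs). The first epoch runs normally, but validation fails with CUDA out of memory.
When benchmark.py computes the computational cost, does it leave out matrix operations? Why does the cost reported by benchmark.py not change after I changed the window sizes of the average pooling and bilinear interpolation?
Could you provide the pseudo depth maps?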
Hello, thank you for sharing the code of this great work!!
I have a query about the NYU dataset.
The NYU dataset has 40 classes. So, in the final output layer of all of your decoders, num_classes is set to 40.
When I check the pixel values of the ground truths (0.png, 1.png, etc) inside the NYUDepthv2/Label/ directory, the pixel values range from 0 to 40. This indicates there are 41 classes.
import cv2
import numpy as np

gt_path = "...../NYUDepthv2_DFormer/Label"
for idx in range(10):
    label_path = f"{gt_path}/{idx}.png"
    # Read the label image unchanged so the stored class ids are preserved
    unique_values = np.unique(cv2.imread(label_path, cv2.IMREAD_UNCHANGED))
    print(f"Classes in label {idx}: {unique_values}")
Classes in label 0: [ 0 1 2 3 5 7 12 22 24 26 38 39 40]
Classes in label 1: [ 0 1 3 12 22 24 26 34 38 39 40]
Classes in label 2: [ 0 1 5 7 8 26 29 38 40]
Classes in label 3: [ 0 1 3 5 14 26 40]
Classes in label 4: [ 0 1 3 5 7 8 12 15 22 26 30 34 38 39 40]
Classes in label 5: [ 0 1 2 5 7 22 29 38 39 40]
Classes in label 6: [ 0 1 2 5 15 22 38 39 40]
Classes in label 7: [ 0 1 2 5 8 9 26 37 38 39 40]
Classes in label 8: [ 0 1 2 5 7 8 11 15 22 26 29 38 39 40]
Classes in label 9: [ 0 1 2 3 8 15 22 26 38 39 40]
So can you kindly tell me how you dealt with the extra class in the ground truth labels?
In my code, this discrepancy causes an error in the cross-entropy loss function.
Thank you.
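A convention often used with labels like these, given here as an assumption and not as the authors' confirmed preprocessing, is to treat value 0 as "unlabeled", map it to an ignore index such as 255, and shift classes 1..40 down to 0..39. The loss is then told to skip the ignore index (for example via `ignore_index=255` in `torch.nn.CrossEntropyLoss`). A dependency-free sketch on nested lists, with the hypothetical helper name `remap_labels`:

```python
IGNORE_INDEX = 255  # assumed ignore value; must match the loss setting


def remap_labels(label_rows, ignore_index=IGNORE_INDEX):
    """Map raw label ids {0, 1..N} to {ignore_index, 0..N-1}.

    Assumes 0 marks unlabeled pixels in the raw ground truth.
    """
    return [
        [ignore_index if value == 0 else value - 1 for value in row]
        for row in label_rows
    ]


# With a NumPy array `label`, the same remap could be written as
#   np.where(label == 0, IGNORE_INDEX, label - 1)
```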
When I executed 'infer.sh', I got a failure message from the evaluate_msf function in 'val_mm.py'.
It seems that inference on the SUN RGB-D dataset is not implemented (line 385).
Is there a reason for this?
Thank you for your outstanding work. I tried to run your code on my own dataset and ran into an error: loss=nan. Here is some of the log output:
27 23:43:55 Initing weights ...
27 23:44:00 begin trainning:
Epoch 1/500 Iter 156/156: lr=5.9615e-06 loss=nan total_loss=nan: [01:48<00:00, 1.44it/s]
27 23:45:48 WRN NaN or Inf found in input tensor.
Looking forward to your reply.
I read your paper with great interest. I found it particularly interesting that an RGB-D pretrained model was built.
I saw that 8 NVIDIA 3090s were used when training DFormer on ImageNet-1K. I would also like to conduct additional experiments on ImageNet-1K. Could you tell me the training time required for each model size?
Thanks for your excellent work! I ran into an issue when training the model on a single 3090 GPU, and I cannot work out its source.
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
Hello.
Thanks for sharing your nice work!
I'm trying to run your model on my own dataset, but got stuck on a few points.
Could you give any suggestions?
(1) I'm trying to run DFormer on RGB videos I captured myself. I first generated depth images with a depth estimation model. However, your model also requires labels for metric evaluation, and I don't know how to produce them. Is there an easy way to make label images?
(2) This code (line 105 in 2aa25e3) uses a palette for saving the image. (i) What is the palette for, and (ii) how can I make a palette for my own dataset?
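On the palette question: a palette is just a table that maps each class id to an RGB colour, so that predicted label maps can be saved as colour images. One widely used way to generate it for an arbitrary number of classes is the PASCAL VOC colour-map scheme, sketched below. This is a generic illustration and not the palette shipped with DFormer; the function name `voc_palette` is hypothetical.

```python
def voc_palette(num_classes):
    """Return a list of (r, g, b) tuples, one per class id, using the
    PASCAL VOC bit-interleaving colour-map scheme."""
    palette = []
    for class_id in range(num_classes):
        r = g = b = 0
        bits = class_id
        # Spread the bits of the class id over the high bits of r, g, b.
        for shift in range(7, -1, -1):
            r |= (bits & 1) << shift
            g |= ((bits >> 1) & 1) << shift
            b |= ((bits >> 2) & 1) << shift
            bits >>= 3
        palette.append((r, g, b))
    return palette


# Colours for a 5-class dataset.
five_class_palette = voc_palette(5)
```

Any fixed list of distinct colours works just as well; the only requirement is one entry per class id.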
Thanks!