
stereo-transformer's People

Contributors

deh40, giwel


stereo-transformer's Issues

Pretrained model

It seems that I can't download the pre-trained model from the provided link, and downloaders such as Thunder also fail. Could you please provide a BaiduYun mirror of the resource? Thank you very much.

Question about occlusion map of sceneflow

Hi, thanks for your great work. I have a question about the occlusion map labels. You mentioned that the computation of occluded areas can be found in dataset/preprocess.py. However, that file only gives the occlusion region at the image border, whereas the occlusion map provided in the Scene Flow sample data seems to cover more than that. Thank you very much for your help!
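
For reference, one common way to obtain full occlusion masks (beyond the border region) when ground-truth disparity is available for both views is a left-right consistency check. The sketch below is a minimal NumPy illustration, not the repository's preprocessing code; the function name and the 1 px threshold are assumptions:

```python
import numpy as np

def occlusion_from_lr_check(disp_left, disp_right, thresh=1.0):
    """Mark left-image pixels as occluded when the left disparity does not
    agree with the right disparity at the matched location, plus pixels that
    map outside the right image."""
    h, w = disp_left.shape
    xs = np.arange(w, dtype=np.float32)[None, :].repeat(h, axis=0)
    x_right = xs - disp_left                 # column each left pixel maps to in the right image
    out_of_view = x_right < 0
    x_right_idx = np.clip(np.round(x_right).astype(np.int64), 0, w - 1)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    disp_right_warped = disp_right[ys, x_right_idx]
    inconsistent = np.abs(disp_left - disp_right_warped) > thresh
    return out_of_view | inconsistent        # boolean (h, w) occlusion mask
```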

I appreciate you

The dependencies were very clear, with the Python version and everything. Thank you a lot, this saves so much time. I wish everyone would do this.

How accurate does the rectification need to be for STTR to work?

I tried to run inference on stereo images captured by my own stereo camera using the pretrained models, but none of them produced a reasonable depth prediction. However, if I use the KITTI dataset, the result looks good. This makes me think that:

  1. my stereo image rectification is not good enough; if so, is there a way (or a metric) to specify how good a rectification needs to be to run STTR?
  2. or STTR is only good for the datasets it was trained on and not so good on arbitrary stereo image sets?

Thank you in advance for sharing your insight.

Selection of SCARED dataset

Hello!

That's an excellent job, thanks for sharing.

The paper mentions that 315 images of the SCARED dataset were selected. How did you select them?

And have you noticed that the calibration parameters of the fourth and fifth datasets are incorrect?

The interpolated depth and the video frame seem to be misaligned...

About the dimension of relative position encoding

Hi, nice work! I find that the 1D relative position encoding has dimension 2W-1. Why is it not W? I also wonder whether this makes STTR unable to handle input of arbitrary size, e.g. an image larger than 2W-1.
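
For context, a 1D relative encoding along a row of width W needs one embedding per possible offset between a query column and a key column, and those offsets span -(W-1) to W-1, i.e. 2W-1 values. A small illustration with placeholder names, not the repository's module:

```python
import torch

W = 5                                                            # row width
offsets = torch.arange(W)[None, :] - torch.arange(W)[:, None]    # (W, W), values in [-(W-1), W-1]
num_relative_positions = int(offsets.max() - offsets.min()) + 1  # 2W - 1 = 9

rel_embedding = torch.nn.Embedding(2 * W - 1, 16)                # one vector per possible offset
indices = offsets + (W - 1)                                      # shift offsets into [0, 2W-2]
pos_feat = rel_embedding(indices)                                # (W, W, 16) relative position features
```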

About Table 5

Hi, I don't really know how to reproduce Table 5 of your paper without the pre-trained model you provide. Did you train on Scene Flow and fine-tune on KITTI 2012 and 2015? And what does this table in the README mean?
[screenshot of the README table]

Torch version affects the network's training performance

I am opening this issue because, apparently, the training result differs depending on which version of PyTorch you are using. Here are the 3px error evaluation curves on a minimal example of overfitting the network on a single image for 300 epochs:

[screenshot of the 3px error curves]

The purple line is trained with PyTorch 1.7.0 and the orange line with PyTorch 1.5.1. As you can see, with version 1.7.0 the error rate stays flat at 100%, while with version 1.5.1 the error rate drops. The reason is that the BatchNorm behaviour changed between version 1.5.1 and 1.7.0. In version 1.5.1, if I disable track_running_stats here, both evaluation and training use batch stats. However, in PyTorch 1.7.0, evaluation mode is forced to use running_mean and running_var, while training still uses batch stats. With track_running_stats disabled, running_mean is 0 and running_var is 1, which is clearly different from the batch stats.

Therefore, instead of trying to work against torch's implementation, I recommend using PyTorch 1.5.1 if you want to retrain from scratch. Otherwise, if you want to use another PyTorch version, you can replace all BatchNorm layers with InstanceNorm and port the learnt values from BatchNorm (i.e. weight and bias). This is a wontfix problem because it is quite hard to accommodate all torch versions.
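
For anyone attempting that workaround, a rough sketch of swapping every nn.BatchNorm2d for an affine nn.InstanceNorm2d while copying the learnt weight and bias could look like the following; this is an untested illustration, not an official migration script:

```python
import torch.nn as nn

def batchnorm_to_instancenorm(model):
    """Replace every nn.BatchNorm2d with an affine nn.InstanceNorm2d that
    reuses the BatchNorm's learnt weight and bias. Running stats are dropped,
    matching the behaviour when track_running_stats is disabled."""
    for name, child in model.named_children():
        if isinstance(child, nn.BatchNorm2d):
            inorm = nn.InstanceNorm2d(child.num_features, affine=True)
            inorm.weight.data.copy_(child.weight.data)
            inorm.bias.data.copy_(child.bias.data)
            setattr(model, name, inorm)
        else:
            batchnorm_to_instancenorm(child)  # recurse into submodules
    return model
```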

Questions from novices

Hello, thank you for your open-source code; it has been very helpful to me. But I have doubts about how to use the pretrained model, especially how to import and run it. Thank you, I hope to get a reply.

Question about tokenizer.

Thank you very much for your code and excellent job. Could you please explain more about the Tokenizer? I haven't found any introduction to the Tokenizer in your paper. Thank you very much for your help!

ask for help

Excuse me, how can I download the SCARED dataset from the MICCAI challenge? At first I didn't have permission to download, so I tried to join the challenge by clicking the "join" button, but I haven't received a link to the dataset. Is that the correct way to download it? Can you share the process for downloading the dataset, please?

Pretrain model:sttr_light_sceneflow_pretrained_model.pth.tar

Hi,
When I used the pre-trained model sttr_light_sceneflow_pretrained_model.pth.tar, I got the following error:

RuntimeError: Error(s) in loading state_dict for STTR:
Missing key(s) in state_dict: "tokenizer.up.2.convTrans.0.weight", "tokenizer.up.2.convTrans.1.weight", "tokenizer.up.2.convTrans.1.bias", "tokenizer.up.2.convTrans.1.running_mean", "tokenizer.up.2.convTrans.1.running_var", "tokenizer.up.2.convTrans.1.num_batches_tracked", "tokenizer.up.2.convTrans.2.weight", "tokenizer.up.2.convTrans.2.bias", "tokenizer.dense_block.2.double_conv.0.weight", "tokenizer.dense_block.2.double_conv.1.weight", "tokenizer.dense_block.2.double_conv.1.bias", "tokenizer.dense_block.2.double_conv.1.running_mean", "tokenizer.dense_block.2.double_conv.1.running_var", "tokenizer.dense_block.2.double_conv.1.num_batches_tracked", "tokenizer.dense_block.2.double_conv.3.weight", "tokenizer.dense_block.2.double_conv.4.weight", "tokenizer.dense_block.2.double_conv.4.bias", "tokenizer.dense_block.2.double_conv.4.running_mean", "tokenizer.dense_block.2.double_conv.4.running_var", "tokenizer.dense_block.2.double_conv.4.num_batches_tracked".
size mismatch for tokenizer.bottle_neck.denselayer1.conv1.weight: copying a param with shape torch.Size([64, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 128, 1, 1]).
size mismatch for tokenizer.bottle_neck.denselayer1.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer1.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer1.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer1.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer1.conv2.weight: copying a param with shape torch.Size([16, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([4, 16, 3, 3]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm1.weight: copying a param with shape torch.Size([144]) from checkpoint, the shape in current model is torch.Size([132]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm1.bias: copying a param with shape torch.Size([144]) from checkpoint, the shape in current model is torch.Size([132]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm1.running_mean: copying a param with shape torch.Size([144]) from checkpoint, the shape in current model is torch.Size([132]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm1.running_var: copying a param with shape torch.Size([144]) from checkpoint, the shape in current model is torch.Size([132]).
size mismatch for tokenizer.bottle_neck.denselayer2.conv1.weight: copying a param with shape torch.Size([64, 144, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 132, 1, 1]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer2.norm2.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer2.conv2.weight: copying a param with shape torch.Size([16, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([4, 16, 3, 3]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm1.weight: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([136]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm1.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([136]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm1.running_mean: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([136]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm1.running_var: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([136]).
size mismatch for tokenizer.bottle_neck.denselayer3.conv1.weight: copying a param with shape torch.Size([64, 160, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 136, 1, 1]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm2.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for tokenizer.bottle_neck.denselayer3.norm2.running_var: copying a param with shape to
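
Not an official fix, but one quick way to confirm that the checkpoint was saved from a different configuration (e.g. the STTR-light architecture versus the default arguments) is to diff the checkpoint's state_dict against the freshly built model before calling load_state_dict. A hedged sketch, assuming the model is already constructed with your current arguments and that the weights may be nested under a 'state_dict' key:

```python
import torch

# model = STTR(args)  # the model as built by your current command-line arguments
checkpoint = torch.load('sttr_light_sceneflow_pretrained_model.pth.tar', map_location='cpu')
saved_state = checkpoint.get('state_dict', checkpoint)  # weights may be nested under 'state_dict'
model_state = model.state_dict()

missing = [k for k in model_state if k not in saved_state]
unexpected = [k for k in saved_state if k not in model_state]
mismatched = [(k, tuple(saved_state[k].shape), tuple(model_state[k].shape))
              for k in saved_state
              if k in model_state and saved_state[k].shape != model_state[k].shape]

print('missing keys:', missing)
print('unexpected keys:', unexpected)
print('shape mismatches:', mismatched)
```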

sttr_light_kitti_finetuned_model.pth.tar

Hi @mli0603, first of all, great work!

I'd like to run inference with the KITTI fine-tuned model. To do that, I guess I have to fine-tune sttr_light_sceneflow_pretrained_model, but I'm having problems installing NVIDIA Apex.

Could you please, if you have it, publish sttr_light_kitti_finetuned_model.pth.tar as well, so I can test inference on the KITTI dataset directly?

BTW, I'm creating a Docker image and an inference script that I expect to share via PR once I manage to fix the NVIDIA Apex issue.

Thanks!

I am not able to install PyTorch on Jetson Xavier NX

I am trying to run this code on a Jetson Xavier NX, but I am not able to install PyTorch 1.5.1. I have tried "conda install pytorch==1.5.1 torchvision==0.6.0 cudatoolkit=10.2 -c pytorch" and got:

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • cudatoolkit=10.2
  • torchvision==0.6.0
  • pytorch==1.5.1

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

IndexError: list index out of range

Thanks for your code, but I got an index error that I don't know how to deal with:

pixel_error 0.944978416022389
0%|▏ | 16/22390 [00:22<8:50:24, 1.42s/it]
Traceback (most recent call last):
  File "main.py", line 250, in <module>
    main(args_)
  File "main.py", line 221, in main
    args.clip_max_norm, amp)
  File "/home/rc/StereoMatching/stereo-transformer/utilities/train.py", line 30, in train_one_epoch
    for idx, data in enumerate(tbar):
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rc/StereoMatching/stereo-transformer/dataset/scene_flow.py", line 129, in __getitem__
    occ_right_fname = self.occ_data[idx].replace('left', 'right')
IndexError: list index out of range

My dataset is organized like this:

Dataset
  Scene_flow
    disparity
      TRAIN
        A  (from SceneFlow frames_disparity/TRAIN/A)
        B  (from SceneFlow frames_disparity/TRAIN/B)
        C  (from SceneFlow frames_disparity/TRAIN/C)
      TEST ...
    frame_finalpass
      TRAIN
        A  (from SceneFlow frames_finalpass/TRAIN/A)
        B  (from SceneFlow frames_finalpass/TRAIN/B)
        C  (from SceneFlow frames_finalpass/TRAIN/C)
      TEST ...
    occlusion
      TRAIN
        left   (from SceneFlow FlyingThings3D_subset/train/disparity_occlusions/left)
        right  (from SceneFlow FlyingThings3D_subset/train/disparity_occlusions/right)
      TEST
        left   (from SceneFlow FlyingThings3D_subset/val/disparity_occlusions/left)
        right  (from SceneFlow FlyingThings3D_subset/val/disparity_occlusions/right)
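
As a generic sanity check (hypothetical paths and file extensions, not part of the repository), comparing the lengths of the image, disparity, and occlusion file lists for this layout can confirm what the IndexError suggests, namely that self.occ_data ends up shorter than the image list:

```python
import glob
import os

root = 'Dataset/Scene_flow'   # adjust to your layout
img_left = sorted(glob.glob(os.path.join(root, 'frame_finalpass', 'TRAIN', '**', 'left', '*.png'),
                            recursive=True))
disp_left = sorted(glob.glob(os.path.join(root, 'disparity', 'TRAIN', '**', 'left', '*.pfm'),
                             recursive=True))
occ_left = sorted(glob.glob(os.path.join(root, 'occlusion', 'TRAIN', 'left', '*.png')))

# The IndexError in scene_flow.py points at self.occ_data[idx], so these three
# counts should match; if the occlusion list is shorter, the layouts disagree.
print(len(img_left), len(disp_left), len(occ_left))
```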

STTR-light inference

I want to try the new light version of this network, but as far as I understand there is no inference script example for it yet? And the training is not parameterised so that I could get the "light" version just by changing script arguments?
So, to run this model, do I need to train with pretrain.sh (main.py with default parameters) and later use the code from inference_example.ipynb with those default values?

Question about Self Attention

Hi, I have a question about the TransformerSelfAttnLayer. I see that you concatenate the left and right features and then send them to self-attention. According to my understanding, self-attention should only attend within the left feature (or the right feature), but won't this concatenation introduce the right feature into the left feature during self-attention, just like cross-attention?

Performs badly on the test dataset

It's so strange that I got this result:
[result screenshots]
It looks like the model is overfitting when I train it, but I didn't change anything (only some adjustments for TensorBoard).

About training

Excuse me, have you tested the results of training directly on KITTI?

self attention

Hi, I want to know more about the self-attention in your work. Why is this attention necessary in your transformer for stereo depth estimation? How does self-attention contribute to depth estimation? Why not just keep cross-attention?

questions about Table 2 in the paper

Hello, thanks for the good work.
Just a question about the generalization-without-fine-tuning evaluation of PSMNet in Table 2.
In Table 2, the Middlebury 3 px error of PSMNet trained on Scene Flow is 12.96, while we got 21.2, which is far from what is reported in your paper. And the pre-trained model from GitHub gives 23.24.
We use the 15 training images of Middlebury from https://vision.middlebury.edu/stereo/submit3/
Do you use a subset of these 15 images, or
the 10 evaluation training sets with GT, or
the 13 additional datasets with GT from https://vision.middlebury.edu/stereo/data/scenes2014/#description ?
Wondering about the reason.
Thanks.

Is it possible to evaluate/finetune the model with batchsize > 1 case?

Hello, thanks for your great work!

All the examples you've provided seem to work with batch size 1 by default.

I tried to change the batch size to >= 2 and evaluate/fine-tune the model, but the following error happens at this line (utilities/train.py, line 30).

Code :
[screenshot of the code]

Error :
[screenshot of the error]

It seems that the samples in a batch have different shapes, so they cannot be combined into one tensor when the batch size is 2 or more.

Could you help me run with batch size >= 2? I just want to see how the transformer works in this case.

Thanks!
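
One generic workaround (not something the repository ships) is a custom collate_fn that pads every sample to the largest height and width in the batch before stacking. A minimal sketch, assuming the dataset returns a dict of tensors such as 'left', 'right', and 'disp' with the spatial dimensions last:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def pad_collate(batch):
    """Pad every tensor field to the max H and W in the batch, then stack."""
    max_h = max(sample['left'].shape[-2] for sample in batch)
    max_w = max(sample['left'].shape[-1] for sample in batch)

    def pad(t):
        pad_h, pad_w = max_h - t.shape[-2], max_w - t.shape[-1]
        return F.pad(t, (0, pad_w, 0, pad_h))  # pad on the right and bottom only

    out = {}
    for key in batch[0]:
        vals = [sample[key] for sample in batch]
        if torch.is_tensor(vals[0]) and vals[0].dim() >= 2:
            out[key] = torch.stack([pad(v) for v in vals])
        else:
            out[key] = vals  # leave scalars / metadata as a plain list
    return out

# loader = DataLoader(dataset, batch_size=2, collate_fn=pad_collate)
```

Note that padded regions would still need to be masked or cropped out of the loss, so this only addresses the stacking error, not everything batch size > 1 implies.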

Question: Scene Flow setup; Bug: loss is NaN, training stops

Hello
I'm trying to pretrain the network on Scene Flow, however the way my folders are organized is quite different from what the code tries to find. Could you please tell me exactly what data you downloaded? The DispNet/FlowNet2.0 dataset subsets -> RGB images (cleanpass), Disparity, Disparity Occlusions from here?

How much CUDA memory is needed to run inference?

Hello, thank you for your great work

When I tried to run inference with "kitti_finetuned_model", I got a "CUDA out of memory" error.
My laptop has 4 GB of GPU memory; I thought that would be enough just to try out inference.

I tried decreasing the batch size to 1, but the result is still the same.

Could you guide me in the right direction to solve this issue?

Regards
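
For what it's worth, two generic memory-savers when only running inference are wrapping the forward pass in torch.no_grad() and downscaling the rectified pair before feeding it to the network. A hedged sketch, not the repository's recommended settings; how the downscaled tensors are packaged into the model input should follow the inference notebook:

```python
import torch
import torch.nn.functional as F

def downscale_pair(left, right, scale=0.5):
    """Return a lower-resolution copy of a rectified stereo pair.

    Halving H and W roughly quarters the pixel count, which can be the
    difference between fitting and not fitting attention on a 4 GB GPU.
    """
    left_small = F.interpolate(left, scale_factor=scale, mode='bilinear', align_corners=False)
    right_small = F.interpolate(right, scale_factor=scale, mode='bilinear', align_corners=False)
    return left_small, right_small

# usage sketch: wrap inference in no_grad so no autograd buffers are kept
# with torch.no_grad():
#     output = model(<input built from the downscaled pair, as in inference_example.ipynb>)
# a disparity map predicted at scale s must be upsampled and multiplied by 1/s
# to be comparable with the full-resolution ground truth.
```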

self-attention visualization

Hi!

Good work! Would you mind providing the code or details on how to implement the visualization of the self-attention map?

Thanks!
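
Not the authors' code, but a generic recipe is to register a forward hook on the attention layer of interest and plot whatever weights it returns. The sketch below uses torch.nn.MultiheadAttention as a stand-in; for STTR you would hook the corresponding module inside the transformer and check which element of its output holds the attention weights:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# stand-in attention layer; hook the real module inside the model instead
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4)

captured = {}
def save_attn(module, inputs, outputs):
    # nn.MultiheadAttention returns (output, attention_weights)
    captured['weights'] = outputs[1].detach()

handle = attn.register_forward_hook(save_attn)

x = torch.rand(32, 1, 16)          # (sequence, batch, channels)
attn(x, x, x)                      # forward pass populates `captured`
handle.remove()

plt.imshow(captured['weights'][0].cpu().numpy(), cmap='viridis')
plt.xlabel('key position')
plt.ylabel('query position')
plt.savefig('self_attention_map.png')
```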

How do I get the occlusion maps for the SCARED dataset?

As I can see in your data structure, there is an occlusion folder for each dataset, e.g. Scene Flow, MPI Sintel, KITTI 2015, MIDDLEBURY_2014, and SCARED. You explain how to get the occlusion maps for Scene Flow, but how do you get the occlusion part for the SCARED data?
Thanks for your answer!

How is the data organized in the FlyingThings3D subset?

Hi there!
Thank you for your wonderful work! Now I'm trying to train the model from scratch on the Scene Flow FlyingThings3D subset. I have downloaded the RGB images, disparity maps, and occlusion maps and arranged them like the sample data.
But the dataloader scene_flow.py doesn't work. How should I organize the dataset? Could you help me with that?

Question about the whole algorithm?

Is your proposed algorithm trained in a supervised way? I see that the input to STTR contains the GT disparity and occlusion region during the training phase.

The method in the paper is a great idea, but I have some questions from reading it

[figure from the paper]

1. In the last cross-attention layer, according to the formula mentioned for attention, the output should be the VO-weighted matrix map of size (Iw × Iw), so how is this value map used for optimal transport?

2. In the optimal transport algorithm, the paper mentions "the cost matrix M is set to the negative of the attention computed by the cross-attention module in Equation 2", but the attention output of the last cross-attention layer is calculated for each pixel feature by splitting the channel dimension of the feature descriptors, so which αh is negated to set the cost matrix?

Thanks!
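
On question 1, not an authoritative answer, but the general recipe of entropy-regularized optimal transport on a cost matrix looks like the sketch below. The cost here is just a random stand-in for the negated attention, and the details (iterations, regularization, handling of unmatched/dustbin entries) will differ from the actual implementation:

```python
import math
import torch

def sinkhorn(cost, num_iters=10, eps=0.1):
    """Entropy-regularized optimal transport between two uniform 1D marginals.

    cost: (W, W) matrix, e.g. the negated cross-attention scores.
    Returns a (W, W) soft assignment whose columns each sum to ~1/W.
    """
    W = cost.shape[0]
    log_mu = torch.full((W,), -math.log(W))   # log of uniform marginal 1/W
    log_nu = torch.full((W,), -math.log(W))
    u = torch.zeros(W)
    v = torch.zeros(W)
    log_K = -cost / eps                       # log of the Gibbs kernel exp(-cost/eps)
    for _ in range(num_iters):
        u = log_mu - torch.logsumexp(log_K + v[None, :], dim=1)
        v = log_nu - torch.logsumexp(log_K + u[:, None], dim=0)
    return torch.exp(log_K + u[:, None] + v[None, :])

attention = torch.rand(8, 8)             # stand-in for the cross-attention output
assignment = sinkhorn(-attention)        # cost matrix M = negative attention
print(assignment.sum(dim=0))             # every column sums to roughly 1/8
```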

How to get the SCARED dataset?

Dear author:
When I go to the website of Stereo Correspondence and Reconstruction of Endoscopic Data, I can't find the download link for the dataset. Can you help me?
Thanks!

Torch not compiled with CUDA enabled

When I run "model = STTR(args).cuda().eval()" I get this error:
Torch not compiled with CUDA enabled

How can I compile torch 1.8.0 with CUDA enabled?
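
"Torch not compiled with CUDA enabled" means the installed wheel is CPU-only, so .cuda() cannot work regardless of the code. A small hedged sketch of checking this and falling back to CPU (slower, but it runs):

```python
import torch

print(torch.__version__)            # a suffix like '1.8.0+cpu' indicates a CPU-only build
print(torch.cuda.is_available())    # False for CPU-only builds or missing drivers

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# model = STTR(args).to(device).eval()   # instead of hard-coding .cuda()
```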

TRAINING ERROR

When I train the model with batch_size=2, I get the error below.
Please help me, thank you very much!

Traceback (most recent call last):
  File "main.py", line 251, in <module>
    main(args_)
  File "main.py", line 222, in main
    args.clip_max_norm, amp)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/utilities/train.py", line 32, in train_one_epoch
    _, losses, sampled_disp = forward_pass(model, data, device, criterion, train_stats)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/utilities/foward_pass.py", line 56, in forward_pass
    outputs = model(inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/module/sttr.py", line 97, in forward
    attn_weight = self.transformer(feat_left, feat_right, pos_enc)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/module/transformer.py", line 110, in forward
    attn_weight = self._alternating_attn(feat, pos_enc, pos_indexes, hn)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/module/transformer.py", line 79, in _alternating_attn
    pos_indexes)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 211, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 90, in forward
    outputs = run_function(*args)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/module/transformer.py", line 74, in custom_cross_attn
    return module(*inputs, False)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/module/transformer.py", line 185, in forward
    pos_indexes=pos_indexes)[0]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/cfs/algorithm/xianda.guo/code/stereo-transformer/module/attention.py", line 106, in forward
    attn_pos_feat = torch.einsum('vnec,wvec->newv', k, q_r)  # NxExWxW'
  File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 299, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): operands do not broadcast with remapped shapes [original->remapped]: [71, 360, 8, 16]->[360, 8, 1, 71, 16] [213, 213, 8, 16]->[1, 8, 213, 213, 16]

Facing UserWarning after running the command -> output = model(input_data)

I have PyTorch 1.9.0 installed, with all the other requirements.
In the inference_example.ipynb file, when I run the command output = model(input_data), it throws a UserWarning:
/home/enord/.local/lib/python3.6/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)

Also, my system hangs: the mouse cursor stops responding, system performance drops a lot, and after that the browser crashes.
Please tell me how to resolve this as soon as possible.
I will be very thankful to you.
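
The warning itself points at the fix: wherever // or torch.floor_divide is applied to tensors, newer PyTorch wants an explicit rounding mode. A tiny generic illustration, not a patch to the repository:

```python
import torch

a = torch.tensor([7, -7])
b = torch.tensor([2, 2])

old = torch.floor_divide(a, b)                   # deprecated: rounds toward 0 -> [3, -3], warns
trunc = torch.div(a, b, rounding_mode='trunc')   # same result, no warning     -> [3, -3]
floor = torch.div(a, b, rounding_mode='floor')   # true floor division         -> [3, -4]
```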

Train crashes inside the Attention module

Hi, Thanks for sharing this implementation.

I'm trying to reproduce the paper results by training the model on Scene Flow, but training constantly crashes inside the attention module when scaling the query tensor.

Did you have this problem before?

Here is the Traceback:

Traceback (most recent call last):
  File "main.py", line 250, in <module>
    main(args_)
  File "main.py", line 236, in main
    eval_stats = evaluate(model, criterion, data_loader_val, device, epoch, summary_writer, False)
  File "/home/user/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/media/user/Data/user/stereo-transformer/utilities/eval.py", line 34, in evaluate
    outputs, losses, sampled_disp = forward_pass(model, data, device, criterion, eval_stats, idx, logger)
  File "/media/user/Data/user/stereo-transformer/utilities/foward_pass.py", line 55, in forward_pass
    outputs = model(inputs)
  File "/home/user/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/user/Data/user/stereo-transformer/module/sttr.py", line 101, in forward
    attn_weight = self.transformer(feat_left, feat_right, pos_enc)
  File "/home/user/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/user/Data/user/stereo-transformer/module/transformer.py", line 112, in forward
    attn_weight = self._alternating_attn(feat, pos_enc, pos_indexes, hn)
  File "/media/user/Data/user/stereo-transformer/module/transformer.py", line 60, in _alternating_attn
    feat = checkpoint(create_custom_self_attn(self_attn), feat, pos_enc, pos_indexes)
  File "/home/user/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/utils/checkpoint.py", line 163, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/user/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/utils/checkpoint.py", line 74, in forward
    outputs = run_function(*args)
  File "/media/user/Data/user/stereo-transformer/module/transformer.py", line 56, in custom_self_attn
    return module(*inputs)
  File "/home/user/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/user/Data/user/stereo-transformer/module/transformer.py", line 143, in forward
    pos_indexes=pos_indexes)
  File "/home/user/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/user/Data/user/stereo-transformer/module/attention.py", line 86, in forward
    q = q * scaling
UnboundLocalError: local variable 'q' referenced before assignment

Error about the “q” in the MultiheadAttentionRelative module

Hi, I met a problem when running the code. Please refer to lines 37 to 63 of module/attention.py. There you handle two conditions: one for self-attention and another for cross-attention. But the code uses "if" and "elif", so I wonder whether there should be a third condition handled by an "else"? I ask because I get the error "local variable 'q' referenced before assignment", which suggests a third case is being hit. Could you please tell me how to solve this problem?
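
For debugging, the pattern that produces this error is a variable assigned only inside if/elif branches. A minimal illustration of that pattern (not the repository's actual dispatch logic) and one way to make an unexpected third case fail loudly:

```python
import torch

def project_qkv(query, key, value):
    same_tensor = query is key is value                 # hypothetical self-attention check
    cross_tensor = (query is not key) and (key is value)  # hypothetical cross-attention check

    if same_tensor:           # self-attention: one shared input for q, k, v
        q = k = v = query
    elif cross_tensor:        # cross-attention: separate query input
        q, k, v = query, key, value
    else:
        # without this branch, a call matching neither pattern leaves `q`
        # unassigned, and the later use raises UnboundLocalError
        raise ValueError('inputs match neither the self- nor cross-attention pattern')
    return q, k, v

x = torch.rand(4, 8)
project_qkv(x, x, x)                  # self-attention path
project_qkv(torch.rand(4, 8), x, x)   # cross-attention path
```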

GitError

I think your work is charming and I want to learn more about it. When I run the code, I get this error:
File "main.py", line 250, in
main(args_)
File "main.py", line 197, in main
checkpoint_saver = Saver(args)
File "/home/mc/Project/stereo-transformer-main/utilities/checkpoint_saver.py", line 24, in init
self.save_experiment_config()
File "/home/mc/Project/stereo-transformer-main/utilities/checkpoint_saver.py", line 41, in save_experiment_config
repo = Repository('.')
File "/opt/anaconda3/envs/GWCNet/lib/python3.6/site-packages/pygit2/repository.py", line 1494, in init
path_backend = init_file_backend(path, flags)
_pygit2.GitError: Repository not found at .

Can you give me some advice on how to deal with it? Thank you.

Does the training code only support batch_size=1?

Hi @mli0603, first of all, thanks for open-sourcing the code!
But when I try to train with a larger batch size, there seem to be some bugs in the code.
Even though I removed the random crop, to make sure every image in a batch has the same shape, I still get errors like the one below:
Traceback (most recent call last):
  File "train_multi_dataset.py", line 253, in <module>
    main(args_)
  File "train_multi_dataset.py", line 222, in main
    args.clip_max_norm, amp)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/utilities/train.py", line 32, in train_one_epoch
    _, losses, sampled_disp = forward_pass(model, data, device, criterion, train_stats)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/utilities/foward_pass.py", line 55, in forward_pass
    outputs = model(inputs)
  File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/module/sttr.py", line 97, in forward
    attn_weight = self.transformer(feat_left, feat_right, pos_enc)
  File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/module/transformer.py", line 110, in forward
    attn_weight = self._alternating_attn(feat, pos_enc, pos_indexes, hn)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/module/transformer.py", line 79, in _alternating_attn
    pos_indexes)
  File "/usr/local/lib64/python3.6/site-packages/torch/utils/checkpoint.py", line 163, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib64/python3.6/site-packages/torch/utils/checkpoint.py", line 74, in forward
    outputs = run_function(*args)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/module/transformer.py", line 74, in custom_cross_attn
    return module(*inputs, False)
  File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/module/transformer.py", line 185, in forward
    pos_indexes=pos_indexes)[0]
  File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hadoop-uavcvml/cephfs/data/lixianyang/stereo_match_lxy/STTR/module/attention.py", line 89, in forward
    k = k.contiguous().view(-1, bsz, self.num_heads, head_dim)
RuntimeError: shape '[-1, 1440, 8, 16]' is invalid for input of size 2359296

How much CUDA memory is needed to train?

Hi! Thank you for sharing such great work!
My question is how much CUDA memory is needed for training. I read in your paper that you used a single Titan RTX GPU to train; does that mean this network needs a 24 GB GPU to train with a batch size of 1?
Thanks~

Custom Data Inference

Hi authors, thanks for providing the code for training and inference. I wanted to run STTR on custom frames to see the depth output. As seen in the training data, we need to provide, along with the stereo images, the occlusion map and initial disparity. Do you have any suggestions or algorithms we can use to get the occlusion information from stereo images in order to run STTR on custom data?

Thanks and Regards
Aakash Rajpal
