The dimensions do not broadcast (vit-adapter, 3 comments, closed)

czczup commented on June 9, 2024


Comments (3)

czczup commented on June 9, 2024

Hi, great work! I'm trying to test it out, and I may have found a bug:

Settings:
config: mask2former_beit_adapter_base_512_40k_cocostuff10k_ss.py
checkpoint: mask2former_beit_adapter_base_512_40k_cocostuff10k.pth.tar

The error is raised at this line in beit.py:
            attn = attn + relative_position_bias.unsqueeze(0)
The dimensions do not broadcast. For an input image of shape 1x3x128x128, the shapes are:
            attn.shape                                  # ([1, 12, 65, 65])
            relative_position_bias.unsqueeze(0).shape   # ([1, 12, 1025, 1025])
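Both sizes follow from how ViT-style models count tokens. Assuming BEiT-base's 16x16 patches (the patch size is not stated in this thread) plus one [CLS] token, the sequence length is N = (H/16)·(W/16) + 1, so a 128x128 input gives 65 tokens while the bias table was built for 512x512, i.e. 1025 tokens. The sketch below reproduces the mismatch with NumPy as a stand-in for PyTorch tensors (both follow the same broadcasting rules); `num_tokens` is a hypothetical helper, and the 12 heads are taken from the shapes above.

```python
import numpy as np

def num_tokens(img_size: int, patch_size: int = 16) -> int:
    """Hypothetical helper: patch tokens on a square grid plus one [CLS] token.
    Assumes 16x16 patches, as in BEiT-base."""
    grid = img_size // patch_size
    return grid * grid + 1

heads = 12
n_input = num_tokens(128)  # 8 * 8 + 1 = 65 tokens for a 128x128 input
n_table = num_tokens(512)  # 32 * 32 + 1 = 1025 tokens the bias was built for

attn = np.zeros((1, heads, n_input, n_input), dtype=np.float32)
relative_position_bias = np.zeros((heads, n_table, n_table), dtype=np.float32)

try:
    # np.newaxis plays the role of .unsqueeze(0) in beit.py
    attn = attn + relative_position_bias[np.newaxis]
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("broadcast failed:", err)
```

The trailing dimensions 65 and 1025 differ and neither is 1, so broadcasting is impossible and the addition fails, exactly as in the traceback.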

Yes, BEiT has a limitation on image resolution. When it is built with img_size=512, the input image must be 1x3x512x512.
It does not support dynamic resolution because of how it implements relative position encoding. One possible workaround is to pad a 128x128 image to 512x512; alternatively, if all images are 128x128, set img_size=128.
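A minimal sketch of the padding workaround, assuming a single NCHW image and zero padding on the bottom and right edges (the reply does not say which padding value or placement to use). NumPy's `np.pad` stands in here; in PyTorch, `torch.nn.functional.pad(x, (0, 384, 0, 384))` pads the last two dimensions the same way. `pad_to_size` is a hypothetical helper.

```python
import numpy as np

def pad_to_size(img: np.ndarray, target: int = 512) -> np.ndarray:
    """Hypothetical helper: zero-pad an (N, C, H, W) batch on the bottom/right
    so that H and W both equal `target`."""
    _, _, h, w = img.shape
    if h > target or w > target:
        raise ValueError(f"image {h}x{w} is larger than {target}x{target}")
    pad_width = ((0, 0), (0, 0), (0, target - h), (0, target - w))
    return np.pad(img, pad_width, mode="constant", constant_values=0)

x = np.random.rand(1, 3, 128, 128).astype(np.float32)
padded = pad_to_size(x)
print(padded.shape)  # (1, 3, 512, 512)
```

The original pixels stay in the top-left 128x128 corner, so a segmentation map predicted on the padded input would presumably need to be cropped back to that region afterwards.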


rmihaylov commented on June 9, 2024

Thank you for the fast reply. I fixed it in the code like this (I'm rewriting it in TF2), and it works like a charm:

attn = attn + tf.expand_dims(relative_position_bias[:,:N,:N], 0)
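In PyTorch, the same crop would presumably read `attn = attn + relative_position_bias[:, :N, :N].unsqueeze(0)`, with N = attn.shape[-1] (65 for a 128x128 input). The crop makes the shapes agree, but one caveat is worth checking: patch tokens are flattened row by row, so the first 64 patch tokens of the 32x32 grid the bias was built for cover two rows of 32 patches, not an 8x8 block. The NumPy sketch below (a stand-in for torch, assuming 16x16 patches and row-major token order; `offset` is a hypothetical helper) shows the crop working shape-wise and then compares the relative offset between two patch tokens on the real 8x8 input and on the 32-wide grid the cropped bias assumes.

```python
import numpy as np

heads, n_input, n_table = 12, 65, 1025  # 128x128 input vs. 512x512 bias table
attn = np.zeros((1, heads, n_input, n_input), dtype=np.float32)
relative_position_bias = np.zeros((heads, n_table, n_table), dtype=np.float32)

# The crop from the thread; np.newaxis plays the role of .unsqueeze(0) / tf.expand_dims.
N = attn.shape[-1]
attn = attn + relative_position_bias[:, :N, :N][np.newaxis]
print(attn.shape)  # (1, 12, 65, 65)

def offset(a: int, b: int, grid: int) -> tuple:
    """Hypothetical helper: (row, col) offset from patch token a to patch token b
    when patch tokens ([CLS] excluded) are flattened row by row on a grid x grid layout."""
    ra, ca = divmod(a, grid)
    rb, cb = divmod(b, grid)
    return (rb - ra, cb - ca)

# Patch tokens 0 and 8: on the real 8x8 input, token 8 sits directly below token 0,
# but on the 32-wide grid behind the cropped bias it sits 8 columns to the right.
print(offset(0, 8, 8))   # (1, 0)
print(offset(0, 8, 32))  # (0, 8)
```

So the cropped bias is shape-compatible but encodes relative positions for a different token layout; the thread does not report how much this affects accuracy. The padding workaround from the maintainer's reply keeps the 32x32 layout the bias was built for.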


paarthmadan29 commented on June 9, 2024

Can you please explain your solution in more detail, @rmihaylov? I'm using PyTorch.
Thanks!
