Is this *2 intentional? If so, what's the reason?

AttentionAgent: mysterious 2*top_k, maybe a mistake? about brain-tokyo-workshop HOT 6 CLOSED

google commented on May 17, 2024

AttentionAgent: mysterious 2*top_k, maybe a mistake?

from brain-tokyo-workshop.

Comments (6)

lerrytang commented on May 17, 2024

This code is correct. The *2 is there because each patch center position has two coordinates (x, y), so the next dense layer needs k*2 inputs.
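
To make the k*2 count concrete, here is a minimal sketch. It is not the repository's code: `flatten_centers` is a hypothetical helper and the coordinates are illustrative values. It only shows that k centers of the form (x, y) flatten into a vector of length 2*k, which is the input size the following dense layer would need.

```python
# Hypothetical illustration, not taken from torch_solutions.py.
def flatten_centers(centers):
    """Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    flat = []
    for x, y in centers:
        flat.extend([x, y])
    return flat


top_k = 3
# Illustrative normalized patch-center coordinates, one (x, y) pair per patch.
centers = [(0.9219, 0.9219), (0.4219, 0.4844), (0.8594, 0.9219)]

controller_input = flatten_centers(centers)
# Each center contributes two numbers, so the input has 2 * top_k entries.
assert len(controller_input) == 2 * top_k
```

With `top_k = 1` the same helper yields a vector of length 2.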


maraoz commented on May 17, 2024

Thanks @lerrytang! I was just about to comment saying I tried removing it and the code's not working anymore :)

I guess I need to dive deeper into the attention module code to really understand what it's doing. Thanks again!


lerrytang commented on May 17, 2024

Again, thanks for being interested. You are always welcome for more questions :)


maraoz commented on May 17, 2024

You are awesome @lerrytang! If you offer so generously... I'm confused by the interaction between SelfAttention and MLPSolution... As far as I can tell, the MLPSolution is only receiving the (x,y) coordinates of the best top_k patches... and making a decision on which action to take only with that info? From reading your article I thought that the agent could 'look' at those patches to make the decision, but I'm not seeing anywhere in the code where the MLPSolution is able to access the pixel info from the patches! Is this magic or what!?!?!? :P

For example, setting top_k to 1 and printing the dimension of MLPSolution's input (print('MLPSolution input', inputs, inputs.shape)), I get:

...
MLPSolution input tensor([0.9219, 0.9219]) torch.Size([2])
MLPSolution input tensor([0.4219, 0.4844]) torch.Size([2])
MLPSolution input tensor([0.9219, 0.9219]) torch.Size([2])
MLPSolution input tensor([0.4219, 0.4844]) torch.Size([2])
MLPSolution input tensor([0.8594, 0.9219]) torch.Size([2])
...

So... where is the agent actually looking at the pixel input from the patches??!?!?


lerrytang commented on May 17, 2024

That's the interesting part :)
The agent has 2 parts: the self-attention visual module (class SelfAttention(nn.Module), line 103 of brain-tokyo-workshop/AttentionAgent/solutions/torch_solutions.py at commit a4d9e07) and the LSTM controller. MLPSolution is the latter (despite its name).

The former part receives the full RGB images and does the patch voting to select the K patches; this is the only place in the code where image info is looked at. After that we discard the unimportant patches, extract features from the K selected ones, and feed those features to the controller. One could extract any feature from these K patches, but in our experiments we simply used their locations and disregarded the content info, which is why you don't see any pixel-processing code in MLPSolution.

Hope this helps.
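
The flow described in this comment can be sketched roughly as follows. This is a simplified, pure-Python illustration under stated assumptions, not the code in torch_solutions.py: `select_patch_centers` is a hypothetical name, the attention scores are made up, and "voting" is assumed here to mean summing a row-wise softmax of the attention matrix over its rows so that each patch collects a total vote. The point is only that pixel-derived scores are used to pick patches, while the controller receives nothing but the selected centers.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def select_patch_centers(scores, centers, top_k):
    """Pick the top_k most-voted patches and return only their centers.

    scores:  n x n raw attention scores (hypothetical values derived from pixels)
    centers: n (x, y) tuples, one per patch
    """
    attn = [softmax(row) for row in scores]
    n = len(attn)
    # Assumed voting rule: patch j's importance is the sum of column j.
    importance = [sum(attn[i][j] for i in range(n)) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: importance[j], reverse=True)
    top = ranked[:top_k]
    # Only locations are kept; patch contents are discarded at this point.
    features = []
    for j in top:
        features.extend(centers[j])
    return top, features


# Made-up scores for 3 patches: every patch attends mostly to patch 2.
scores = [[0.0, 0.0, 5.0],
          [0.0, 0.0, 5.0],
          [0.0, 1.0, 5.0]]
centers = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]

top, features = select_patch_centers(scores, centers, top_k=2)
print(top, features)  # the controller input has 2 * top_k entries
```

In this sketch the controller would see only `features`, which is consistent with the size-2 tensors printed above for `top_k = 1`.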


maraoz commented on May 17, 2024

OMG that's amazing!! I guess I had missed that part in the article:

This is even more impressive then!! ok... I need to think about what all this means. Thanks again @lerrytang !

