Comments (18)
Hi @liuyijiang1994 ,
Thanks for the question. Actually, they are basically the same. I read a paper a few weeks ago that discusses GNNs and the Transformer model, which might be useful. It also has a nice summary of these works, covering different GNNs and different self-attention methods.
Contextualized Non-local Neural Networks for Sequence Learning
https://arxiv.org/abs/1811.08600#
from aggcn.
@tranvanhien Sorry, I cannot. I realized I could never figure out these complicated experiment settings, so I gave up on the relation extraction task two months ago.
from aggcn.
Yes, I know you made the mask by using adj, but isn't this equivalent to src_mask above?
from aggcn.
Hi,
For the first question, there is a description in the paper:
In practice, we treat the original adjacency matrix as an initialization so that the dependency information can be captured in the node representations for later attention calculation. The attention guided layer is included starting from the second block.
If you look at the code, you can see that I created two types of graph convolutional layers starting from line 126:
# gcn layer
for i in range(self.num_layers):
    if i == 0:
        self.layers.append(GraphConvLayer(opt, self.mem_dim, self.sublayer_first))
        self.layers.append(GraphConvLayer(opt, self.mem_dim, self.sublayer_second))
    else:
        self.layers.append(MultiGraphConvLayer(opt, self.mem_dim, self.sublayer_first, self.heads))
        self.layers.append(MultiGraphConvLayer(opt, self.mem_dim, self.sublayer_second, self.heads))
For the first block, we use the original adjacency matrix from the dependency tree. For the second block, we use the calculated adjacency matrix based on the representations (we assume they have already captured the dependency relations since they are obtained from the first block). You can refer to the code from line 170:
for i in range(len(self.layers)):
    if i < 2:
        outputs = self.layers[i](adj, outputs)
        layer_list.append(outputs)
    else:
        attn_tensor = self.attn(outputs, outputs, src_mask)
        attn_adj_list = [attn_adj.squeeze(1) for attn_adj in torch.split(attn_tensor, 1, dim=1)]
        outputs = self.layers[i](attn_adj_list, outputs)
        layer_list.append(outputs)
When i < 2, the adj represents the original dependency tree.
Regarding Equation (2), yes, that is a typo. Thank you so much for pointing it out! We do not need a value projection here, since we only use the query and key to calculate the correlation scores.
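Since only the query and key are involved, the attention-guided adjacency computation can be sketched as follows. This is my own minimal simplification, not the repo's exact multi-head attention module; the function and parameter names are assumptions:

```python
import math
import torch
import torch.nn.functional as F

def attention_adj(x, wq, wk, heads, mask=None):
    """Build one soft adjacency matrix per head from node representations.

    x: (batch, n, d) node representations; wq, wk: (d, d) projection weights.
    Only query and key projections are needed -- there is no value matrix,
    because the softmax scores themselves serve as the edge weights.
    """
    b, n, d = x.shape
    dk = d // heads
    q = (x @ wq).view(b, n, heads, dk).transpose(1, 2)  # (b, heads, n, dk)
    k = (x @ wk).view(b, n, heads, dk).transpose(1, 2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(dk)    # (b, heads, n, n)
    if mask is not None:
        # mask: broadcastable to (b, heads, n, n); zeros mark padding
        scores = scores.masked_fill(mask == 0, -1e9)
    attn = F.softmax(scores, dim=-1)
    # one (b, n, n) soft adjacency per head, mirroring attn_adj_list above
    return [a.squeeze(1) for a in torch.split(attn, 1, dim=1)]
```

Each head thus yields a fully weighted graph over the nodes, which the later GCN sub-layers consume in place of the 0/1 dependency adjacency.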
from aggcn.
I will close this issue; if you have any further questions, feel free to reopen it.
from aggcn.
But why do you need to double your sub-layers? I mean your sublayer_first and sublayer_second?
from aggcn.
And there is no reopen button on this page.
from aggcn.
For the sub-layer problem, you can refer to this TACL paper, DCGCN. Basically, the motivation behind this is to imitate convolution filters of different sizes (1x1, 3x3, etc.) in a CNN.
The number of sub-layers in each block is different for TACRED. Here the first sub-layer count is 2 and the second is 4. You can refer to train.py; these are hyper-parameters.
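The analogy with convolution filters of different sizes can be sketched as a densely connected stack of GCN sub-layers, in the spirit of DCGCN. This is a rough sketch, not the actual layer in this repo, and all names are mine:

```python
import torch
import torch.nn as nn

class DenseGCNBlock(nn.Module):
    """Stack of GCN sub-layers with dense connections (DCGCN-style sketch).

    A stack of L sub-layers lets information flow over graph paths of
    length 1..L, much like CNN filters of different sizes capture
    different receptive fields; dense connections let each sub-layer
    reuse the outputs of all earlier sub-layers.
    """
    def __init__(self, dim, num_sublayers):
        super().__init__()
        self.sub_dim = dim // num_sublayers
        # sub-layer i sees the block input plus all previous sub-layer outputs
        self.linears = nn.ModuleList(
            nn.Linear(dim + i * self.sub_dim, self.sub_dim)
            for i in range(num_sublayers)
        )

    def forward(self, adj, x):
        cache = [x]
        for linear in self.linears:
            h = torch.cat(cache, dim=-1)      # dense connection
            h = torch.relu(adj @ linear(h))   # one graph-convolution step
            cache.append(h)
        return torch.cat(cache[1:], dim=-1)   # back to (batch, n, dim)
```

Using two sub-layers in the first dense layer and four in the second then gives each block two such stacks with different effective "filter sizes".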
from aggcn.
Thank you~
from aggcn.
Hello, I still have some questions about the number of layers.
In your AGGCN paper, Section 3.2, you mentioned that the best setting for sentence-level relation extraction is M=2, L=5.
- Is L the sum of sublayer_first and sublayer_second?
- As I understand, M is the number of AGGCN blocks, and should be identical to the argument --num_layers in train.py. However, the argument is described as "Num of RNN layers" in the code, which is a bit confusing. Is it a typo?
from aggcn.
Hi @ardellelee,
For the questions you mentioned above:
- Is L the sum of sublayer_first and sublayer_second?
Yes, you are correct.
- As I understand, M is the number of AGGCN blocks, and should be identical to the argument --num_layers in train.py. However, the argument is described as "Num of RNN layers" in the code, which is a bit confusing. Is it a typo?
Yes, it is a typo... Thank you so much for pointing it out! It should be the number of blocks, which is M.
from aggcn.
Hello @Cartus , you mentioned that the best setting for sentence-level relation extraction is M=2, L=5, and that L is the sum of sublayer_first and sublayer_second, with the first sub-layer being 2 and the second 4. But 2 + 4 = 6, not 5?
from aggcn.
Hi @speedcell4 , could you explain for me? :)
from aggcn.
Thank you.
from aggcn.
Hi @tranvanhien , that is a typo in the paper, I will fix it later. The code in this repo is the default setting for the TACRED dataset.
Thank you for pointing it out!
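Concretely, the repo's TACRED defaults map onto the paper's notation roughly like this. The variable names below mirror the train.py arguments discussed above but are my own sketch:

```python
# Hypothetical mirror of the train.py hyper-parameters discussed above
sublayer_first = 2    # sub-layers in the first dense layer of each block
sublayer_second = 4   # sub-layers in the second dense layer of each block
num_layers = 2        # M: number of AGGCN blocks (the "Num of RNN layers" help text is the typo)

L = sublayer_first + sublayer_second  # L: total sub-layers per block
print(f"M={num_layers}, L={L}")       # prints M=2, L=6
```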
from aggcn.
I see. Thank you for your quick reply.
from aggcn.
Hi @Cartus ,
I read your answers above about the use of the dependency tree. You take it as an initialization, so it only works in the first GCN block, right? If so, from the second block onward the adjacency matrix becomes the weights produced by self-attention, and the structure becomes a fully connected directed graph. Is there any difference between the calculation of self-attention and that of the attention-guided GCN on a fully connected directed graph?
Thanks for your help!
from aggcn.
@Cartus Thank you for your prompt reply, it is very helpful to me!
from aggcn.