lx865712528 / emnlp2018-jmee
This is the code for our EMNLP 2018 paper "Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation".
Regarding the trigger-word accuracy computation: why is it correct to score the B- tags and I- tags separately, rather than parsing the full BIO spans?
Looking forward to your response, thank you!
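Not the repo's code, but a minimal sketch of the distinction the question is about: counting B-/I- tags independently versus decoding full BIO spans and comparing (start, end, type) triples. The tag values and the `decode_bio` helper are illustrative only.

```python
# A toy contrast between per-tag scoring and span-level scoring of
# BIO-tagged triggers (illustrative; not taken from testing.py).

def decode_bio(tags):
    """Decode a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O" or (
                tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans

gold = ["B-Attack", "I-Attack", "O", "B-Die"]
pred = ["B-Attack", "B-Attack", "O", "B-Die"]      # splits one trigger in two

# Per-tag matching: 3 of the 4 individual tags agree.
tag_hits = sum(g == p for g, p in zip(gold, pred))

# Span-level matching is stricter: the split "Attack" trigger counts as wrong.
span_hits = len(decode_bio(gold) & decode_bio(pred))
```

With per-tag counting the split trigger still earns partial credit, which is why span-level decoding usually gives lower (but more faithful) trigger scores.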
EMNLP2018-JMEE/enet/testing.py
Lines 8 to 23 in 494451d
Hello, I have read your paper and have two questions about the code:
Does this line of code correspond to Equation 4 in the paper? If so, where in the code is the aggregation over the neighbor set N(v) from Equation 4 reflected?
In the forward function of Self-Attention, where is Equation 10 from the paper reflected?
Thanks for sharing the code; I hope to get your reply!
Best wishes!
Thanks for your sharing!
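Not the authors' code, but a toy illustration of why the per-node sum over N(v) can be hard to spot: GCN implementations typically fold the whole neighbor aggregation into one adjacency-matrix multiplication, so no explicit loop over N(v) appears. Shapes and names here are assumptions.

```python
# Toy GCN-style neighbor aggregation: the sum over u in N(v) from an
# equation like Eq. 4 becomes one matrix product adj @ h.

def gcn_aggregate(adj, h):
    """adj: n x n adjacency (adj[v][u] = 1 if u is in N(v)); h: n x d features.
    Returns an n x d matrix whose row v is sum_{u in N(v)} h[u]."""
    n, d = len(h), len(h[0])
    return [[sum(adj[v][u] * h[u][k] for u in range(n)) for k in range(d)]
            for v in range(n)]

# 3-node path graph 0-1-2
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
h = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0]]
agg = gcn_aggregate(adj, h)   # row 1 sums its neighbors 0 and 2
```

In a framework like PyTorch the same thing is a single batched matmul over the adjacency tensor, which is presumably what the line in question does.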
Hello! I am a beginner. For preprocessing, I want to know how to transform a sentence into an example in the JSON format you gave. Can you give me a detailed example?
Thanks!
How do I set the model hyperparameters? Can you offer a list of working parameter values?
Thanks
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/testing.py#L72
In this line, if I add the following assertion before it:
assert len(arguments) == len(arguments_)
there will be an assertion error. I believe this is because arguments holds the golden arguments, while arguments_ holds only the predicted arguments, whose length changes dynamically during training.
Since the paper only uses 3 kinds of graph edges, should we change this line from
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/ace-05-splits/sample.json#L5
to
along/dep=32/gov=-1
reverse/dep=-1/gov=-32
loop/dep=-1/gov=-1
loop/dep=32/gov=32
in the train/dev/test.json files?
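Rather than rewriting the JSON files, one option is to collapse the labeled arcs at load time. A hypothetical sketch (the "label/dep=i/gov=j" string format is taken from sample.json; the three-matrix output and the `collapse_edges` helper are assumptions, not the repo's actual loader):

```python
# Collapse labeled stanford-colcc arcs into the paper's three edge types:
# along (governor -> dependent), reverse (dependent -> governor), self-loop.

def collapse_edges(arcs, n):
    """Return three n x n 0/1 adjacency matrices: along, reverse, loop."""
    along = [[0] * n for _ in range(n)]
    rev = [[0] * n for _ in range(n)]
    loop = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for arc in arcs:
        _, dep_s, gov_s = arc.split("/")
        dep = int(dep_s.split("=")[1])
        gov = int(gov_s.split("=")[1])
        if gov < 0:            # the ROOT arc has no source token; skip it
            continue
        along[gov][dep] = 1
        rev[dep][gov] = 1
    return along, rev, loop

arcs = ["nsubj/dep=0/gov=1", "ROOT/dep=1/gov=-1"]
along, rev, loop = collapse_edges(arcs, 2)
```

Doing this in the loader keeps the JSON files unchanged, so the original labeled arcs remain available if a finer-grained edge typing is wanted later.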
Hello,
Did you use all sentences, including those with and without events, for train/dev/test, or only the sentences with events? Thank you!
Hi there,
Thanks for kindly releasing the code.
While reading it, I found that the golden-entity-mentions data is provided in advance. Also, the batch_golden_entities argument is required in the forward function of EDModel.
Does this mean that, to predict on data without entity labels, an extra NER system is needed to recognize entities first, with its results then fed into JMEE?
Thanks & Regards,
Mike
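If that reading is right, the glue code would look roughly like this. This is an entirely hypothetical sketch: `ner_to_mentions` is an invented helper, and the "golden-entity-mentions" field layout is guessed from sample.json-style files, not confirmed from the repo.

```python
# Hypothetical glue: run any external NER system first, then map its spans
# into the entity-mention field the model's forward pass expects.

def ner_to_mentions(spans):
    """spans: list of (start, end, entity_type) from an external tagger."""
    return [{"start": s, "end": e, "entity-type": t} for s, e, t in spans]

example = {"words": ["Bush", "visited", "Iraq"]}
example["golden-entity-mentions"] = ner_to_mentions([(0, 1, "PER"),
                                                     (2, 3, "GPE")])
```

Note that feeding predicted (rather than gold) entities at test time will likely lower argument scores, since entity errors propagate into argument classification.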
When building the model, it seems that the loaded GloVe embedding is never used.
I think that's one of the reasons I can't reproduce the experimental results.
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/models/ee.py#L20
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/run/ee/runner.py#L55
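For reference, a minimal, framework-free sketch of what actually using the loaded vectors would look like: copy each pretrained vector into the embedding-matrix row of the matching vocabulary index. All names here are illustrative, not the repo's variables; in the real code this would end with copying the matrix into the `nn.Embedding` weight.

```python
# Initialize an embedding matrix randomly, then overwrite the rows of
# words that have pretrained GloVe vectors (illustrative sketch).
import random

def init_embeddings(vocab, glove, dim):
    """vocab: word -> index; glove: word -> vector; returns the matrix rows."""
    rng = random.Random(42)
    mat = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]
    for word, idx in vocab.items():
        if word in glove:              # pretrained words replace random init
            mat[idx] = list(glove[word])
    return mat

vocab = {"the": 0, "attack": 1, "xyzzy": 2}
glove = {"the": [0.1, 0.2], "attack": [0.3, 0.4]}
mat = init_embeddings(vocab, glove, dim=2)   # "xyzzy" keeps its random row
```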
def train(model, train_set, dev_set, test_set, optimizer_constructor, epochs,
          tester, parser, other_testsets):
    # build batch on cpu
    train_iter = BucketIterator(train_set, batch_size=parser.batch,
                                train=False, shuffle=True, device=-1,
                                sort_key=lambda x: len(x.POSTAGS))
But when I change train to True and then train, I find that for most training steps the F1, precision, and recall are close to 100%. What is wrong here?
Since the code for the paper is open-sourced, it should be easy to run and reproduce!
The ACE2005 dataset not being releasable due to license issues is no excuse for the code being this broken and full of pitfalls!
Hi, I'm trying to run your code and I found that the value of the parameter "loss_alpha" is not mentioned in your paper.
Could you please give me a value to set this parameter?
Thanks!
Can you share the data preprocessing code?
Is there really no preprocessing code?
Do I need to convert the ACE corpus data into the JSON files myself?
Hi there,
I found that when loading the corpus, JMEE uses the keep_events=1 option to filter out sentences that contain no events, which dramatically decreases the size of the training set.
Is this step necessary? Why not keep all sentences in the training set?
# sentences in the train set must contain at least 1 event
train_set = ACE2005Dataset(self.a.train,
fields={"words": ("WORDS", WordsField),
"pos-tags": ("POSTAGS", PosTagsField),
"golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
"stanford-colcc": ("ADJM", AdjMatrixField),
"golden-event-mentions": ("LABEL", LabelField),
"all-events": ("EVENT", EventsField),
"all-entities": ("ENTITIES", EntitiesField)},
keep_events=1)
# sentence in dev set can have no event
#
dev_set = ACE2005Dataset(self.a.dev,
fields={"words": ("WORDS", WordsField),
"pos-tags": ("POSTAGS", PosTagsField),
"golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
"stanford-colcc": ("ADJM", AdjMatrixField),
"golden-event-mentions": ("LABEL", LabelField),
"all-events": ("EVENT", EventsField),
"all-entities": ("ENTITIES", EntitiesField)},
keep_events=0)
# sentence in test set can have no event
#
test_set = ACE2005Dataset(self.a.test,
fields={"words": ("WORDS", WordsField),
"pos-tags": ("POSTAGS", PosTagsField),
"golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
"stanford-colcc": ("ADJM", AdjMatrixField),
"golden-event-mentions": ("LABEL", LabelField),
"all-events": ("EVENT", EventsField),
"all-entities": ("ENTITIES", EntitiesField)},
keep_events=0)
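A guess at what keep_events does, sketched from its usage above: keep only sentences whose event list has at least that many entries. The `filter_examples` function is hypothetical; the field name follows sample.json-style files.

```python
# Hypothetical reconstruction of the keep_events filter: a threshold on
# the number of gold event mentions per sentence.

def filter_examples(examples, keep_events):
    return [ex for ex in examples
            if len(ex.get("golden-event-mentions", [])) >= keep_events]

examples = [
    {"words": ["a"], "golden-event-mentions": []},
    {"words": ["b"], "golden-event-mentions": [{"event_type": "Attack"}]},
]
train_style = filter_examples(examples, keep_events=1)   # drops event-less sentences
dev_style = filter_examples(examples, keep_events=0)     # keeps everything
```

Under this reading, keep_events=1 trades training-set size for a less skewed label distribution, since event-less sentences contribute only 'O' tags.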
Hi, I'm trying to reproduce your model, but my result is low. I checked the labels my model predicted and found many tokens predicted as an event sub-type (i.e., not 'O') that are tagged 'O' in the dataset, so my precision drops (I only get precision = 62%). Did you encounter this issue? If so, how did you tackle it: did you fix the wrong labels in the test/dev sets, or keep the original data for evaluation?
Hope to see your answer soon! Thank you so much!
Did you run this code? How did you set the model's parameters?
Hi,
I am looking for an EE algorithm to adopt in my program. I read your paper and think it is a good solution. In the code, I noticed that several classes, like BottledXavierLinear, are written as "pass" even though they are actually used in the model. I guess the code is not complete? If so, could you please upload the complete version and your parameters? I would appreciate your help.
Thank you very much!
From my observations, the entire GCN does not seem to be fully implemented; classes including BottledOrthogonalLinear are not implemented at all. Is there a problem here?
EMNLP2018-JMEE/enet/testing.py
Line 72 in 11b21e5
Is this function correct? It seems to me that simply comparing pairs positionally (even after sorting both lists) will not give the real AE performance.
Is the following workaround okay?
ct = 0
for item in arguments:
    if item in arguments_:
        ct += 1
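One caveat with a plain membership test: if either list contains duplicates, the same gold argument can be matched more than once. A hedged alternative (the tuple layout of an "argument" is an assumption) is a multiset intersection:

```python
# Count correct arguments as a multiset intersection, so a duplicated
# prediction cannot match the same gold argument twice.
from collections import Counter

def count_correct(gold_args, pred_args):
    overlap = Counter(gold_args) & Counter(pred_args)
    return sum(overlap.values())

gold = [("Attack", "Victim", 3, 5), ("Attack", "Place", 7, 8)]
pred = [("Attack", "Victim", 3, 5), ("Attack", "Victim", 3, 5)]
ct = count_correct(gold, pred)   # the duplicate prediction is counted once
```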
When I set batch_size = 32, validation after the first training epoch fails with CUDA Error: out of memory. My device is a T100 GPU (16 GB).
Also, with batch_size = 16 it runs OK but is too slow. Is there some adjustment I can make in your code?
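One common workaround (an assumption about the training loop, not a patch to runner.py) is gradient accumulation: keep batch_size = 32 as the effective batch by splitting each batch into micro-batches and accumulating gradients before a single optimizer step. Sketched framework-free with a scalar per-example "gradient":

```python
# Gradient accumulation in miniature: averaging per-example gradients
# micro-batch by micro-batch gives the same mean gradient as one big
# batch, while only micro_size examples are in memory at a time.

def accumulated_grad(examples, grad_fn, micro_size):
    total, n = 0.0, len(examples)
    for i in range(0, n, micro_size):
        chunk = examples[i:i + micro_size]
        total += sum(grad_fn(x) for x in chunk)
    return total / n

grad_fn = lambda x: 2.0 * x              # toy per-example gradient
batch = [1.0, 2.0, 3.0, 4.0]
full = sum(grad_fn(x) for x in batch) / len(batch)
acc = accumulated_grad(batch, grad_fn, micro_size=2)   # same mean gradient
```

In a real framework this corresponds to calling backward on each micro-batch loss (scaled by micro_size / batch_size) and stepping the optimizer once per accumulated batch.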
Hello,
I am trying to reproduce the results using the same parameters, but after running your code for a while, the loss keeps going down and the accuracy increases, yet at the same time the CoNLL F1 score, precision, and recall all drop to zero.
It seems that the code overfits on the dataset, since it returns all 'O' for the predicted labels. I know the dataset is licensed and cannot be included, but it has a lot of character-offset issues that require treatment which can differ from person to person. Could you at least include the preprocessing code, so that your pipeline becomes more end-to-end and we can be sure the input is consistent?
Thanks,
This is the best result I've got:
epoch:106|loss: 9.72400|ed_p: 0.75342|ed_r: 0.74728|ed_f1: 0.75034|ae_p: 0.37625|ae_r: 0.31840|ae_f1: 0.34492|lr:0.2621440000
Does ed_f1 mean trigger identification F1 or trigger classification F1?
Hello, I processed the data with stanfordnlp and ran into a json.decoder.JSONDecodeError. Many people say this means a problem with the JSON file, but I hit the same issue even with sample.json.
Hope to get your answer, thanks!
Namespace(batch=128, dev='ace-05-splits/dev.json', device='cpu', earlystop=999999, epochs=9223372036854775807, finetune=None, hps=None, l2decay=0, lr=0.001, maxnorm=3, optimizer='adam', out='out', restart=999999, seed=42, test='ace-05-splits/test.json', train='ace-05-splits/sample.json', webd='word2vec.txt')
Running on cpu
loading corpus from ace-05-splits/sample.json
Traceback (most recent call last):
  File "enet/run/ee/runner.py", line 241, in <module>
    EERunner().run()
  File "enet/run/ee/runner.py", line 99, in run
    keep_events=1)
  File "/home/lhj/coding/EMNLP2018-JMEE-master/enet/corpus/Data.py", line 191, in __init__
    super(ACE2005Dataset, self).__init__(path, fields, **kwargs)
  File "/home/lhj/coding/EMNLP2018-JMEE-master/enet/corpus/Corpus.py", line 20, in __init__
    examples = self.parse_example(path, fields)
  File "/home/lhj/coding/EMNLP2018-JMEE-master/enet/corpus/Data.py", line 202, in parse_example
    jl = json.loads(line, encoding="utf-8")
  File "/home/lhj/anaconda3/envs/jmee/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/home/lhj/anaconda3/envs/jmee/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/lhj/anaconda3/envs/jmee/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
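Given that Data.py calls json.loads(line), the loader appears to expect one JSON object per line. "Expecting value: line 1 column 2 (char 1)" then usually means a line is not bare JSON, e.g. a UTF-8 BOM at the start of the file, a pretty-printed multi-line file, or an empty line. A small diagnostic sketch (the `load_json_lines` helper is hypothetical):

```python
# Read a JSON-lines file defensively: strip a possible BOM, skip blank
# lines, and report the exact line number on a parse failure.
import json

def load_json_lines(lines):
    examples = []
    for lineno, raw in enumerate(lines, 1):
        line = raw.lstrip("\ufeff").strip()
        if not line:
            continue                          # blank lines are not JSON
        try:
            examples.append(json.loads(line))
        except json.JSONDecodeError as e:
            raise ValueError("line %d is not a JSON object: %s" % (lineno, e))
    return examples

good = ['\ufeff{"words": ["a"]}', "", '{"words": ["b"]}']
examples = load_json_lines(good)   # the BOM and blank line are tolerated
```

If sample.json fails the same way, checking the file for a BOM (e.g. with a hex viewer) or re-saving it as plain UTF-8 JSON lines is a reasonable first step.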
The command line should be "pip install -r requirements.txt", not "pip -r requirements.txt".
Has anyone run into this problem? I really don't know how to fix it.
Did you run this code? How did you set the model's parameters?