lx865712528 / emnlp2018-jmee
This is the code for our EMNLP 2018 paper "Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation".
Regarding the trigger-word accuracy computation: why is it correct to score the B- tags and I- tags separately, rather than parsing the full BIO spans?
Looking forward to your response, thank you!
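Not the repo's code, but a minimal sketch of the distinction the question is about: counting B-/I- tags independently versus decoding full BIO spans and comparing (start, end, type) triples. The tag values and the `decode_bio` helper are illustrative only.

```python
# A toy contrast between per-tag scoring and span-level scoring of
# BIO-tagged triggers (illustrative; not taken from testing.py).

def decode_bio(tags):
    """Decode a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O" or (
                tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans

gold = ["B-Attack", "I-Attack", "O", "B-Die"]
pred = ["B-Attack", "B-Attack", "O", "B-Die"]      # splits one trigger in two

# Per-tag matching: 3 of the 4 individual tags agree.
tag_hits = sum(g == p for g, p in zip(gold, pred))

# Span-level matching is stricter: the split "Attack" trigger counts as wrong.
span_hits = len(decode_bio(gold) & decode_bio(pred))
```

With per-tag counting the split trigger still earns partial credit, which is why span-level decoding usually gives lower (but more faithful) trigger scores.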
EMNLP2018-JMEE/enet/testing.py
Lines 8 to 23 in 494451d
Hello, I have read your paper and have two questions about the code:
Does this line of code correspond to Equation 4 in the paper? If so, where in the code is the aggregation over the neighbor set N(v) from Equation 4 reflected?
In the forward function of Self-Attention, where is Equation 10 from the paper reflected?
Thanks for sharing the code; I hope to get your reply!
Best wishes!
Thanks for your sharing!
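Not the authors' code, but a toy illustration of why the per-node sum over N(v) can be hard to spot: GCN implementations typically fold the whole neighbor aggregation into one adjacency-matrix multiplication, so no explicit loop over N(v) appears. Shapes and names here are assumptions.

```python
# Toy GCN-style neighbor aggregation: the sum over u in N(v) from an
# equation like Eq. 4 becomes one matrix product adj @ h.

def gcn_aggregate(adj, h):
    """adj: n x n adjacency (adj[v][u] = 1 if u is in N(v)); h: n x d features.
    Returns an n x d matrix whose row v is sum_{u in N(v)} h[u]."""
    n, d = len(h), len(h[0])
    return [[sum(adj[v][u] * h[u][k] for u in range(n)) for k in range(d)]
            for v in range(n)]

# 3-node path graph 0-1-2
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
h = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0]]
agg = gcn_aggregate(adj, h)   # row 1 sums its neighbors 0 and 2
```

In a framework like PyTorch the same thing is a single batched matmul over the adjacency tensor, which is presumably what the line in question does.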
Hello! I am a beginner. For preprocessing, I want to know how to transform a sentence into an example in the JSON format you gave. Can you give me a detailed example?
Thanks!
How do I set the model hyperparameters? Can you offer a list of working parameter values?
Thanks
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/testing.py#L72
In this line, if I add the following assertion before it:
assert len(arguments) == len(arguments_)
there will be an assertion error. I believe this is because arguments holds the golden arguments, while arguments_ holds only the predicted arguments, whose length changes dynamically during training.
Since the paper only uses 3 kinds of graph edges, should we change this line from
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/ace-05-splits/sample.json#L5
to
along/dep=32/gov=-1
reverse/dep=-1/gov=-32
loop/dep=-1/gov=-1
loop/dep=32/gov=32
in the train/dev/test.json files?
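Rather than rewriting the JSON files, one option is to collapse the labeled arcs at load time. A hypothetical sketch (the "label/dep=i/gov=j" string format is taken from sample.json; the three-matrix output and the `collapse_edges` helper are assumptions, not the repo's actual loader):

```python
# Collapse labeled stanford-colcc arcs into the paper's three edge types:
# along (governor -> dependent), reverse (dependent -> governor), self-loop.

def collapse_edges(arcs, n):
    """Return three n x n 0/1 adjacency matrices: along, reverse, loop."""
    along = [[0] * n for _ in range(n)]
    rev = [[0] * n for _ in range(n)]
    loop = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for arc in arcs:
        _, dep_s, gov_s = arc.split("/")
        dep = int(dep_s.split("=")[1])
        gov = int(gov_s.split("=")[1])
        if gov < 0:            # the ROOT arc has no source token; skip it
            continue
        along[gov][dep] = 1
        rev[dep][gov] = 1
    return along, rev, loop

arcs = ["nsubj/dep=0/gov=1", "ROOT/dep=1/gov=-1"]
along, rev, loop = collapse_edges(arcs, 2)
```

Doing this in the loader keeps the JSON files unchanged, so the original labeled arcs remain available if a finer-grained edge typing is wanted later.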
Hello,
Did you use all sentences, including those with and without events, for train/dev/test, or only the sentences with events? Thank you!
Hi there,
Thanks for kindly releasing the code.
While reading it, I found that the golden-entity-mentions data is provided in advance. Also, the batch_golden_entities argument is required in the forward function of EDModel.
Does this mean that, to predict on data without entity labels, an extra NER system is needed to recognize entities first, with its results then fed into JMEE?
Thanks & Regards,
Mike
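If that reading is right, the glue code would look roughly like this. This is an entirely hypothetical sketch: `ner_to_mentions` is an invented helper, and the "golden-entity-mentions" field layout is guessed from sample.json-style files, not confirmed from the repo.

```python
# Hypothetical glue: run any external NER system first, then map its spans
# into the entity-mention field the model's forward pass expects.

def ner_to_mentions(spans):
    """spans: list of (start, end, entity_type) from an external tagger."""
    return [{"start": s, "end": e, "entity-type": t} for s, e, t in spans]

example = {"words": ["Bush", "visited", "Iraq"]}
example["golden-entity-mentions"] = ner_to_mentions([(0, 1, "PER"),
                                                     (2, 3, "GPE")])
```

Note that feeding predicted (rather than gold) entities at test time will likely lower argument scores, since entity errors propagate into argument classification.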
When building the model, it seems that the loaded GloVe embedding is never used.
I think that's one of the reasons I can't reproduce the experimental results.
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/models/ee.py#L20
https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/run/ee/runner.py#L55
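For reference, a minimal, framework-free sketch of what actually using the loaded vectors would look like: copy each pretrained vector into the embedding-matrix row of the matching vocabulary index. All names here are illustrative, not the repo's variables; in the real code this would end with copying the matrix into the `nn.Embedding` weight.

```python
# Initialize an embedding matrix randomly, then overwrite the rows of
# words that have pretrained GloVe vectors (illustrative sketch).
import random

def init_embeddings(vocab, glove, dim):
    """vocab: word -> index; glove: word -> vector; returns the matrix rows."""
    rng = random.Random(42)
    mat = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]
    for word, idx in vocab.items():
        if word in glove:              # pretrained words replace random init
            mat[idx] = list(glove[word])
    return mat

vocab = {"the": 0, "attack": 1, "xyzzy": 2}
glove = {"the": [0.1, 0.2], "attack": [0.3, 0.4]}
mat = init_embeddings(vocab, glove, dim=2)   # "xyzzy" keeps its random row
```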
def train(model, train_set, dev_set, test_set, optimizer_constructor, epochs,
          tester, parser, other_testsets):
    # build batch on cpu
    train_iter = BucketIterator(train_set, batch_size=parser.batch,
                                train=False, shuffle=True, device=-1,
                                sort_key=lambda x: len(x.POSTAGS))
But when I change train to True and then train, I find that for most training steps the F1, precision, and recall are close to 100%. What is wrong here?
Since the code for the paper is open-sourced, it should be easy to run and reproduce!
The ACE2005 dataset not being releasable due to license issues is no excuse for the code being this broken and full of pitfalls!
Hi, I'm trying to run your code and I found that the value of the parameter "loss_alpha" is not mentioned in your paper.
Could you please give me a value to set this parameter?
Thanks!
Can you share the data preprocessing code?
Is there really no preprocessing code?
Do I need to convert the ACE corpus data into the JSON files myself?
Hi there,
I found that when loading the corpus, JMEE uses the keep_events=1 option to filter out sentences that contain no events, which dramatically decreases the size of the training set.
Is this step necessary? Why not keep all sentences in the training set?
# sentences in the train set must contain at least 1 event
train_set = ACE2005Dataset(self.a.train,
fields={"words": ("WORDS", WordsField),
"pos-tags": ("POSTAGS", PosTagsField),
"golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
"stanford-colcc": ("ADJM", AdjMatrixField),
"golden-event-mentions": ("LABEL", LabelField),
"all-events": ("EVENT", EventsField),
"all-entities": ("ENTITIES", EntitiesField)},
keep_events=1)
# sentence in dev set can have no event
#
dev_set = ACE2005Dataset(self.a.dev,
fields={"words": ("WORDS", WordsField),
"pos-tags": ("POSTAGS", PosTagsField),
"golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
"stanford-colcc": ("ADJM", AdjMatrixField),
"golden-event-mentions": ("LABEL", LabelField),
"all-events": ("EVENT", EventsField),
"all-entities": ("ENTITIES", EntitiesField)},
keep_events=0)
# sentence in test set can have no event
#
test_set = ACE2005Dataset(self.a.test,
fields={"words": ("WORDS", WordsField),
"pos-tags": ("POSTAGS", PosTagsField),
"golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
"stanford-colcc": ("ADJM", AdjMatrixField),
"golden-event-mentions": ("LABEL", LabelField),
"all-events": ("EVENT", EventsField),
"all-entities": ("ENTITIES", EntitiesField)},
keep_events=0)
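A guess at what keep_events does, sketched from its usage above: keep only sentences whose event list has at least that many entries. The `filter_examples` function is hypothetical; the field name follows sample.json-style files.

```python
# Hypothetical reconstruction of the keep_events filter: a threshold on
# the number of gold event mentions per sentence.

def filter_examples(examples, keep_events):
    return [ex for ex in examples
            if len(ex.get("golden-event-mentions", [])) >= keep_events]

examples = [
    {"words": ["a"], "golden-event-mentions": []},
    {"words": ["b"], "golden-event-mentions": [{"event_type": "Attack"}]},
]
train_style = filter_examples(examples, keep_events=1)   # drops event-less sentences
dev_style = filter_examples(examples, keep_events=0)     # keeps everything
```

Under this reading, keep_events=1 trades training-set size for a less skewed label distribution, since event-less sentences contribute only 'O' tags.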
Hi, I'm trying to reproduce your model, but my result is low. I checked the labels my model predicted and found many tokens predicted as an event sub-type (i.e., not 'O') that are tagged 'O' in the dataset, so my precision drops (I only get precision = 62%). Did you encounter this issue? If so, how did you tackle it: did you fix the wrong labels in the test/dev sets, or keep the original data for evaluation?
Hope to see your answer soon! Thank you so much!
Did you run this code? How did you set the model's parameters?
Hi,
I am looking for an EE algorithm to adopt in my program. I read your paper and think it is a good solution. In the code, I noticed that several classes, like BottledXavierLinear, are written as "pass" even though they are actually used in the model. I guess the code is not complete? If so, could you please upload the complete version and your parameters? I would appreciate your help.
Thank you very much!
From my observations, the entire GCN does not seem to be fully implemented; classes including BottledOrthogonalLinear are not implemented at all. Is there a problem here?
EMNLP2018-JMEE/enet/testing.py
Line 72 in 11b21e5
Is this function correct? It seems to me that simply comparing pairs positionally (even after sorting both lists) will not give the real AE performance.
Is the following workaround okay?
ct = 0
for item in arguments:
    if item in arguments_:
        ct += 1
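One caveat with a plain membership test: if either list contains duplicates, the same gold argument can be matched more than once. A hedged alternative (the tuple layout of an "argument" is an assumption) is a multiset intersection:

```python
# Count correct arguments as a multiset intersection, so a duplicated
# prediction cannot match the same gold argument twice.
from collections import Counter

def count_correct(gold_args, pred_args):
    overlap = Counter(gold_args) & Counter(pred_args)
    return sum(overlap.values())

gold = [("Attack", "Victim", 3, 5), ("Attack", "Place", 7, 8)]
pred = [("Attack", "Victim", 3, 5), ("Attack", "Victim", 3, 5)]
ct = count_correct(gold, pred)   # the duplicate prediction is counted once
```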
When I set batch_size = 32, validation after the first training epoch fails with CUDA Error: out of memory. My device is a T100 GPU (16 GB).
Also, with batch_size = 16 it runs OK but is too slow. Is there some adjustment I can make in your code?
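One common workaround (an assumption about the training loop, not a patch to runner.py) is gradient accumulation: keep batch_size = 32 as the effective batch by splitting each batch into micro-batches and accumulating gradients before a single optimizer step. Sketched framework-free with a scalar per-example "gradient":

```python
# Gradient accumulation in miniature: averaging per-example gradients
# micro-batch by micro-batch gives the same mean gradient as one big
# batch, while only micro_size examples are in memory at a time.

def accumulated_grad(examples, grad_fn, micro_size):
    total, n = 0.0, len(examples)
    for i in range(0, n, micro_size):
        chunk = examples[i:i + micro_size]
        total += sum(grad_fn(x) for x in chunk)
    return total / n

grad_fn = lambda x: 2.0 * x              # toy per-example gradient
batch = [1.0, 2.0, 3.0, 4.0]
full = sum(grad_fn(x) for x in batch) / len(batch)
acc = accumulated_grad(batch, grad_fn, micro_size=2)   # same mean gradient
```

In a real framework this corresponds to calling backward on each micro-batch loss (scaled by micro_size / batch_size) and stepping the optimizer once per accumulated batch.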
Hello,
I am trying to reproduce the results using the same parameters, but after running your code for a while, the loss keeps going down and the accuracy increases, yet at the same time the CoNLL F1 score, precision, and recall all drop to zero.
It seems that the code overfits on the dataset, since it returns all 'O' for the predicted labels. I know the dataset is licensed and cannot be included, but it has a lot of character-offset issues that require treatment which can differ from person to person. Could you at least include the preprocessing code, so that your pipeline becomes more end-to-end and we can be sure the input is consistent?
Thanks,
This is the best result I've got:
epoch:106|loss: 9.72400|ed_p: 0.75342|ed_r: 0.74728|ed_f1: 0.75034|ae_p: 0.37625|ae_r: 0.31840|ae_f1: 0.34492|lr:0.2621440000
Does ed_f1 mean trigger identification F1 or trigger classification F1?
Hello, I processed the data with stanfordnlp and ran into a json.decoder.JSONDecodeError. Many people say this means a problem with the JSON file, but I hit the same issue even with sample.json.
Hope to get your answer, thanks!
Namespace(batch=128, dev='ace-05-splits/dev.json', device='cpu', earlystop=999999, epochs=9223372036854775807, finetune=None, hps=None, l2decay=0, lr=0.001, maxnorm=3, optimizer='adam', out='out', restart=999999, seed=42, test='ace-05-splits/test.json', train='ace-05-splits/sample.json', webd='word2vec.txt')
Running on cpu
loading corpus from ace-05-splits/sample.json
Traceback (most recent call last):
  File "enet/run/ee/runner.py", line 241, in <module>
    EERunner().run()
  File "enet/run/ee/runner.py", line 99, in run
    keep_events=1)
  File "/home/lhj/coding/EMNLP2018-JMEE-master/enet/corpus/Data.py", line 191, in __init__
    super(ACE2005Dataset, self).__init__(path, fields, **kwargs)
  File "/home/lhj/coding/EMNLP2018-JMEE-master/enet/corpus/Corpus.py", line 20, in __init__
    examples = self.parse_example(path, fields)
  File "/home/lhj/coding/EMNLP2018-JMEE-master/enet/corpus/Data.py", line 202, in parse_example
    jl = json.loads(line, encoding="utf-8")
  File "/home/lhj/anaconda3/envs/jmee/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/home/lhj/anaconda3/envs/jmee/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/lhj/anaconda3/envs/jmee/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
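Given that Data.py calls json.loads(line), the loader appears to expect one JSON object per line. "Expecting value: line 1 column 2 (char 1)" then usually means a line is not bare JSON, e.g. a UTF-8 BOM at the start of the file, a pretty-printed multi-line file, or an empty line. A small diagnostic sketch (the `load_json_lines` helper is hypothetical):

```python
# Read a JSON-lines file defensively: strip a possible BOM, skip blank
# lines, and report the exact line number on a parse failure.
import json

def load_json_lines(lines):
    examples = []
    for lineno, raw in enumerate(lines, 1):
        line = raw.lstrip("\ufeff").strip()
        if not line:
            continue                          # blank lines are not JSON
        try:
            examples.append(json.loads(line))
        except json.JSONDecodeError as e:
            raise ValueError("line %d is not a JSON object: %s" % (lineno, e))
    return examples

good = ['\ufeff{"words": ["a"]}', "", '{"words": ["b"]}']
examples = load_json_lines(good)   # the BOM and blank line are tolerated
```

If sample.json fails the same way, checking the file for a BOM (e.g. with a hex viewer) or re-saving it as plain UTF-8 JSON lines is a reasonable first step.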
The command line should be "pip install -r requirements.txt", not "pip -r requirements.txt".
Has anyone run into this problem? I really don't know how to fix it.
Did you run this code? How did you set the model's parameters?