
okr's Introduction

OKR: A Consolidated Open Knowledge Representation for Multiple Texts

This is the code used in the paper:

"A Consolidated Open Knowledge Representation for Multiple Texts"
Rachel Wities, Vered Shwartz, Gabriel Stanovsky, Meni Adler, Ori Shapira, Shyam Upadhyay, Dan Roth, Eugenio Martinez Camara, Iryna Gurevych and Ido Dagan. LSDSem 2017. link (TBD).

The dataset developed for the paper can be found here (TBD).


Prerequisites:

  • Python 2.7
  • numpy
  • bsddb
  • spacy
  • stop-words

Quick Start:

The repository contains the following directories:

  • src - the source files: loading the OKR graph (common), computing inter-annotator agreement (agreement), and automatically constructing the OKR object (baseline_system).
  • resources - resources used by the baseline system.
  • data - the annotation files used to compute inter-annotator agreement (agreement), and the development and test sets used by the baseline system (baseline).

Running the baseline system:

From src/baseline_system: python compute_baseline_subtasks.py ../../data/baseline/dev ../../data/baseline/test

In the entity mention component, the F1 score we originally reported was 0.58. We later raised it to 0.61 by changing the spaCy tokenization. To reproduce the original 0.58 score, set GET_ORIGINAL_SCORE to True in line 22 of eval_entity_mention.py.
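For reference, the toggle itself (a minimal excerpt; the surrounding code of eval_entity_mention.py is not shown here):

# src/baseline_system/eval_entity_mention.py, line 22
GET_ORIGINAL_SCORE = True   # set to True to reproduce the originally reported 0.58 F1 instead of the 0.61 obtained with the changed tokenization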

The entailment component requires resources. The entity entailment resource files are found in the resources directory. The predicate entailment file is much larger, and we therefore provide the script to build it from the original resource (reverb_local_clsf_all.txt from here).

Detailed description of the OKR object:

TBD

okr's People

Contributors

gabrielstanovsky, orishapira, rachelvov, vered1986


okr's Issues

Fix elements interchange

Predicates are sometimes wrongly interchanged.

FEMA Urges East Coast Residents to Prepare for Hurricane Sandy
FEMA Urges East Coast to aim for Hurricane Sandy

The output from PropSWrapper doesn't introduce this error.
@kleinay, can you please check what your output is for this sentence?

{'Entities': {'A1': ('Prepare', (6,)),
              'A2': ('Hurricane Sandy', (8, 9)),
              'A3': ('FEMA', (0,)),
              'A4': ('East Coast Residents', (2, 3, 4))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2', 'A3', 'A4'],
                       'Bare predicate': ('Urges to for', (1, 5, 7)),
                       'Head': {'Lemma': 'Urges',
                                'POS': 'VBZ',
                                'Surface': ('Urges', [1])},
                       'Template': '{A3} Urges {A4} to {A1} for {A2}'}},
 'Sentence': 'FEMA Urges East Coast Residents to Prepare for Hurricane Sandy'}

@OriShapira

props_wrapper: symbol A1 in template but not in list of entities; symbol P as entity name

For the following sentence: "Hurricane Sandy is said to be the strongest hurricane that will hit the East Coast in America .", props_wrapper outputs the following:

{'Entities': {'A10': ('Sandy', (1,)),
  'A2': ('Hurricane', (0,)),
  'A3': ('hurricane', (8,)),
  'A4': ('East', (13,)),
  'A5': ('strongest', (7,)),
  'A6': ('that', (9,)),
  'A7': ('will', (10,)),
  'A8': ('Coast', (14,)),
  'A9': ('America', (16,)),
  'P2': ('hit', (11,))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
   'Bare predicate': ('is said', (2, 3)),
   'Head': {'Lemma': u'say', 'POS': 'VBN', 'Surface': ('said', [3])},
   'Template': '{A2} is said {A1}'},
  'P10': {'Arguments': ('A10', 'A2'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'A10 A2'},
  'P11': {'Arguments': ('A8', 'A4'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'A8 A4'},
  'P2': {'Arguments': ['A3', 'A4'],
   'Bare predicate': ('will hit', (10, 11)),
   'Head': {'Lemma': 'hit', 'POS': 'VB', 'Surface': ('hit', [11])},
   'Template': '{A3} will hit {A4}'},
  'P3': {'Arguments': ('A5', 'A3'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'A5 A3'},
  'P4': {'Arguments': ('A6', 'A3'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'A6 A3'},
  'P5': {'Arguments': ('A7', 'A3'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'A7 A3'},
  'P6': {'Arguments': ('P2', 'A3'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'P2 A3'},
  'P7': {'Arguments': ('A4', 'A3'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'A4 A3'},
  'P8': {'Arguments': ('A8', 'A3'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': 'A8 A3'},
  'P9': {'Arguments': ['A4', 'A9'],
   'Bare predicate': ('in', (15,)),
   'Head': {'Lemma': 'in', 'POS': 'IN', 'Surface': 'in'},
   'Template': 'A4 in A9'}},
 'Sentence': 'Hurricane Sandy is said to be the strongest hurricane that will hit the East Coast in America .'}

Structural problems:

  1. In P1's template there is a symbol A1, but A1 does not appear in the list of entities.
  2. There is an entity named P2, and also a proposition named P2.

Algorithmic problems:

  3. "that" and "will" are treated as entities instead of functional words (see the sketch below).

PropS Wrapper missing propositions

Collected by @kleinay.
There is a bug in the props-wrapper. The JSON the props-wrapper returned:

{'Entities': {'A1': ("Boy Scouts ' ` perversion ' files set to be released",
   (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10))},
 'Predicates': {'P1': {'Arguments': ['A1'],
   'Bare predicate': ('travel', (11,)),
   'Head': {'Lemma': 'travel', 'POS': 'VBP', 'Surface': ('travel', [11])},
   'Template': '{A1} travel'},
  'P2': {'Arguments': ['P3'],
   'Bare predicate': ('set', (7,)),
   'Head': {'Lemma': 'set', 'POS': 'VBN', 'Surface': ('set', [7])},
   'Template': 'set {P3}'}},
 'Sentence': "Boy Scouts ' ` perversion ' files set to be released travel"}

Note P3 is an argument in P2 but no proposition P3 is being declared.

determiner in an entity mention

in event 1 (car bomb) sentence 14:
"A car bomb exploded in Beirut (Achrafieh). People on the ground say it's a massive one."
The pipeline marks the indices [17, 19] ("a", "one") as a single entity mention.

not taking the head of noun compounds

In the single-sentence parsing of props_wrapper, I think there is a bug: instead of taking the head of a noun compound as the representative of the (original) multi-token entity, it takes the first token.

See, for example, this props_wrapper output:
{'Entities': {'A1': ('Humen', (0,)),
'A2': ('Rakhine', (6,)),
'A3': ('Rights', (1,)),
'A4': ('Watch', (2,)),
'A5': ('satelite', (3,)),
'A6': ('images', (4,)),
'A7': ('destruction', (7,))},
'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
'Bare predicate': ('show', (5,)),
'Head': {'Lemma': 'show', 'POS': 'VBP', 'Surface': ('show', [5])},
'Template': '{A1} show {A2}'},
'P2': {'Arguments': ('A1', 'A3'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
'Template': '{A1} {A3}'},
'P3': {'Arguments': ('A1', 'A4'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
'Template': '{A1} {A4}'},
'P4': {'Arguments': ('A1', 'A5'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
'Template': '{A1} {A5}'},
'P5': {'Arguments': ('A1', 'A6'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
'Template': '{A1} {A6}'},
'P6': {'Arguments': ('A2', 'A7'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
'Template': '{A2} {A7}'}},
'Sentence': 'Humen Rights Watch satellite images show Rakhine destruction'}

The dependency tree states that "images" is the head of "Humen Rights Watch satellite images", but all propositions (P1 and all the implicit ones) take "Humen" as the argument.
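For reference, a minimal sketch of selecting the syntactic head of a multi-token span with spaCy (already a prerequisite of this repository); this illustrates the intended behaviour and is not code from props_wrapper:

import spacy

nlp = spacy.load('en')
doc = nlp(u'Humen Rights Watch satellite images show Rakhine destruction')
span = doc[0:5]           # "Humen Rights Watch satellite images"
print(span.root.text)     # "images" - the syntactic head of the span, rather than its first token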

Split entities to nouns?

@kleinay
Shany mentioned that we get lower scores on entity recognition - probably because we group noun compounds.
I think we should:

  1. Get her evaluation code somewhere on github (probably first to her fork and then PR to Vered's). BTW, I don't think she's a member of the OKR project.
  2. Run the evaluations to get the lower numbers on entity identification
  3. Decide what kind of relation we want between the heads of noun compounds and their dependent words (see this dependency parse, for example)
  4. Re-run the evaluations and see if the numbers improve.

Weird parsing by PropS wrapper

I encountered this example - it's about tweet 258907583165911040 from boy_scouts:

{'Entities': {'A1': ('Released', (10,)),
'A2': ('Perversion Files Set', (4, 6, 7)),
'A3': ('Released', (22,)),
'A4': ('Perversion Files Set', (16, 18, 19))},
'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
'Bare predicate': ('To Be', (8, 9)),
'Head': {'Lemma': 'Be', 'POS': 'VB', 'Surface': ('Be', [9])},
'Template': '{A2} To Be {A1}'},
'P2': {'Arguments': ['A3', 'A4', 'A2'],
'Bare predicate': ('To Be', (20, 21)),
'Head': {'Lemma': 'Be', 'POS': 'VB', 'Surface': ('Be', [21])},
'Template': '{A2} {A4} To Be {A3}'}},
'Sentence': "Boy Scouts ' Perversion ' Files Set To Be Released : Boy Scouts ' Perversion ' Files Set To Be Released"}

Note P2's template - why is A2 also there?

Problems arising from errors in dependency parsing

  • Hurricane Sandy they call it, already causing havoc across the caribbean.
    {'Entities': {'A1': ('it', (4,)),
    'A2': ('Hurricane Sandy', (0, 1)),
    'A3': ('already', (6,)),
    'A4': ('caribbean', (11,)),
    'A5': ('havoc', (8,))},
    'Predicates': {'P1': {'Arguments': ['A1', 'A2', 'P2'],
    'Bare predicate': ('call', (3,)),
    'Head': {'Lemma': 'call',
    'POS': 'VBP',
    'Surface': ('call', [3])},
    'Template': '{A2} call {A1} {P2}'},
    'P2': {'Arguments': ['A3', 'A4', 'A5'],
    'Bare predicate': ('causing across', (7, 9)),
    'Head': {'Lemma': '',
    'POS': 'VBG',
    'Surface': ('causing', [7])},
    'Template': '{A3} causing {A5} across {A4}'}},
    'Sentence': 'Hurricane Sandy they call it , already causing havoc across the caribbean .'}

  • Hurricane Sandy slogs toward U.S., 41 killed in Caribbean
    {'Entities': {'A1': ('U.S.', (4,)),
    'A2': ('41', (6,)),
    'A3': ('Hurricane Sandy', (0, 1)),
    'A4': ('Caribbean', (9,))},
    'Predicates': {'P1': {'Arguments': ['A1', 'A2', 'A3'],
    'Bare predicate': ('slogs toward', (2, 3)),
    'Head': {'Lemma': u'slog',
    'POS': 'VBZ',
    'Surface': ('slogs', [2])},
    'Template': '{A3} slogs toward {A1} {A2}'},
    'P2': {'Arguments': ['A4'],
    'Bare predicate': ('killed in', (7, 8)),
    'Head': {'Lemma': u'kill',
    'POS': 'VBN',
    'Surface': ('killed', [7])},
    'Template': 'killed in {A4}'}},
    'Sentence': 'Hurricane Sandy slogs toward U.S. , 41 killed in Caribbean'}

  • Hurricane Sandy gone, Caribbean mourns 58 dead

  • Wrong sentence splitting:

    • Blessings to everyone on the East Coast regarding hurricane Sandy. Be safe.
    • Hurricane sandy better be nice.. East coast stay safe..
    • Good luck with the storm to all you fuzzies on the east coast of the United States! furricane

Collected by @OriShapira

Nominalizations? sentences without any entities or predicates in the pipeline output

in event 1 (car bomb):
sentence 13:
'Car Bomb In Christian Area of Beirut'
sentence 15:
'Aftermath of Beirut blast - Photos: Car bomb in Beirut'
sentence 28:
'Lebanon PM Links Car Bomb to Crisis in Syria',
sentence 29:
'Car Bomb in Beirut Kills 8'
sentence 30:
"'8 dead, 78 wounded' in Beirut car bombing"
sentence 31:
'Bomb Blast Rocks Beirut, Killing at Least Eight'
sentence 36:
'Carnage in Beirut: Photos: Car bomb in Beirut'
sentence 37:
'News/ Car bomb rips through Beirut'

predict_predicate_mention method forgets to change the prop_mentions_by_key

The predict_predicate_mention method in the file eval_predicate_mention.py takes a gold graph and returns the same graph with predicted predicate mentions instead of gold ones.
However, it does not update the prop_mentions_by_key dictionary for the newly created graph (prop_mentions_by_key keeps its old value). It would also be helpful if the method annotated the argument_mentions found by prop_ex.

Thanks :)

https://github.com/vered1986/OKR/blob/master/src/baseline_system/eval_predicate_mention.py#L67
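A hypothetical sketch of the requested update (the attribute and key names below are assumptions for illustration, not identifiers taken from the repository; only prop_mentions_by_key is named in the issue):

# after swapping in the predicted predicate mentions, rebuild the lookup
# dictionary so it reflects the predicted mentions rather than the gold ones
predicted_graph.prop_mentions_by_key = {
    str(mention): mention                                   # hypothetical key format
    for prop in predicted_graph.propositions.values()
    for mention in prop.mentions.values()}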

prepositional phrases of nouns are not captured as predicates

According to the OKR V1 spec, a prepositional phrase that modifies a noun should be treated as an (explicit) proposition, with the preposition as the predicate. E.g., for "A mail from John lies on the desk.", we should have a proposition:
P1: [A1] from [A2]
which in turn is the argument of the main verb:
P2: [P1] lies on [A3].
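For illustration, the kind of output the spec would imply for this sentence, hand-written in the props_wrapper output format ('Head' fields omitted for brevity; this is not produced by the code):

{'Entities': {'A1': ('mail', (1,)),
              'A2': ('John', (3,)),
              'A3': ('desk', (7,))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
                       'Bare predicate': ('from', (2,)),
                       'Template': '{A1} from {A2}'},
                'P2': {'Arguments': ['P1', 'A3'],
                       'Bare predicate': ('lies on', (4, 5)),
                       'Template': '{P1} lies on {A3}'}},
 'Sentence': 'a mail from John lies on the desk .'}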

In the props_wrapper code we can see lines that should deal with these cases (lines 274-300), but when tested, the behaviour is different. For example:

{'Entities': {'A1': ('mail', (1,)),
'A2': ('desk', (7,)),
'A3': ('John', (3,))},
'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
'Bare predicate': ('lies on', (4, 5)),
'Head': {'Lemma': u'lie',
'POS': 'VBZ',
'Surface': ('lies', [4])},
'Template': '{A1} lies on {A2}'},
'P2': {'Arguments': ('A1', 'A3'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT',
'POS': 'IMPLICIT',
'Surface': 'IMPLICIT'},
'Template': '{A1} {A3}'}},
'Sentence': 'a mail from John lies on the desk .'}

Python import stop_words fails

Hi, everyone!

Thanks for sharing the repository :)

Is it possible that some files are missing or excluded by mistake?
I am running the baseline as specified in the README file:

python compute_baseline_subtasks.py ../../data/baseline/dev ../../data/baseline/test

And I am getting this:

Traceback (most recent call last):
  File "compute_baseline_subtasks.py", line 19, in <module>
    from okr import *
  File "../common/okr.py", line 11, in <module>
    from constants import *
  File "../common/constants.py", line 1, in <module>
    import stop_words
ImportError: No module named stop_words

Repeating argument in a template

In the partial boy_scouts event (json attached), the proposition P3 has a template "{a.1} {a.3} {a.1} to be released".
An argument ID shouldn't repeat in a template (since each has its own role), even if their assigned concepts are the same.

Boy_scouts.json.txt

Important argument disregarded after implicit propositions added

We came across a case where an important implicit proposition was not stated even though PropS has it:

{'Entities': {'A1': ('Hurricane', (0,)),
  'A2': ('Down', (3,)),
  'A3': ('Sandy', (1,)),
  'A4': ('on', (4,)),
  'A5': ('East', (5,))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
   'Bare predicate': ('Bears', (2,)),
   'Head': {'Lemma': 'Bears', 'POS': 'VBZ', 'Surface': ('Bears', [2])},
   'Template': '{A1} Bears {A2}'},
  'P2': {'Arguments': ('A3', 'A1'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': '{A3} {A1}'},
  'P3': {'Arguments': ('A4', 'A2'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': '{A4} {A2}'},
  'P4': {'Arguments': ('A5', 'A2'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': '{A5} {A2}'}},
 'Sentence': 'Hurricane Sandy Bears Down on East Coast'}

Here "East Coast" is a very important noun phrase that is disregarded and not stated as an implicit proposition.
PropS, however, recognizes it:
Bears:(subj:Hurricane Sandy , prep:Down on East Coast )
(and so does the Berkeley parser).

Consider introducing some sort of unit tests

Since we're fixing bugs now, it's getting harder and harder to tell whether we're breaking something elsewhere.
The proper way to deal with this is probably to introduce unit tests with an aggregated set of sentences and their expected OKR structures.
That way we can always make sure these tests pass before we merge a PR.
If we want to go further we can integrate github with Jenkins to run this process automatically.

Part of these unit tests can also include running the evaluations to get an updated metric associated with each PR.
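A minimal sketch of such a regression test (the import path, accessor name, and expected keys are assumptions used only for illustration):

import unittest
from props_wrapper import PropSWrapper        # hypothetical import path

class TestPropsWrapper(unittest.TestCase):
    def test_car_bomb_sentence(self):
        pw = PropSWrapper()                    # constructor arguments, if any, omitted
        pw.parse('A car bomb exploded in Beirut .')
        output = pw.get_okr()                  # hypothetical accessor for the parsed structure
        # in a real test, the full expected structure would be stored with the test and compared
        self.assertIn('P1', output['Predicates'])
        self.assertIn('A1', output['Entities'])

if __name__ == '__main__':
    unittest.main()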

Assertion error in props wrapper

@kleinay, can you please post an example sentence which gave you the following error -

Traceback (most recent call last):
  File "src/baseline_automatic_pipeline_system/okr_for_mds.py", line 62, in <module>
    okr_info = auto_pipeline_okr_info(tweets_strings)
  File "/home/ir/kleinay/OKR/src/baseline_automatic_pipeline_system/parse_okr_info.py", line 240, in auto_pipeline_okr_info
    parsed_sentences = parse_single_sentences(sentences)
  File "/home/ir/kleinay/OKR/src/baseline_automatic_pipeline_system/parse_okr_info.py", line 49, in parse_single_sentences
    pw.parse(sent)
  File "src/baseline_system/parsers/props_wrapper.py", line 100, in parse
    self.parse_okr()
  File "src/baseline_system/parsers/props_wrapper.py", line 109, in parse_okr
    self.parse_predicate(pred)
  File "src/baseline_system/parsers/props_wrapper.py", line 232, in parse_predicate
    dep_tree = self.get_dep_node(predicate_node)
  File "src/baseline_system/parsers/props_wrapper.py", line 176, in get_dep_node
    self.dep_tree
AssertionError: Problems matching 14; nodes matched were:[<props.dependency_tree.tree.DepTree object at 0x115a61d0>, <props.dependency_tree.tree.DepTree object at 0x115a6210>]; dep tree: [<props.dependency_tree.tree.DepTree object at 0x11577f50>, <props.dependency_tree.tree.DepTree object at 0x11577e10>, <props.dependency_tree.tree.DepTree object at 0x11577ed0>, <props.dependency_tree.tree.DepTree object at 0x11577e90>, <props.dependency_tree.tree.DepTree object at 0x11577e50>, <props.dependency_tree.tree.DepTree object at 0x11577fd0>, <props.dependency_tree.tree.DepTree object at 0x11577f10>, <props.dependency_tree.tree.DepTree object at 0x115a6050>, <props.dependency_tree.tree.DepTree object at 0x115a6090>, <props.dependency_tree.tree.DepTree object at 0x115a60d0>, <props.dependency_tree.tree.DepTree object at 0x115a6110>, <props.dependency_tree.tree.DepTree object at 0x115a6150>, <props.dependency_tree.tree.DepTree object at 0x115a6190>, <props.dependency_tree.tree.DepTree object at 0x115a61d0>, <props.dependency_tree.tree.DepTree object at 0x115a6210>, <props.dependency_tree.tree.DepTree object at 0x115a6250>, <props.dependency_tree.tree.DepTree object at 0x115a62d0>, <props.dependency_tree.tree.DepTree object at 0x115a6290>, <props.dependency_tree.tree.DepTree object at 0x115a6310>, <props.dependency_tree.tree.DepTree object at 0x115a6350>, <props.dependency_tree.tree.DepTree object at 0x115a6390>, <props.dependency_tree.tree.DepTree object at 0x115a63d0>]

Thanks!

PropSWrapper not including arguments with more than one node

We currently don’t collect entities which span more than one PropS node.

Hurricane Sandy blows out of Bahamas, after killing 43 in Caribbean, en route to US coast
Hurricane Sandy blows out Bahamas after killing

Hurricane Sandy Leaves 21 Dead in Caribbean
Hurricane Sandy Leaves 21 Dead

Collected by @OriShapira

Berkeley dependency parser fails on punctuation

It turns out that the Berkeley parser fails on sentences containing three consecutive '?'/'!' marks.
@kleinay overcame this problem by wrapping the single-sentence parsing stage in a try-except block, logging and then ignoring sentences that it couldn't parse.
@kleinay, can you please post a problematic example sentence?

Thanks!

props wrapper- add sentence-id to symbols

I would like to ask for another feature in the props_wrapper component:
Currently the parse function gets only the sentence string - pw.parse(sent).
I would like it to also receive a string representing the sentence_id - pw.parse(sent, sent_id).
This sent_id string would be appended to each symbol created by the parser, i.e. the entity/proposition "keys".
E.g., if I pass sent_id="1", then the entity keys would be "A1_1", "A2_1" and so on, and the same for propositions.
This feature would be useful, as it allows me to treat these symbols/keys as globally unique IDs of the entities/propositions.
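A hypothetical sketch of what the requested interface could look like (the method body and attribute names are assumptions, not the current props_wrapper code):

def parse(self, sent, sent_id=None):
    # ... existing single-sentence parsing logic ...
    if sent_id is not None:
        # append the sentence id to every symbol, e.g. 'A1' -> 'A1_1', 'P1' -> 'P1_1'
        suffix = '_{}'.format(sent_id)
        self.entities = {key + suffix: value for key, value in self.entities.items()}
        self.predicates = {key + suffix: value for key, value in self.predicates.items()}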

evaluation - some input tweets have no gold annotation

In the big stories, there are inconsistencies between the input file and the XMLs of gold annotations: it turns out that not all tweets are annotated.
In the Burma event, only 63 out of 78 tweets in the input file occur in the XML and are annotated with entities and propositions.
This results in false precision errors, since our predicted graph parses all tweets in the input file.

We should modify the evaluation to take this into account - only send the parser tweets that occur in the corresponding gold file.
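A minimal sketch of the proposed filtering, assuming the gold graph exposes the ids of its annotated tweets (variable names are illustrative):

gold_tweet_ids = set(gold_graph.sentences.keys())       # tweets that actually have gold annotations
tweets_to_parse = {tweet_id: text
                   for tweet_id, text in input_tweets.items()
                   if tweet_id in gold_tweet_ids}        # skip unannotated tweets before parsing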

Implicit proposition argument ordering backwards

It seems that the order of the two arguments in the implicit propositions is backwards.
For example in:

{'Entities': {'A1': ('Hurricane', (0,)),
  'A2': ('Down', (3,)),
  'A3': ('Sandy', (1,)),
  'A4': ('on', (4,)),
  'A5': ('East', (5,))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
   'Bare predicate': ('Bears', (2,)),
   'Head': {'Lemma': 'Bears', 'POS': 'VBZ', 'Surface': ('Bears', [2])},
   'Template': '{A1} Bears {A2}'},
  'P2': {'Arguments': ('A3', 'A1'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': '{A3} {A1}'},
  'P3': {'Arguments': ('A4', 'A2'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': '{A4} {A2}'},
  'P4': {'Arguments': ('A5', 'A2'),
   'Bare predicate': ('IMPLICIT', (-1,)),
   'Head': {'Lemma': 'IMPLICIT', 'POS': 'IMPLICIT', 'Surface': 'IMPLICIT'},
   'Template': '{A5} {A2}'}},
 'Sentence': 'Hurricane Sandy Bears Down on East Coast'}

We get "Sandy Hurricane", "on Down" and "East Down".
I assume they should all be the other way around.

prefix "A" for Predicate in props_wrapper output

Got this output from props_wrapper, for the sentence:
'Several people killed or injured in an explosion at a church in northern Nigeria , officials say .'

Notice a predicate named "A3".
This causes a bug downstream.

{'Entities': {'A1': ('people', (1,)),
'A2': ('officials', (15,)),
'A3': ('or', (4,)),
'A4': ('explosion', (7,)),
'A5': ('Nigeria', (13,)),
'A6': ('Several', (0,)),
'A7': ('church', (10,))},
'Predicates': {'A3': {'Arguments': ['A4'],
'Bare predicate': ('injured in', (4, 5)),
'Head': {'Lemma': u'injure',
'POS': 'VBN',
'Surface': ('injured', [4])},
'Template': 'injured in {A4}'},
'P1': {'Arguments': ['A1'],
'Bare predicate': ('killed or injured', (2, 3, 4)),
'Head': {'Lemma': u'kill',
'POS': 'VBN',
'Surface': ('killed', [2])},
'Template': '{A1} killed or injured'},
'P2': {'Arguments': ['A2', 'A3'],
'Bare predicate': ('say', (16,)),
'Head': {'Lemma': 'say',
'POS': 'VBP',
'Surface': ('say', [16])},
'Template': '{A3} {A2} say'},
'P3': {'Arguments': ['A5'],
'Bare predicate': ('northern', (12,)),
'Head': {'Lemma': '',
'POS': 'JJ',
'Surface': ('northern', [12])},
'Template': 'northern {A5}'},
'P4': {'Arguments': ('A6', 'A1'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT',
'POS': 'IMPLICIT',
'Surface': 'IMPLICIT'},
'Template': '{A6} {A1}'},
'P5': {'Arguments': ('A3', 'A4'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT',
'POS': 'IMPLICIT',
'Surface': 'IMPLICIT'},
'Template': '{A3} {A4}'},
'P6': {'Arguments': ('A4', 'A7'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT',
'POS': 'IMPLICIT',
'Surface': 'IMPLICIT'},
'Template': '{A4} {A7}'},
'P7': {'Arguments': ('A7', 'A5'),
'Bare predicate': ('IMPLICIT', (-1,)),
'Head': {'Lemma': 'IMPLICIT',
'POS': 'IMPLICIT',
'Surface': 'IMPLICIT'},
'Template': '{A7} {A5}'}},
'Sentence': 'Several people killed or injured in an explosion at a church in northern Nigeria , officials say .'}
