Comments (10)
This sentence:
"Buddhists and Muslims raping and killing each other."
produces the template:
'{A2} {A1} {A1} killing {A3}'
which repeats {A1}, and it is left as-is in the output JSON.
Thanks.
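A check like the following could flag templates such as '{A2} {A1} {A1} killing {A3}' before they are written to the output JSON. It is a minimal sketch that assumes placeholders always have the form `{A<n>}` or `{P<n>}`, as in the outputs in this thread; `duplicated_slots` is a hypothetical helper, not part of the PropS wrapper.

```python
import re
from collections import Counter

# Placeholders such as {A1} or {P2}; the A/P naming is assumed from this thread.
SLOT_RE = re.compile(r"\{([AP]\d+)\}")


def duplicated_slots(template):
    """Return the placeholder names that occur more than once in `template`."""
    counts = Counter(SLOT_RE.findall(template))
    return sorted(name for name, n in counts.items() if n > 1)


print(duplicated_slots("{A2} {A1} {A1} killing {A3}"))    # ['A1']
print(duplicated_slots("{A1} {A3} {A2} to be released"))  # []
```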
@OriShapira, can you specify which sentence this happens in?
In tweet 258795112623116288:
"Boy Scouts files on alleged sex abusers to be released -".
I don't think it's a bug in the PropS wrapper.
This is what I get from running it as a single sentence (note the template doesn't repeat arguments):
{'Entities': {'A1': ('Boy Scouts files on alleged sex abusers to be released -',
                     (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)),
              'A2': ('alleged sex abusers', (4, 5, 6)),
              'A3': ('on', (3,))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2', 'A3'],
                       'Bare predicate': ('to be released', (7, 8, 9)),
                       'Head': {'Lemma': u'release',
                                'POS': 'VBN',
                                'Surface': ('released', [9])},
                       'Template': '{A1} {A3} {A2} to be released'}},
 'Sentence': 'Boy Scouts files on alleged sex abusers to be released -'}
Maybe A1 should be something like "Boy Scouts files", so that the template '{A1} {A3} {A2} to be released' would make more sense.
@kleinay, is this what might be causing the JSON generator to unexpectedly use A1 twice?
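One way to see why the current A1 reads badly is to substitute each entity's surface string back into the template. The sketch below does that for the output above and for the narrower A1 suggested here; `render` is a hypothetical helper, and it assumes every placeholder names an entity (predicate slots such as `{P2}` would need their own text).

```python
# Entities and template copied from the PropS wrapper output above.
entities = {
    "A1": ("Boy Scouts files on alleged sex abusers to be released -",
           (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)),
    "A2": ("alleged sex abusers", (4, 5, 6)),
    "A3": ("on", (3,)),
}
template = "{A1} {A3} {A2} to be released"


def render(template, entities):
    """Fill each {Ak} placeholder with the surface string of entity Ak."""
    surface = {name: text for name, (text, _span) in entities.items()}
    return template.format(**surface)


print(render(template, entities))
# Boy Scouts files on alleged sex abusers to be released - on alleged sex abusers to be released

# Same template with A1 narrowed to "Boy Scouts files".
narrowed = dict(entities, A1=("Boy Scouts files", (0, 1, 2)))
print(render(template, narrowed))
# Boy Scouts files on alleged sex abusers to be released
```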
Side note: if I run the PW (PropS wrapper) on a cleaner version of the sentence (replacing the dash at the end with a full stop), I get something slightly saner (see below).
Maybe we should consider a cleaning phase for the tweets to remove this kind of noise, which confuses automatic parsers.
{'Entities': {'A1': ('alleged sex abusers', (4, 5, 6)),
              'A2': ('Boy Scouts', (0, 1))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2', 'P2'],
                       'Bare predicate': ('files on', (2, 3)),
                       'Head': {'Lemma': u'file',
                                'POS': 'VBZ',
                                'Surface': ('files', [2])},
                       'Template': '{A2} files on {A1} {P2}'},
                'P2': {'Arguments': ['A2'],
                       'Bare predicate': ('to be released', (7, 8, 9)),
                       'Head': {'Lemma': u'release',
                                'POS': 'VBN',
                                'Surface': ('released', [9])},
                       'Template': '{A2} to be released'}},
 'Sentence': 'Boy Scouts files on alleged sex abusers to be released .'}
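A cleaning phase like the one suggested above could start as a set of small normalization rules. This sketch implements only the rule tried manually here (turning a trailing dash into a full stop); `clean_tweet` is a hypothetical name, and a real cleaner would likely need more rules.

```python
import re

# A trailing run of hyphens, en dashes or em dashes, with surrounding whitespace.
TRAILING_DASH_RE = re.compile(r"\s*[-\u2013\u2014]+\s*$")


def clean_tweet(text):
    """Replace a trailing dash with a space-separated full stop.

    Mirrors the manual edit above; other kinds of tweet noise are not handled.
    """
    return TRAILING_DASH_RE.sub(" .", text.strip())


print(clean_tweet("Boy Scouts files on alleged sex abusers to be released -"))
# Boy Scouts files on alleged sex abusers to be released .
```

Sentences without a trailing dash pass through unchanged (apart from stripping outer whitespace), so the rule is safe to apply to every tweet.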
Okay, I introduced a small fix, and this looks better now on my side (see below).
The "on" error comes from wrong underlying parsing, caused by the dash at the end.
@kleinay, after you merge the PR, we can test whether the issue of repeating arguments is also resolved.
{'Entities': {'A1': ('Boy Scouts files', (0, 1, 2)),
              'A2': ('sex abusers', (5, 6)),
              'A3': ('on', (3,))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2', 'A3'],
                       'Bare predicate': ('to be released', (7, 8, 9)),
                       'Head': {'Lemma': u'release',
                                'POS': 'VBN',
                                'Surface': ('released', [9])},
                       'Template': '{A1} {A3} {A2} to be released'}},
 'Sentence': 'Boy Scouts files on alleged sex abusers to be released -'}
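In the first output in this thread, A1 covered the whole sentence and therefore contained A2 and A3; in this output it no longer does. A simple sanity check for that failure mode is to look for entities whose token spans overlap. This is a hypothetical check, not part of the wrapper, and since nested entities may sometimes be legitimate it would probably be better as a warning than an error.

```python
from itertools import combinations


def overlapping_entities(entities):
    """Return pairs of entity names whose token spans share at least one index."""
    pairs = []
    for (a, (_, span_a)), (b, (_, span_b)) in combinations(sorted(entities.items()), 2):
        if set(span_a) & set(span_b):
            pairs.append((a, b))
    return pairs


# Entity spans copied from the first output and from the output after the fix.
before_fix = {
    "A1": ("Boy Scouts files on alleged sex abusers to be released -",
           (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)),
    "A2": ("alleged sex abusers", (4, 5, 6)),
    "A3": ("on", (3,)),
}
after_fix = {
    "A1": ("Boy Scouts files", (0, 1, 2)),
    "A2": ("sex abusers", (5, 6)),
    "A3": ("on", (3,)),
}
print(overlapping_entities(before_fix))  # [('A1', 'A2'), ('A1', 'A3')]
print(overlapping_entities(after_fix))   # []
```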
Following Gabi's fixes, I cannot tell whether the problem persists. The example tweet no longer produces a duplicate-argument template, but this is still, in principle, a possible scenario. @OriShapira, please update if you see the same problem anywhere after the fix.
In principle, duplicated argument slots in the same template are currently possible, because the argument-alignment algorithm we use is trivial: we assign all argument mentions referring to the same concept to the same argument slot (i.e., the proposition-level argument, denoted by a.1, a.2, etc.). Thus, in predicates where two arguments refer to the same concept (e.g. "Bob pictures himself"), the template would contain the same argument slot twice ("a.1 pictures a.1"), which is forbidden.
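The trivial alignment described above can be sketched as follows, to show how a reflexive like "himself" ends up in the same slot as its antecedent. This is a hypothetical reconstruction: `align_arguments` and the (surface, concept_id) input format are assumptions, not the pipeline's actual data structures.

```python
def align_arguments(mentions):
    """Trivial alignment: all mentions of one concept share that concept's slot.

    mentions: (surface, concept_id) pairs for one predicate mention, in order.
    Returns one slot name (a.1, a.2, ...) per mention.
    """
    slot_of_concept = {}
    slots = []
    for _surface, concept in mentions:
        if concept not in slot_of_concept:
            slot_of_concept[concept] = f"a.{len(slot_of_concept) + 1}"
        slots.append(slot_of_concept[concept])
    return slots


# "Bob pictures himself": "Bob" and "himself" refer to the same concept.
slots = align_arguments([("Bob", "BOB"), ("himself", "BOB")])
print(f"{slots[0]} pictures {slots[1]}")  # a.1 pictures a.1
```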
I committed a fix to handle this scenario explicitly. @OriShapira, if you encounter this problem, we can run the auto-pipeline again with the fix to verify that it solves the problem.
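The thread does not describe what the committed fix does. One possible approach, sketched here under stated assumptions, is to keep a concept's slot for its first mention in a predicate and give any repeat within that same predicate a fresh slot. `assign_slots`, its arguments and the slot-naming scheme are all hypothetical.

```python
def assign_slots(mentions, slot_of_concept, all_slots):
    """Assign argument slots for one predicate mention, never repeating a slot.

    mentions: (surface, concept_id) pairs for this predicate mention, in order.
    slot_of_concept: proposition-level map concept_id -> slot (updated in place).
    all_slots: every slot allocated so far (updated in place); assumed to be kept
        in sync with slot_of_concept so that fresh names never collide.
    """
    used_here = set()
    slots = []
    for _surface, concept in mentions:
        slot = slot_of_concept.get(concept)
        if slot is None or slot in used_here:
            # New concept, or a repeat inside this predicate: allocate a fresh slot.
            slot = f"a.{len(all_slots) + 1}"
            all_slots.append(slot)
            # Only the first slot of a concept becomes its proposition-level slot.
            slot_of_concept.setdefault(concept, slot)
        used_here.add(slot)
        slots.append(slot)
    return slots


slot_of_concept, all_slots = {}, []
# "Bob pictures himself": both mentions refer to the concept BOB.
print(assign_slots([("Bob", "BOB"), ("himself", "BOB")], slot_of_concept, all_slots))
# ['a.1', 'a.2']
```

With this scheme "Bob pictures himself" yields "a.1 pictures a.2". The coreference link between the two slots is lost unless it is recorded elsewhere, which is a trade-off any real fix would have to decide on.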
I think this is fixed now. @OriShapira, please try creating the summarization.
Related Issues (20)
- Artifacts from cleaning?
- Weird parsing by PropS wrapper
- Split entities to nouns?
- Problems arising from errors in dependency parsing
- PropSWrapper not including arguments with more than one node
- Make sure PropS abstract nodes don't make it to the proposition structure
- Fix elements interchange
- Consider introducing some sort of unit tests
- Python import stop_words fails
- determiner in an entity mention
- Nominalizations? sentences without any entities or predicates in the pipeline output
- template of implicit proposition is misformed
- Coreference crucial enhancement - account for predicate's arguments
- Implicit proposition argument ordering backwards
- Important argument disregarded after implicit propositions added
- props_wrapper: symbol A1 in template but not in list of entities; symbol P at entity name
- not taking the head of noun compounds
- evaluation- some input tweets has no gold annotation
- prefix "A" for Predicate in props_wrapper output
- prepositional phrase of nouns is not captured as predicates