minhhuong / info_integration
Invalidation of wrong sameAs links
Experiments show that equality by string value of two objects does not suffice to conclude whether they are really equal, or, in other words, whether they refer to the same entity.
We must verify that the following holds: sameAs(u1, u2) ^ p(u1, o1) ^ p(u2, o2) ^ o1 == o2.
However, o1 may equal o2 in many senses, for instance:
Therefore, the equality == must be replaced by a general method named synVal(o1, o2) that takes into account all possible forms of equality that can hold between a pair of objects.
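A minimal sketch of what synVal(o1, o2) could look like, written here as a Python function named syn_val. The senses of equality it checks (exact match, case- and whitespace-insensitive match, numeric equality) are illustrative assumptions, not a list taken from the issue; the real method would add whichever forms the project actually needs.

```python
import math


def _normalize(value: str) -> str:
    # Case-fold and collapse runs of whitespace into single spaces.
    return " ".join(value.casefold().split())


def syn_val(o1: str, o2: str) -> bool:
    """Hypothetical stand-in for synVal(o1, o2): True if the two objects are
    equal in at least one of a few assumed senses."""
    # 1. Exact string equality (what == checks today).
    if o1 == o2:
        return True
    # 2. Equality after case and whitespace normalization.
    if _normalize(o1) == _normalize(o2):
        return True
    # 3. Numeric equality, e.g. "42" vs "42.0".
    try:
        return math.isclose(float(o1), float(o2))
    except ValueError:
        # At least one value is not a number.
        return False
```

Each sense is a separate early return, so new forms of equality can be added without touching the existing checks.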
At the moment, detect_false_sameas only returns a dictionary of correct sameAs links. We also need the number of incorrect sameAs links, to check whether our code is able to detect the wrong sameAs links in a validation set.
What we aim to compute: the number of detected false sameAs links / the total number of sameAs links added. Returning the false sameAs links is therefore crucial to show that our code works, at least to some extent.
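A minimal sketch of the target metric, assuming detect_false_sameas is changed to also return the false links as (s1, s2) pairs. The helpers canonical and detection_rate are hypothetical names. Pairs are put in a canonical order because owl:sameAs is symmetric, so (a, b) and (b, a) denote the same link; only detected links that were actually injected count as hits, so gold-standard links wrongly flagged as false do not inflate the score.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def canonical(links: Iterable[Link]) -> Set[Link]:
    # sameAs is symmetric: store each pair in sorted order so (a, b) == (b, a).
    return {tuple(sorted(link)) for link in links}


def detection_rate(detected_false: Iterable[Link], injected: Iterable[Link]) -> float:
    """Share of the injected (wrong) sameAs links that were flagged as false."""
    injected_set = canonical(injected)
    if not injected_set:
        raise ValueError("no injected links to evaluate")
    hits = canonical(detected_false) & injected_set
    return len(hits) / len(injected_set)
```

For example, with two injected links of which one is detected (even in reversed order), the rate is 0.5.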
N.B.: A sameAs link is wrong if the following does not hold for some common property p of s1 and s2: sameAs(s1, s2) ^ p(s1, o1) ^ p(s2, o2) ^ equals(o1, o2). It is correct if the condition holds for every common property of s1 and s2. But if s1 and s2 do not share any property, we cannot conclude anything about the correctness of the sameAs link between them (reason: the Open World Assumption).
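The rule above can be written as a three-valued check. This sketch assumes each subject's functional properties are available as a dict mapping property to object, and that the comparison is passed in as equals (plain == or a more tolerant synVal-style function); classify_same_as is a hypothetical name.

```python
from typing import Callable, Dict, Optional


def classify_same_as(
    props1: Dict[str, str],
    props2: Dict[str, str],
    equals: Callable[[str, str], bool],
) -> Optional[bool]:
    """True = correct, False = wrong, None = undecidable (no common property)."""
    common = props1.keys() & props2.keys()
    if not common:
        # Open World Assumption: no shared property, so no conclusion.
        return None
    # Correct only if the objects agree on every common property;
    # a single disagreement makes the link wrong.
    return all(equals(props1[p], props2[p]) for p in common)
```

Returning None instead of True keeps undecidable links out of both the "correct" and the "false" counts.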
Note: this should be fixed, because no sameAs link in the gold standard should be reported as false (which currently happens). This affects correctness.
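One way to catch this early is a regression check that lists every gold-standard link reported as false; it should always return an empty list. The function name gold_links_flagged_false and the (s1, s2) pair representation are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

Link = Tuple[str, str]


def gold_links_flagged_false(gold: Iterable[Link], detected_false: Iterable[Link]) -> List[Link]:
    """Gold-standard sameAs links wrongly reported as false (should be empty)."""
    # Sort each pair so that (a, b) and (b, a) are treated as the same link.
    flagged = {tuple(sorted(link)) for link in detected_false}
    gold_set = {tuple(sorted(link)) for link in gold}
    return sorted(gold_set & flagged)
```

In a test suite this becomes `assert not gold_links_flagged_false(gold, false_links)`, and the returned list shows exactly which gold links to inspect.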
Currently, the objects of a functional property are compared as they are, i.e. two objects count as equal only if they are exactly the same. However, if typos are not taken into account, we may miss cases where the condition actually holds. We should replace the exact equality == with a string similarity measure, such as an edit distance (e.g. Levenshtein) or another similarity (e.g. Jaro, Jaccard).
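A sketch of replacing == with a normalized Levenshtein similarity. The function similar and its default threshold of 0.8 are hypothetical; the threshold would need tuning on the validation set, and Jaro or Jaccard could be swapped in behind the same interface.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def similar(o1: str, o2: str, threshold: float = 0.8) -> bool:
    """Hypothetical replacement for ==: normalized similarity >= threshold."""
    longest = max(len(o1), len(o2))
    if longest == 0:
        return True  # two empty strings are identical
    return 1 - levenshtein(o1, o2) / longest >= threshold
```

Normalizing by the longer string keeps the score in [0, 1], so a single typo in a long name is tolerated while short, unrelated strings are not.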
The number of sameAs links after the injection, including the correct (gold standard) links and the incorrect links (added by us), is inconsistent.
One simple test shows:
Number of erroneous links to be added: 708
Number of sameAs before injection: 1416
Number of sameAs after injection: 1481
So although we expect 708 wrong links to be added, i.e. 1416 + 708 = 2124 links in total, we end up with only 1481 links (correct and wrong ones together), which means only 65 new links were actually added. I suspect the error is in the inject function, because the number of links returned by random_uri is correct.
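The expected count is 1416 + 708 = 2124, while only 1481 - 1416 = 65 links were actually added. One hypothesis worth checking, assuming links end up in a set-like store (an RDF graph is a set of triples, so adding a triple that is already present changes nothing), is that many injected links are duplicates or already exist. The hypothetical injection_report below counts both; if links are instead kept in a dictionary keyed by subject, new links could also overwrite old ones, which this sketch does not check.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str]


def injection_report(existing: Iterable[Link], injected: Iterable[Link]) -> Dict[str, int]:
    """Hypothetical diagnostic: how many injected links can actually grow the
    total when links are stored as a set (as triples are in an RDF graph)."""
    existing_set = set(existing)
    injected_list = list(injected)
    counts = Counter(injected_list)
    return {
        "requested": len(injected_list),
        # Extra copies of the same link inside the injected batch.
        "duplicates_within_injection": sum(c - 1 for c in counts.values()),
        # Distinct injected links that were already in the data.
        "already_present": sum(1 for link in counts if link in existing_set),
        # Size of the union, i.e. the count we should observe after injection.
        "expected_total": len(existing_set | set(injected_list)),
    }
```

If requested minus duplicates_within_injection minus already_present comes out close to 65 on the real data, the missing links are explained by set semantics rather than by a bug in random_uri.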