
info_integration's People

Contributors

armitakhn, minhhuong

Forkers

armitakhn

info_integration's Issues

Definition of "equality": implement a synVal function

Experiments show that comparing two objects by their string value is not enough to decide whether they are really equal, in other words, whether they refer to the same entity.

We must prove that this holds: sameAs(u1, u2) ^ p(u1, o1) ^ p(u2, o2) ^ o1 == o2.

o1 may equal o2 in many senses, for instance:

  • they can be URIs referring to the same entity (thus an implicit sameAs);
  • they can be dates written in different formats;
  • their string values can contain typos;
  • etc.

Therefore, the equality == must be replaced by a general function synVal(o1, o2) that takes into account all the forms of equality that may hold between a pair of objects.
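A minimal sketch of what synVal could look like, in Python (assumed here because identifiers in these issues such as detect_false_sameas and random_uri follow Python naming; the project's language is not stated). The name syn_val, the date formats, the same_as argument and the 0.9 threshold are all hypothetical, and objects are assumed to be strings. The typo check uses the standard library's difflib.SequenceMatcher as a stand-in for whichever string measure the project settles on.

```python
from datetime import datetime
from difflib import SequenceMatcher

# Hypothetical list of date formats; the real data may use others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y")


def parse_date(value):
    """Return a date if value matches one of DATE_FORMATS, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def is_uri(value):
    return value.startswith(("http://", "https://"))


def syn_val(o1, o2, same_as=frozenset(), threshold=0.9):
    """Hypothetical synVal(o1, o2): True if two string objects plausibly denote the same value.

    same_as   -- set of (uri1, uri2) pairs already known to be sameAs
    threshold -- minimum SequenceMatcher ratio accepted as "equal up to typos"
    """
    # 1. Perfect equality (the current behaviour).
    if o1 == o2:
        return True
    # 2. URIs referring to the same entity (implicit sameAs), in either direction.
    if (o1, o2) in same_as or (o2, o1) in same_as:
        return True
    # 3. The same date written in different formats.
    d1, d2 = parse_date(o1), parse_date(o2)
    if d1 is not None and d1 == d2:
        return True
    # URIs without a sameAs link are not fuzzy-matched: one character can change the entity.
    if is_uri(o1) or is_uri(o2):
        return False
    # 4. Strings that differ only by small typos.
    return SequenceMatcher(None, o1.lower(), o2.lower()).ratio() >= threshold
```

The order of the checks is a design choice: cheap exact and sameAs tests first, then dates, and fuzzy matching only as a last resort, since it is the most likely to produce false positives.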

SameAs validation should also return the number of wrong sameAs links

At the moment, detect_false_sameas only returns a dictionary of correct sameAs links. We also need the number of incorrect sameAs links, to see whether our code is able to detect the wrong sameAs links in a validation set.

What we aim to compute is the ratio: number of detected false sameAs links / total number of sameAs links added. Returning the false sameAs links is therefore crucial to show that our code works, at least to some extent.
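A sketch of this ratio in Python. The function name and the input shape (collections of (s1, s2) pairs) are assumptions. As a design choice it only counts flagged links that were actually injected, so gold-standard links wrongly flagged as false do not inflate the score; links are compared order-insensitively because sameAs is symmetric.

```python
def _normalise(links):
    """Turn (s1, s2) pairs into order-insensitive keys."""
    return {frozenset(pair) for pair in links}


def detection_rate(detected_false, injected_false):
    """Fraction of injected (wrong) sameAs links that the validation flags as false."""
    injected = _normalise(injected_false)
    if not injected:
        raise ValueError("no injected links to evaluate")
    # Keep only flagged links that really were injected.
    detected = _normalise(detected_false) & injected
    return len(detected) / len(injected)
```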

N.B.: A sameAs link between s1 and s2 is wrong if the following does not hold for some common property p of s1 and s2: sameAs(s1, s2) ^ p(s1, o1) ^ p(s2, o2) ^ equals(o1, o2). It is correct if the condition holds for every common property of s1 and s2. If s1 and s2 do not share any common property, we cannot conclude anything about the correctness of the sameAs link between them (reason: Open World Assumption).
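The rule above can be sketched as a three-way classification, so that the validation returns correct, wrong and undecided links, and the number of wrong links is simply the length of one list. This is a hypothetical Python sketch, not the repository's detect_false_sameas: the names Verdict, check_same_as and validate, and the representation of each entity as a dict with one value per property (as for functional properties), are assumptions.

```python
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    WRONG = "wrong"
    UNKNOWN = "unknown"  # no common property: Open World Assumption


def check_same_as(props1, props2, equals):
    """Classify one sameAs(s1, s2) link.

    props1, props2 -- dicts mapping property -> object for s1 and s2
    equals         -- comparison function, e.g. a synVal-like function
    """
    common = props1.keys() & props2.keys()
    if not common:
        return Verdict.UNKNOWN
    if all(equals(props1[p], props2[p]) for p in common):
        return Verdict.CORRECT
    # At least one common property has non-equal objects.
    return Verdict.WRONG


def validate(links, descriptions, equals):
    """Split (s1, s2) links into lists of correct, wrong and undecided links."""
    result = {verdict: [] for verdict in Verdict}
    for s1, s2 in links:
        verdict = check_same_as(descriptions.get(s1, {}), descriptions.get(s2, {}), equals)
        result[verdict].append((s1, s2))
    return result
```

Passing a synVal-like function as equals replaces the strict == comparison, and len(result[Verdict.WRONG]) gives the number of wrong sameAs links.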

Note: this must be fixed, because the code currently labels some sameAs links of the gold standard as false, which it should never do. This affects correctness.

Use string matching similarity measure instead of "perfect" equality

Currently, the objects of a functional property are compared "as they are", i.e. two objects count as equal only if they are exactly the same. However, if typos are not taken into account, we may (sadly) miss cases where the condition actually holds. We should replace perfect equality == with a string similarity measure, e.g. an edit distance such as Levenshtein, or a similarity such as Jaro or Jaccard.
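A sketch, in Python, of two of the measures mentioned above; the function names and the 0.85 threshold are hypothetical and would need tuning on the gold standard. levenshtein is the classic dynamic-programming edit distance, normalised to a similarity in [0, 1]; jaccard_similarity is not an edit distance but a set-overlap measure, here on lower-cased word tokens, which tolerates reordered words but not typos. Jaro is not shown.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Edit distance normalised to [0, 1]; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard_similarity(a, b):
    """Jaccard index of the sets of lower-cased whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similar(o1, o2, threshold=0.85):
    """Hypothetical drop-in replacement for o1 == o2."""
    return levenshtein_similarity(o1.lower(), o2.lower()) >= threshold
```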

Odd result of injector module

The number of sameAs links after the injection, including the correct links (gold standard) and the incorrect ones (added by us), is not consistent.

One simple test shows:

Number of erroneous links to be added: 708
Number of sameAs before injection: 1416
Number of sameAs after injection: 1481

So even though we expect 708 wrong links to be added (1416 + 708 = 2124 links in total), at the end we only have 1481 links (correct and wrong ones together), i.e. only 65 new links. I suspect the error is in the inject function, because the number of links random_uri returns is correct.
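One hypothesis worth checking (not confirmed by the issue) is that part of the 708 generated links collapse when they are stored, because they duplicate each other or links already in the dataset; a Python set or an RDF graph stores each distinct link only once. The hypothetical Python helper below counts these cases; its name, its inputs and the symmetric flag (whether (a, b) and (b, a) count as the same link) are assumptions.

```python
from collections import Counter


def diagnose_injection(existing, generated, symmetric=False):
    """Count why fewer links than expected may survive the injection.

    existing  -- collection of (s1, s2) sameAs pairs before injection
    generated -- list of (s1, s2) pairs produced for injection
    symmetric -- if True, (a, b) and (b, a) are treated as the same link
    """
    key = frozenset if symmetric else tuple
    existing_keys = {key(p) for p in existing}
    counts = Counter(key(p) for p in generated)
    return {
        "generated": len(generated),
        # Extra copies of the same link inside the generated batch.
        "duplicates_in_batch": sum(c - 1 for c in counts.values()),
        # Distinct generated links that were already in the dataset.
        "already_present": sum(1 for k in counts if k in existing_keys),
        # Number of distinct links a deduplicating store would end up with.
        "expected_total": len(existing_keys | set(counts)),
    }
```

If expected_total for the real data comes out at 1481, the missing links are duplicates rather than links lost by the inject function.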
