
mberr commented on September 21, 2024

Hi @LuisaWerner ,

I will try to give a broader overview before answering your specific questions below.

Overview

First, create_inverse_triples=True is a flag of a triples factory (or, more precisely, of one of its base classes, KGInfo), and means that we will add inverse triples when creating training instances: for each training triple $(h, r, t)$, we add a triple $(t, r^{-1}, h)$, where $r^{-1}$ is a new relation symbol.
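
A minimal pure-Python sketch of this construction on ID-based triples. The helper name add_inverse_triples and the ID convention (the inverse of relation r gets ID r + num_relations) are assumptions for illustration, not necessarily how PyKEEN lays out its relation IDs internally.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (head_id, relation_id, tail_id)


def add_inverse_triples(triples: List[Triple], num_relations: int) -> List[Triple]:
    """Return the original triples followed by one artificial inverse per triple.

    Assumption (hypothetical convention): the inverse of relation r has ID
    r + num_relations, so the relation vocabulary doubles in size.
    """
    inverse = [(t, r + num_relations, h) for h, r, t in triples]
    return list(triples) + inverse


train = [(0, 0, 1), (1, 1, 2)]
print(add_inverse_triples(train, num_relations=2))
# [(0, 0, 1), (1, 1, 2), (1, 2, 0), (2, 3, 1)]
```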

Since this creates new relation symbols, we must also learn representations for them, i.e., the model itself will train twice as many relation representations. Thus, the num_relations attribute of a triples factory will be twice the number of original relations, cf. here.
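
To make the effect on the parameter count concrete, here is a small sketch. RelationCounts is a hypothetical stand-in, not PyKEEN's KGInfo, and it assumes one dim-sized embedding vector per relation symbol (models with, e.g., relation matrices scale differently).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationCounts:
    """Hypothetical stand-in for the relation bookkeeping of a triples factory."""

    num_original_relations: int
    create_inverse_triples: bool

    @property
    def num_relations(self) -> int:
        # Each original relation r gets one artificial inverse r^{-1}.
        return self.num_original_relations * (2 if self.create_inverse_triples else 1)

    def relation_parameters(self, dim: int) -> int:
        # Assumes one dim-sized embedding vector per relation symbol.
        return self.num_relations * dim


counts = RelationCounts(num_original_relations=10, create_inverse_triples=True)
print(counts.num_relations)  # 20
print(counts.relation_parameters(dim=50))  # 1000
```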

For evaluation, we do not need to add inverse relations. Rather, a model that has been trained with inverse relations will (by default) use them to make head predictions, i.e., transform a head prediction $(?, r, t)$ into a tail prediction with the inverse relation, $(t, r^{-1}, ?)$. The relevant code for this is here.
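
A toy sketch of this rewriting: a model that can only score tails answers the head query $(?, r, t)$ by scoring tails for $(t, r^{-1}, ?)$. The names score_heads and toy_score_tails, and the "inverse ID = r + num_relations" convention, are hypothetical; the toy scorer just looks triples up in a set instead of using a trained model.

```python
from typing import Callable, List

# A tail scorer maps (head_id, relation_id) to one score per candidate tail entity.
TailScorer = Callable[[int, int], List[float]]


def score_heads(score_tails: TailScorer, r: int, t: int, num_relations: int) -> List[float]:
    """Score all candidate heads for (?, r, t) via the tail query (t, r^{-1}, ?).

    Assumption: the inverse of relation r has ID r + num_relations.
    """
    return score_tails(t, r + num_relations)


# Toy example: one original relation (ID 0) whose inverse has ID 1.
NUM_ENTITIES, NUM_RELATIONS = 3, 1
KNOWN = {(0, 0, 1), (1, 1, 0)}  # (0, r, 1) and its artificial inverse (1, r^-1, 0)


def toy_score_tails(h: int, r: int) -> List[float]:
    # 1.0 for tails that complete a known triple, 0.0 otherwise.
    return [1.0 if (h, r, e) in KNOWN else 0.0 for e in range(NUM_ENTITIES)]


print(score_heads(toy_score_tails, r=0, t=1, num_relations=NUM_RELATIONS))
# [1.0, 0.0, 0.0]
```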

Besides these "artificial" inverse relations, datasets may also contain "natural" inverse relations, the presence of which may lead to unexpected results, cf., e.g., Dettmers et al., 2017. Examples of such "contaminated" datasets are FB15k and WN18, and thus "cleaned" dataset versions have been created, called FB15k-237 / WN18RR. PyKEEN contains code for this deduplication process in the pykeen.triples.leakage module; this file also shows sample code to reproduce FB15k-237 from FB15k. The Sealant object used in this process can also be used to further investigate intermediate results about duplicate / inverse relation pair candidates.
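
To illustrate what a "natural" inverse pair looks like in the data, here is a deliberately simplified overlap heuristic: it flags $(r_1, r_2)$ when most $(h, t)$ pairs of $r_1$ also occur as $(t, h)$ under $r_2$. This is not PyKEEN's Sealant; the function name inverse_candidates and the 0.9 threshold are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple


def inverse_candidates(
    triples: Iterable[Tuple[str, str, str]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Flag (r1, r2) if at least `threshold` of r1's (h, t) pairs appear as (t, h) under r2."""
    pairs: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for h, r, t in triples:
        pairs[r].add((h, t))
    flagged = []
    for r1, r2 in permutations(pairs, 2):
        reversed_r2 = {(t, h) for h, t in pairs[r2]}
        overlap = len(pairs[r1] & reversed_r2) / len(pairs[r1])
        if overlap >= threshold:
            flagged.append((r1, r2, overlap))
    return flagged


triples = [
    ("a", "hypernym", "b"), ("b", "hyponym", "a"),
    ("c", "hypernym", "d"), ("d", "hyponym", "c"),
    ("a", "similar_to", "c"),
]
print(inverse_candidates(triples))
# [('hypernym', 'hyponym', 1.0), ('hyponym', 'hypernym', 1.0)]
```

The heuristic is quadratic in the number of relations, which is fine for a sketch but one reason a dedicated tool is preferable on real datasets.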

Answers

What happens during training and testing if I set create_inverse_triples = True when training and testing on the loaded data with a pykeen pipeline?

When it is set for the training factory, this will

  • create additional "artificial" relations and train representations for them (i.e., the model gets more parameters)
  • make the training loop create additional training instances (~triples), cf., e.g., here; this means that in each epoch, we essentially see each triple twice: once in its natural form $(h, r, t)$, and once as its inverse $(t, r^{-1}, h)$.
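
A back-of-the-envelope sketch of what "seeing each triple twice" means for the work per epoch. The helper batches_per_epoch is hypothetical, and it assumes one training instance per triple, as in sLCWA; LCWA groups instances by $(h, r)$, so its counts differ.

```python
import math


def batches_per_epoch(num_triples: int, batch_size: int, create_inverse_triples: bool) -> int:
    # Assumption: one training instance per triple (sLCWA-style).
    # With inverse triples, every (h, r, t) also yields (t, r^{-1}, h).
    num_instances = num_triples * (2 if create_inverse_triples else 1)
    return math.ceil(num_instances / batch_size)


print(batches_per_epoch(1000, 256, create_inverse_triples=False))  # 4
print(batches_per_epoch(1000, 256, create_inverse_triples=True))   # 8
```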

For an evaluation factory, setting the flag will have no effect; I am not sure if we would see an error raised if the flag is set; if not, it would be better to change this. 🙂
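
One way such a check could look, as a sketch: FactorySketch and check_evaluation_factories are hypothetical names, not PyKEEN API, and this version only warns instead of raising.

```python
import warnings
from dataclasses import dataclass


@dataclass
class FactorySketch:
    """Hypothetical stand-in for a triples factory; only the flag matters here."""

    name: str
    create_inverse_triples: bool


def check_evaluation_factories(*factories: FactorySketch) -> None:
    """Warn for each evaluation factory that sets a training-only flag."""
    for factory in factories:
        if factory.create_inverse_triples:
            warnings.warn(
                f"create_inverse_triples=True has no effect on evaluation factory {factory.name!r}",
                UserWarning,
                stacklevel=2,
            )


# Emits one UserWarning, for "valid" only.
check_evaluation_factories(FactorySketch("valid", True), FactorySketch("test", False))
```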

Does the way I should use create_inverse_triples depend on the number of inverse relations in my dataset?

No; the create_inverse_triples flag only controls the creation of additional (artificial) inverse relations. It does not affect existing "natural" ones.

Is there a function in pykeen that can automatically examine my data in advance to see if some relations are inverse?

Yes: see the pykeen.triples.leakage module, in particular unleak, a high-level API for removing inverse and duplicate relations from a dataset, and Sealant, for more fine-grained control over / introspection of candidate relation pairs.
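
A much simplified sketch of the removal idea: once candidate pairs are known, drop one relation of each pair from every split so that no split can leak through it. This is not PyKEEN's unleak (which has its own selection logic); drop_leaking_relations is a hypothetical helper using a simple greedy rule.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def drop_leaking_relations(
    splits: Sequence[List[Triple]], flagged_pairs: Iterable[Tuple[str, str]]
) -> List[List[Triple]]:
    """Greedily keep one relation of each flagged pair and drop the other from every split."""
    kept: Set[str] = set()
    dropped: Set[str] = set()
    for r1, r2 in flagged_pairs:
        if r1 in dropped or r2 in dropped:
            continue  # pair already broken up, e.g. (r2, r1) after (r1, r2)
        if r2 in kept:
            dropped.add(r1)
        else:
            kept.add(r1)
            dropped.add(r2)
    return [[triple for triple in split if triple[1] not in dropped] for split in splits]


train = [("a", "hypernym", "b"), ("b", "hyponym", "a"), ("a", "similar_to", "c")]
test = [("c", "hyponym", "d"), ("c", "similar_to", "e")]
flagged = [("hypernym", "hyponym"), ("hyponym", "hypernym")]
print(drop_leaking_relations([train, test], flagged))
# [[('a', 'hypernym', 'b'), ('a', 'similar_to', 'c')], [('c', 'similar_to', 'e')]]
```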

LuisaWerner commented on September 21, 2024

Thanks a lot for your fast and detailed answer. This helps me a lot understanding the creation of inverse triples.

For an evaluation factory, setting the flag will have no effect; I am not sure if we would see an error raised if the flag is set; if not, it would be better to change this. 🙂
Regarding your note, I tested this: importing the test triples with create_inverse_triples=True didn't throw an error or warning.
The code for reproducing this looks as follows:

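from pathlib import Path

from pykeen.losses import BCEWithLogitsLoss
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

data_path = Path("data")    # placeholder: folder with train.txt, valid.txt, test.txt
model_path = Path("model")  # placeholder: output folder for the trained model

train = TriplesFactory.from_path(data_path / "train.txt", create_inverse_triples=True)
# Reuse the training ID mappings so that validation/test IDs match the model.
mappings = dict(entity_to_id=train.entity_to_id, relation_to_id=train.relation_to_id)
# Setting the flag on the evaluation factories raises no error or warning.
valid = TriplesFactory.from_path(data_path / "valid.txt", create_inverse_triples=True, **mappings)
test = TriplesFactory.from_path(data_path / "test.txt", create_inverse_triples=True, **mappings)

result = pipeline(
    training=train,
    validation=valid,
    testing=test,
    model="ConvE",
    model_kwargs=dict(predict_with_sigmoid=True),
    loss=BCEWithLogitsLoss,
    training_loop="sLCWA",
    negative_sampler="bernoulli",
    epochs=10,
    stopper="early",
)
result.save_to_directory(model_path)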
