
Question about the use of `create_inverse_triples` (pykeen issue, closed, 2 comments)

LuisaWerner commented on September 21, 2024

mberr commented on September 21, 2024

Hi @LuisaWerner,

I will try to give a broader overview before answering your specific questions below.

Overview

First, create_inverse_triples=True is a flag of a triples factory (or, more precisely, of one of its base classes, KGInfo), and it means that we will add inverse triples when creating training instances: for each training triple $(h, r, t)$, we add a triple $(t, r^{-1}, h)$, where $r^{-1}$ is a new relation symbol.
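As a minimal sketch of this augmentation, assume triples are given as integer ID triples and that the inverse of relation `r` receives the new ID `r + num_relations`. This ID scheme and the function name are assumptions for illustration, not necessarily how PyKEEN numbers inverse relations internally.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail) as integer IDs


def add_inverse_triples(triples: List[Triple], num_relations: int) -> List[Triple]:
    """Return the original triples followed by one inverse triple per original.

    Assumption (illustrative only): the inverse of relation r gets ID r + num_relations,
    so (h, r, t) becomes (t, r + num_relations, h).
    """
    inverse = [(t, r + num_relations, h) for h, r, t in triples]
    return list(triples) + inverse


if __name__ == "__main__":
    train = [(0, 0, 1), (1, 1, 2)]
    print(add_inverse_triples(train, num_relations=2))
    # [(0, 0, 1), (1, 1, 2), (1, 2, 0), (2, 3, 1)]
```

Head and tail swap places, and only the relation symbol is new; no entity IDs are added.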

Since this creates new relation symbols, we must also learn representations for them, i.e., the model has to learn twice as many relation representations. Accordingly, the num_relations attribute of a triples factory is twice the number of original relations, cf. here.
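The doubling can be illustrated with a small hypothetical class; it is not PyKEEN's KGInfo, it only mimics the bookkeeping described above. The parameter count assumes a plain relation embedding table with one vector per relation, which is a simplification (models such as ConvE have further parameters).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationCountSketch:
    """Hypothetical stand-in for the relation bookkeeping of a triples factory."""

    real_num_relations: int  # relations actually present in the data
    create_inverse_triples: bool

    @property
    def num_relations(self) -> int:
        # Each artificial inverse relation needs its own representation.
        if self.create_inverse_triples:
            return 2 * self.real_num_relations
        return self.real_num_relations


def relation_parameter_count(info: RelationCountSketch, embedding_dim: int) -> int:
    """Parameters in a plain relation embedding table (one vector per relation)."""
    return info.num_relations * embedding_dim


if __name__ == "__main__":
    without = RelationCountSketch(real_num_relations=237, create_inverse_triples=False)
    with_inv = RelationCountSketch(real_num_relations=237, create_inverse_triples=True)
    print(without.num_relations, with_inv.num_relations)  # 237 474
    print(relation_parameter_count(with_inv, embedding_dim=200))  # 94800
```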

For evaluation, we do not need to add inverse relations. Rather, a model that has been trained with inverse relations will (by default) use them to make head predictions, i.e., it transforms a head prediction $(?, r, t)$ into a tail prediction with the inverse relation, $(t, r^{-1}, ?)$. The relevant code for this is here.
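The rewrite of a head prediction into a tail prediction can be sketched with a toy lookup "model". Everything here is hypothetical: a real model learns scores instead of memorizing triples, and the inverse-ID scheme `r + num_relations` is an assumption. The point is only that `score_heads` never needs its own scoring logic; it reuses tail scoring with $r^{-1}$.

```python
from typing import List, Set, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail)


class ToyTailPredictor:
    """Hypothetical toy 'model': scores a tail 1.0 if (h, r, t) was seen in training, else 0.0.

    It is trained on triples augmented with inverses, where (assumption) the inverse of
    relation r has ID r + num_relations.
    """

    def __init__(self, triples: List[Triple], num_entities: int, num_relations: int) -> None:
        self.num_entities = num_entities
        self.num_relations = num_relations
        self.known: Set[Triple] = set(triples)
        self.known.update((t, r + num_relations, h) for h, r, t in triples)

    def score_tails(self, h: int, r: int) -> List[float]:
        """Score (h, r, e) for every entity e."""
        return [1.0 if (h, r, e) in self.known else 0.0 for e in range(self.num_entities)]

    def score_heads(self, r: int, t: int) -> List[float]:
        """Answer (?, r, t) by rewriting it as the tail prediction (t, r^-1, ?)."""
        return self.score_tails(t, r + self.num_relations)


if __name__ == "__main__":
    model = ToyTailPredictor([(0, 0, 2), (1, 0, 2)], num_entities=3, num_relations=1)
    print(model.score_heads(r=0, t=2))  # [1.0, 1.0, 0.0]
```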

Besides these "artificial" inverse relations, datasets may also contain "natural" inverse relations, whose presence may lead to unexpected results, cf., e.g., Dettmers et al., 2017. Examples of such "contaminated" datasets are FB15k and WN18, and thus "cleaned" versions of these datasets have been created: FB15k-237 and WN18RR. PyKEEN contains code for this deduplication process in the pykeen.triples.leakage module; this module also contains sample code to reproduce FB15k-237 from FB15k. The Sealant object used in this process can also be used to further investigate intermediate results, such as candidate pairs of duplicate or inverse relations.
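To make "natural" duplicate and inverse relations concrete, here is a simplified sketch that compares the sets of (head, tail) pairs connected by each relation. The use of Jaccard similarity, the 0.9 threshold, and the function names are assumptions for illustration; this is not the Sealant implementation, whose details may differ.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail)
Pair = Tuple[int, int]  # (head, tail)


def jaccard(a: Set[Pair], b: Set[Pair]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_candidates(
    triples: List[Triple], threshold: float = 0.9
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (duplicates, inverses): relation ID pairs with similarity >= threshold.

    Duplicates: r1 and r2 connect (mostly) the same (h, t) pairs.
    Inverses:   where r1 connects (h, t), r2 (mostly) connects (t, h).
    """
    pairs: Dict[int, Set[Pair]] = defaultdict(set)
    for h, r, t in triples:
        pairs[r].add((h, t))
    duplicates, inverses = [], []
    # Both checks are symmetric in (r1, r2), so each unordered pair is tested once.
    for r1, r2 in combinations(sorted(pairs), 2):
        if jaccard(pairs[r1], pairs[r2]) >= threshold:
            duplicates.append((r1, r2))
        flipped = {(t, h) for h, t in pairs[r2]}
        if jaccard(pairs[r1], flipped) >= threshold:
            inverses.append((r1, r2))
    return duplicates, inverses
```

For example, if relation 2 repeats the (head, tail) pairs of relation 0 and relation 1 connects the same pairs in reverse, the sketch reports (0, 2) as duplicates and both (0, 1) and (1, 2) as inverse candidates. A test triple whose inverse appears in training under such a relation can then be answered by lookup rather than by learned patterns.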

Answers

What happens during training and testing if I set create_inverse_triples = True when training and testing on the loaded data with a pykeen pipeline?

When it is set for the training factory, this will:

- create additional "artificial" relations and train representations for them (i.e., the model gets more parameters);
- make the training loop create additional training instances (~triples), cf., e.g., here. This means that in each epoch, we essentially see each triple twice: once in its natural form $(h, r, t)$, and once as the inverse $(t, r^{-1}, h)$.
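The "each triple twice per epoch" effect can be sketched as a mini-batch generator over the augmented instances. This is a hypothetical helper, not PyKEEN's training loop; it again assumes the inverse of relation `r` has ID `r + num_relations`, and it ignores negative sampling.

```python
import random
from typing import Iterator, List, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail)


def epoch_batches(
    triples: List[Triple], num_relations: int, batch_size: int, seed: int = 0
) -> Iterator[List[Triple]]:
    """Yield shuffled mini-batches for one epoch over triples plus their inverses.

    Assumption (illustrative only): the inverse of relation r has ID r + num_relations.
    """
    instances = list(triples) + [(t, r + num_relations, h) for h, r, t in triples]
    random.Random(seed).shuffle(instances)  # fixed seed keeps the sketch reproducible
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


if __name__ == "__main__":
    train = [(0, 0, 1), (1, 0, 2), (2, 1, 0)]
    batches = list(epoch_batches(train, num_relations=2, batch_size=4))
    # 3 original triples -> 6 training instances per epoch
    print(sum(len(b) for b in batches))  # 6
```

With the same batch size, an epoch therefore contains twice as many batches as it would without inverse triples.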

For an evaluation factory, setting the flag will have no effect; I am not sure if we would see an error raised if the flag is set; if not, it would be better to change this. 🙂
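If one wanted such a check on the user side, a minimal sketch could look as follows. The function name is hypothetical, and it assumes only that a factory exposes its flag as a create_inverse_triples attribute; SimpleNamespace stands in for a real triples factory so the example runs without PyKEEN.

```python
import warnings
from types import SimpleNamespace


def check_evaluation_factory(factory, name: str = "evaluation") -> None:
    """Warn if an evaluation factory was created with create_inverse_triples=True.

    Hypothetical user-side guard; it only reads the factory's create_inverse_triples attribute.
    """
    if getattr(factory, "create_inverse_triples", False):
        warnings.warn(
            f"{name} factory has create_inverse_triples=True; "
            "the flag has no effect for evaluation.",
            UserWarning,
            stacklevel=2,
        )


if __name__ == "__main__":
    # SimpleNamespace is a stand-in for a triples factory in this sketch.
    check_evaluation_factory(SimpleNamespace(create_inverse_triples=True), name="test")
```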

Does the way I should use create_inverse_triples depend on the number of inverse relations in my dataset?

No; the create_inverse_triples flag only controls the creation of additional (artificial) inverse relations. It does not affect existing "natural" ones.

Is there a function in pykeen that can automatically examine my data in advance to see whether some relations are inverse?

Yes, see the pykeen.triples.leakage module, in particular unleak, a high-level API for removing inverse and duplicate relations from a dataset, and Sealant, which gives more fine-grained control over (and introspection of) candidate relation pairs.
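The removal step itself can be sketched in plain Python: once a set of relations to delete has been chosen (which unleak and Sealant decide for you), every split drops the triples using those relations, and the surviving relation IDs are made contiguous again with one shared mapping. The function names are hypothetical and this is not PyKEEN's code.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail)


def drop_relations(splits: Iterable[List[Triple]], to_delete: Set[int]) -> List[List[Triple]]:
    """Remove every triple whose relation is in to_delete from each split."""
    return [[(h, r, t) for h, r, t in split if r not in to_delete] for split in splits]


def reindex_relations(splits: List[List[Triple]]) -> List[List[Triple]]:
    """Map surviving relation IDs to 0..k-1, using the same mapping for all splits."""
    surviving = sorted({r for split in splits for _, r, _ in split})
    mapping = {old: new for new, old in enumerate(surviving)}
    return [[(h, mapping[r], t) for h, r, t in split] for split in splits]


if __name__ == "__main__":
    train = [(0, 0, 1), (1, 1, 0), (2, 2, 3)]
    test = [(3, 1, 2), (0, 2, 1)]
    kept = drop_relations([train, test], to_delete={1})  # say relation 1 inverts relation 0
    print(reindex_relations(kept))  # [[(0, 0, 1), (2, 1, 3)], [(0, 1, 1)]]
```

Applying the same deletion and mapping to all splits matters: deleting a relation only from training would leave test triples with relation IDs the model never learned.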

LuisaWerner commented on September 21, 2024

Thanks a lot for your fast and detailed answer. This helps me a lot in understanding the creation of inverse triples.

> For an evaluation factory, setting the flag will have no effect; I am not sure if we would see an error raised if the flag is set; if not, it would be better to change this. 🙂

Regarding your note, I tested this, and loading the test triples with create_inverse_triples=True did not raise an error or a warning.
The code to reproduce this is as follows:

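```python
from pathlib import Path

from pykeen.losses import BCEWithLogitsLoss
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

data_path = Path("data")  # placeholder: directory with train.txt / valid.txt / test.txt
model_path = Path("model")  # placeholder: output directory

train = TriplesFactory.from_path(data_path / "train.txt", create_inverse_triples=True)
# Evaluation splits reuse the training ID mappings; setting the flag here raises no error or warning.
mappings = dict(entity_to_id=train.entity_to_id, relation_to_id=train.relation_to_id)
valid = TriplesFactory.from_path(data_path / "valid.txt", create_inverse_triples=True, **mappings)
test = TriplesFactory.from_path(data_path / "test.txt", create_inverse_triples=True, **mappings)

result = pipeline(
    training=train,
    validation=valid,
    testing=test,
    model="ConvE",
    model_kwargs=dict(predict_with_sigmoid=True),
    loss=BCEWithLogitsLoss,
    training_loop="sLCWA",
    negative_sampler="bernoulli",
    epochs=10,
    stopper="early",
)
result.save_to_directory(model_path)
```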
