Comments (2)
Hi @LuisaWerner ,
I will try to give a broader overview before answering your specific questions below.
Overview
First, create_inverse_triples=True is a flag of a triples factory (or, more precisely, of one of its base classes, KGInfo). It means that we will add inverse triples when creating training instances: for each training triple $(h, r, t)$, we also create the inverse triple $(t, r^{-1}, h)$, where $r^{-1}$ is a new relation symbol.
Since this creates new relation symbols, we must also learn representations for them, i.e., the model itself will train twice as many relation representations. Thus, the num_relations attribute of a triples factory will be twice the number of original relations.
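To make the doubling concrete, here is a minimal sketch in plain Python rather than PyKEEN's actual implementation. The helper name add_inverse_triples and the ID layout (the inverse of relation r gets the ID r + num_relations) are assumptions made for illustration only.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (head_id, relation_id, tail_id)


def add_inverse_triples(triples: List[Triple], num_relations: int) -> List[Triple]:
    """Return the original triples plus one inverse triple per original triple.

    Assumption: the inverse of relation r is given the ID r + num_relations.
    """
    inverse = [(t, r + num_relations, h) for h, r, t in triples]
    return triples + inverse


original = [(0, 0, 1), (1, 1, 2)]  # two triples over two original relations
augmented = add_inverse_triples(original, num_relations=2)

# The model now needs a representation for every original and inverse relation.
effective_num_relations = 2 * 2

print(augmented)
```

Each original triple appears once as it is and once flipped with a fresh relation ID, which is why the number of relation representations doubles.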
For evaluation, we do not need to add inverse relations. Rather, a model that has been trained with inverse relations will (by default) use them to make head predictions, i.e., transform a head prediction $(?, r, t)$ into a tail prediction $(t, r^{-1}, ?)$.
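The idea of answering a head query with the inverse relation can be sketched as follows. This is a toy example with random vectors and a DistMult-style score, not PyKEEN code; the function names and the ID layout (inverse of r stored at r + NUM_RELATIONS) are assumptions.

```python
import random
from typing import List

random.seed(0)
NUM_ENTITIES, NUM_RELATIONS, DIM = 4, 2, 3


def random_vectors(n: int, dim: int) -> List[List[float]]:
    """Create n random vectors of the given dimension."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


entity_emb = random_vectors(NUM_ENTITIES, DIM)
# Twice as many relation vectors: IDs below NUM_RELATIONS are the original
# relations, the remaining IDs are their inverses (assumed layout).
relation_emb = random_vectors(2 * NUM_RELATIONS, DIM)


def score_tails(h: int, r: int) -> List[float]:
    """Score every entity as the tail of (h, r, ?) with a DistMult-style product."""
    return [
        sum(eh * er * et for eh, er, et in zip(entity_emb[h], relation_emb[r], tail))
        for tail in entity_emb
    ]


def score_heads_via_inverse(r: int, t: int) -> List[float]:
    """Score every entity as the head of (?, r, t) by scoring tails of (t, r^{-1}, ?)."""
    return score_tails(t, r + NUM_RELATIONS)


head_scores = score_heads_via_inverse(r=1, t=3)
```

With this setup the model only ever needs a tail-scoring routine; head queries are rewritten into tail queries over the inverse relation.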
Besides these "artificial" inverse relations, datasets may also contain "natural" inverse relations, whose presence may lead to unexpected results; cf., e.g., Dettmers et al., 2017. Examples of such "contaminated" datasets are FB15k and WN18, and thus "cleaned" dataset versions have been created, called FB15k-237 and WN18-RR. PyKEEN contains code for this deduplication process in the pykeen.triples.leakage module; this module also shows sample code to reproduce FB15k-237 from FB15k. The Sealant object used in this process can also be used to further investigate intermediate results about duplicate / inverse relation pair candidates.
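As a rough illustration of what detecting "natural" inverse relations involves, the following plain-Python sketch compares the (head, tail) pairs of each relation with the flipped pairs of every other relation. It is not the Sealant implementation; the function name find_inverse_candidates and the threshold parameter are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Set, Tuple


def find_inverse_candidates(
    triples: List[Tuple[str, str, str]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (r1, r2, overlap) where most pairs of r1 appear flipped in r2."""
    pairs: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for h, r, t in triples:
        pairs[r].add((h, t))

    candidates = []
    for r1, r2 in permutations(pairs, 2):
        flipped = {(t, h) for h, t in pairs[r2]}
        # Fraction of r1's pairs that also occur, reversed, under r2.
        overlap = len(pairs[r1] & flipped) / len(pairs[r1])
        if overlap >= threshold:
            candidates.append((r1, r2, overlap))
    return candidates


triples = [
    ("a", "parent_of", "b"),
    ("b", "child_of", "a"),
    ("c", "parent_of", "d"),
    ("d", "child_of", "c"),
    ("a", "likes", "c"),
]
candidates = find_inverse_candidates(triples)
```

In this toy data, parent_of and child_of are flagged as inverses of each other, while likes is not paired with anything.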
Answers
What happens during training and testing if I set create_inverse_triples = True when training and testing on the loaded data with a pykeen pipeline?
When it is set for the training factory, this will
- create additional "artificial" relations and train representations for them (i.e., the model gets more parameters)
- make the training loop create additional training instances (~triples); this means that in each epoch, we essentially see the triples twice: once in their natural form $(h, r, t)$, and once as the inverse $(t, r^{-1}, h)$.
For an evaluation factory, setting the flag will have no effect; I am not sure if we would see an error raised if the flag is set; if not, it would be better to change this. 🙂
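Until the library itself checks this, a user-side guard could look like the sketch below. The helper check_evaluation_factory is hypothetical and not part of PyKEEN; it only assumes that a factory exposes a create_inverse_triples attribute, and the example uses simple stand-in objects instead of real triples factories.

```python
import warnings
from types import SimpleNamespace


def check_evaluation_factory(factory, name: str) -> bool:
    """Warn if an evaluation factory has create_inverse_triples set.

    Returns True if the flag was set, False otherwise.
    """
    flag = bool(getattr(factory, "create_inverse_triples", False))
    if flag:
        warnings.warn(
            f"create_inverse_triples=True has no effect on the {name} factory",
            stacklevel=2,
        )
    return flag


# Stand-ins for triples factories, for illustration only.
valid = SimpleNamespace(create_inverse_triples=True)
test = SimpleNamespace(create_inverse_triples=False)

flagged = [
    name
    for name, factory in [("validation", valid), ("testing", test)]
    if check_evaluation_factory(factory, name)
]
```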
does it depend on the number of inverse relations in my dataset how I should use create_inverse_triples?
No; the create_inverse_triples flag only controls the creation of additional (artificial) inverse relations. It does not affect existing "natural" ones.
Is there a function in pykeen that can automatically examine my data in advance to see if some relations are inverse ?
Yes, the methods in the pykeen.triples.leakage module: in particular, unleak is a high-level API for removing inverse and duplicate relations from a dataset, and Sealant offers more fine-grained control over / introspection of candidate relation pairs.
Thanks a lot for your fast and detailed answer. This helps me a lot in understanding the creation of inverse triples.
For an evaluation factory, setting the flag will have no effect; I am not sure if we would see an error raised if the flag is set; if not, it would be better to change this. 🙂
Regarding your note, I tested this, and importing the test triples with create_inverse_triples=True didn't throw an error or warning.
The code for reproducing this looks as follows:
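# Imports were not shown in the original post; the loss is assumed to be PyKEEN's class.
from pykeen.losses import BCEWithLogitsLoss
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# data_path and model_path are pathlib.Path objects defined elsewhere.
# The flag is deliberately set on all three factories to test for an error or warning.
train = TriplesFactory.from_path(data_path / 'train.txt', create_inverse_triples=True)
valid = TriplesFactory.from_path(data_path / 'valid.txt', create_inverse_triples=True)
test = TriplesFactory.from_path(data_path / 'test.txt', create_inverse_triples=True)
result = pipeline(
    training=train,
    validation=valid,
    testing=test,
    model="ConvE",
    model_kwargs=dict(predict_with_sigmoid=True),
    loss=BCEWithLogitsLoss,
    training_loop='sLCWA',
    negative_sampler='bernoulli',
    epochs=10,
    stopper='early',
)
result.save_to_directory(model_path)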