Comments (7)
Hi @PhaelIshall .
This is likely caused by a property of the FB15k-237 dataset: it contains entities in the test split which do not occur in any training triple. In our pre-defined splits, we drop such entities, since it is not clear how to score triples containing them. Thus, you cannot score triples with entities that did not occur in the training set.
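To make this concrete, here is a toy, pure-Python sketch of why such triples must be filtered (the entity labels and the filtering logic below are made up for illustration; this is not pykeen's implementation): a model only has embeddings for entities it saw during training, so triples with unseen entities cannot be scored.

```python
# Toy illustration (not pykeen's implementation): a model can only score
# entities it has an embedding for, i.e. entities seen during training.
train_entities = {"/m/0abc", "/m/0def"}  # made-up labels for illustration

test_triples = [
    ("/m/0abc", "rel1", "/m/0def"),   # both entities seen in training
    ("/m/05hyf", "rel1", "/m/0def"),  # unseen head -> cannot be scored
]

# keep only triples whose head and tail both have trained embeddings
scorable = [
    (h, r, t)
    for h, r, t in test_triples
    if h in train_entities and t in train_entities
]
```

This is exactly the kind of filtering the pre-defined splits apply for you.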
Moreover, you have to make sure that the triples factory used for training and the one used for inference share the same label-to-ID mapping. The easiest way is to use our pre-defined datasets, which take care of this, e.g.:
dataset = Fb15k237()
result = pipeline(
    ...
    training_triples_factory=dataset.training,
    validation_triples_factory=dataset.validation,
    testing_triples_factory=dataset.testing,
)
tf = dataset.testing
Otherwise, make sure to provide the training factory's label-to-ID mapping to the test one:
training = result.model.triples_factory
tf = TriplesFactory(path='test.txt', entity_to_id=training.entity_to_id, relation_to_id=training.relation_to_id)
This also has the effect that triples containing unknown labels are dropped (you will see a warning about this).
P.S.: There are methods to infer representations for entities which did not exist at training time. These methods are usually called "inductive learning methods", as opposed to transductive learning, where all entities are known at training time.
@mberr Thank you for your answer. I have tried both methods and they still result in the same error (still KeyError at '/m/05hyf'). Is it possible that some entities that are not in training have not been dropped in the predefined splits?
Could you please verify that
"/m/05hyf" in training.entity_to_id.keys()  # should be True
If it is not true, you might have picked one of the entities not present in train. In that case, please try another one from training.entity_to_id.keys().
Hi @PhaelIshall .
I just noticed that you use triples (label-based) and not mapped_triples (ID-based). The filtering only takes place when mapping the triples, i.e., when transforming label-based triples to ID-based ones.
A quick fix for your problem is the following modified code directly using the ID-based triples:
dataset = Fb15k237()
result = pipeline(
    ...
    training_triples_factory=dataset.training,
    validation_triples_factory=dataset.validation,
    testing_triples_factory=dataset.testing,
)
from pykeen.utils import invert_mapping

model = result.model
results = []
id_to_label = invert_mapping(mapping=dataset.training.entity_to_id)
for triple in dataset.testing.mapped_triples:
    # predict the tail from the head and relation
    scores = model.predict_scores_all_tails(hr_batch=triple[None, :2].to(model.device))
    # keep only the top-1 prediction
    top_scores, top_ids = scores.topk(k=1)
    # move to CPU / Python scalars
    top_scores, top_ids = top_scores.item(), top_ids.item()
    # translate the entity ID back to its label
    results.append((top_ids, id_to_label[top_ids], top_scores))
I agree that this is an annoying inconsistency, and we either need to update the description of predict_tails, or make sure that the triples attribute gets filtered as well.
I am tagging @cthoyt here, since he has been involved in predict_tails.
@mberr This time I did not try to predict from that head/relation pair; instead I just iterated through the test-set triples as you showed and made a prediction for every pair there. It seems that this pair is in the pykeen test set but not in the training set. My understanding was that there are no such triples in the pykeen test split?
@PhaelIshall Could you maybe share (again) a (minimal) code example to reproduce the error you are getting now?
@mberr Sorry, here is the code I am running that results in an error when trying to predict on the test set.
dataset = Fb15k237()
result = pipeline(
    ...
    training_triples_factory=dataset.training,
    validation_triples_factory=dataset.validation,
    testing_triples_factory=dataset.testing,
)
tf = dataset.testing
triples = tf.triples
results = []
for t in range(len(triples)):
    # predict the tail from the head and relation
    predictions = model.predict_tails(str(triples[t][0]), str(triples[t][1])).to_numpy()
    # keep only the top-1 prediction
    results.append(predictions[0])
This results in the same error: KeyError: '/m/05hyf'.
The only way to prevent this is to add an if statement in the loop, as you mentioned:
for t in range(len(triples)):
    # check that the head is in the training set
    if triples[t][0] in training.entity_to_id.keys():
        # predict the tail from the head and relation
        predictions = model.predict_tails(str(triples[t][0]), str(triples[t][1])).to_numpy()
        # keep only the top-1 prediction
        results.append(predictions[0])
Does this mean that in the pykeen splits for Fb15k237, entities that are not in training are not dropped?
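One way to check this directly is to compute the set of label-based test entities that never received a training-time ID. The snippet below is a hedged sketch with stand-in data; in a real run you would substitute dataset.training.entity_to_id and dataset.testing.triples.

```python
# Stand-in data for illustration; with pykeen you would use
# dataset.training.entity_to_id and dataset.testing.triples instead.
entity_to_id = {"/m/0abc": 0, "/m/0def": 1}
test_triples = [
    ("/m/0abc", "rel1", "/m/0def"),
    ("/m/05hyf", "rel1", "/m/0abc"),  # head label absent from the mapping
]

# entities appearing in the (label-based) test triples ...
test_entities = {e for h, _, t in test_triples for e in (h, t)}
# ... minus those that received an ID at training time
unknown = sorted(test_entities - entity_to_id.keys())
```

If unknown is non-empty, the label-based triples attribute indeed still contains entities that were dropped during ID mapping.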