When trying to use my own data I get the following error. I am not very experienced an…

Issue with random number generation in TriplesFactory.split (pykeen/pykeen)

Comments (9)

ChristopherMarais commented on May 24, 2024

Thank you! I noticed that it works now when I use the 1.0.5-dev version.

ChristopherMarais commented on May 24, 2024

When I split my data manually, I am able to use it with the following code:

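import os

from pykeen.triples import TriplesFactory
from pykeen.pipeline import pipeline

# Placeholder: directory containing the manually split train.txt and test.txt
work_path = '.'

# PyKEEN 1.0.x constructor; newer releases use TriplesFactory.from_path(...)
training = TriplesFactory(path=os.path.join(work_path, 'train.txt'))
# Reuse the training mappings so entity and relation IDs agree across both sets
testing = TriplesFactory(
    path=os.path.join(work_path, 'test.txt'),
    entity_to_id=training.entity_to_id,
    relation_to_id=training.relation_to_id,
)

pipeline_result = pipeline(
    training_triples_factory=training,
    testing_triples_factory=testing,
    model='TransE',
)
pipeline_result.save_to_directory('test_pre_stratified_transe')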

There seems to be an issue with the split function built into TriplesFactory.
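A minimal sketch of one way to produce train.txt and test.txt by hand, assuming the raw data is a list of (head, relation, tail) string triples. manual_split and write_triples are hypothetical helpers, not part of PyKEEN. Because the testing factory above reuses the training ID mappings, test triples whose entities or relation never occur in training are moved back into training. The shuffle uses Python's random.Random with an explicit seed, which does not depend on NumPy's default integer size.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def manual_split(
    triples: Sequence[Triple], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[Triple], List[Triple]]:
    """Hypothetical helper: seeded random split into (train, test)."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Every test entity/relation must also appear in training, otherwise
    # the training entity_to_id / relation_to_id mappings cannot cover it.
    entities = {h for h, _, _ in train} | {t for _, _, t in train}
    relations = {r for _, r, _ in train}
    kept: List[Triple] = []
    for h, r, t in test:
        if h in entities and t in entities and r in relations:
            kept.append((h, r, t))
        else:
            # Move the triple into training and extend the known vocabulary.
            train.append((h, r, t))
            entities.update((h, t))
            relations.add(r)
    return train, kept


def write_triples(path: str, triples: Sequence[Triple]) -> None:
    """Hypothetical helper: write one tab-separated triple per line."""
    with open(path, "w", encoding="utf-8") as f:
        for h, r, t in triples:
            f.write(f"{h}\t{r}\t{t}\n")
```

For example, `train, test = manual_split(all_triples)` followed by `write_triples(os.path.join(work_path, 'train.txt'), train)` and the same call for test.txt would produce the two files the code above loads.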

cthoyt commented on May 24, 2024

@ChristopherMarais this might be an issue with the Python build you're using. Are you on a 32-bit system? It would be helpful if you could report your OS version, your Python version, and your PyKEEN version.
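The requested details can be collected with the standard library alone. A minimal sketch; environment_report is a hypothetical helper, and it reads PyKEEN's version from the installed package metadata via importlib.metadata (available from Python 3.8).

```python
import importlib.metadata
import platform
import struct


def environment_report() -> dict:
    """Hypothetical helper: OS, Python version/bitness, and PyKEEN version."""
    try:
        pykeen_version = importlib.metadata.version("pykeen")
    except importlib.metadata.PackageNotFoundError:
        pykeen_version = None  # PyKEEN not installed in this environment
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        # Pointer size in bits: 64 for a 64-bit interpreter, 32 for a 32-bit one.
        "python_bits": struct.calcsize("P") * 8,
        "pykeen": pykeen_version,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```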

It could be that NumPy on Windows defaults to 32-bit integers (reference: dask/dask-ml#230 (comment)). In that case, the fix for this bug would be to specify the data type for the random number generator explicitly.
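A minimal sketch of the explicit-dtype idea, assuming the failing call draws a seed from the full range [0, 2**32); it is an illustration, not PyKEEN's actual code. draw_seed and shuffled_indices are hypothetical names. With NumPy < 2.0 on Windows, the default integer (np.int_) is 32 bits, so an upper bound of 2**32 does not fit unless int64 is requested.

```python
import numpy as np

# Size of NumPy's default integer type: 32 bits on Windows with NumPy < 2.0,
# 64 bits on 64-bit Linux/macOS (and on Windows from NumPy 2.0 on).
DEFAULT_INT_BITS = np.dtype(np.int_).itemsize * 8


def draw_seed() -> int:
    # Without dtype, np.random.randint uses the default integer type and can
    # raise a ValueError (bound out of range for int32) where that type is
    # 32 bits. Requesting int64 makes the draw platform-independent.
    return int(np.random.randint(0, 2**32, dtype=np.int64))


def shuffled_indices(n: int, seed: int) -> np.ndarray:
    # RandomState accepts seeds in [0, 2**32 - 1], exactly the range drawn above.
    return np.random.RandomState(seed).permutation(n)


seed = draw_seed()
print(DEFAULT_INT_BITS, seed, shuffled_indices(5, seed))
```

Passing the same seed twice yields the same permutation, which is what makes a seeded split reproducible.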

This might have slipped through the cracks because we haven't done any testing on Windows. Most people using PyKEEN want to take advantage of GPUs, which are only available on Linux. However, we could set up CI on AppVeyor, since PyKEEN doesn't strictly require a GPU. Sorry, probably too much information! Your feedback is appreciated, and I hope we can get this working for you and anyone else who runs into this issue.

ChristopherMarais commented on May 24, 2024

I am using:
Windows 10
Python 3.8.3

I wasn't aware that I would not be able to access my GPU from Windows. I assumed that if I got all the GPU-related packages (cudatoolkit, etc.) running in my Anaconda environment, it would be able to use my GPU.

What would be the recommended system requirements for using PyKEEN?

cthoyt commented on May 24, 2024

Hi @ChristopherMarais, thanks for letting me know. I'm not familiar with getting PyTorch up and running on Windows; would you mind sharing how you did it? For example, are you using conda? This would also be useful to share with other users of PyKEEN. Then, in #95, I will try to make sure we have AppVeyor running on each push.

ChristopherMarais commented on May 24, 2024

Yes, I am using conda. I used the following command to install PyTorch in an environment:

conda install -c pytorch pytorch

I can't remember whether I also had to install cudatoolkit separately (I have made so many environments recently).
I do remember having to install some packages before being able to install PyKEEN.
I have attached my exported environment .yml file in case you want to test it.
It also shows which packages I have installed.

pykeen_env-yml.txt

Using the following command, I was able to copy the environment to another PC:

conda env create -f pykeen_env-yml.txt

When I run the examples in a Jupyter notebook through JupyterLab, not all of them work, and I can't tell whether it actually uses my GPU, so the setup might not fully work. However, I do end up creating embeddings for many of my own datasets; I just have to stratify them 'manually' before running them through the pre-stratified example in the docs.
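Whether PyTorch can see the GPU at all can be checked from the notebook itself. A minimal sketch, assuming PyTorch was installed with the conda command above; describe_torch_device is a hypothetical helper, and it only reports what PyTorch sees, not which device PyKEEN ends up using.

```python
def describe_torch_device() -> str:
    """Hypothetical helper: summarise whether this PyTorch build can use CUDA."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)} (CUDA {torch.version.cuda})"
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    if torch.version.cuda is None:
        return "CUDA not available: this is a CPU-only PyTorch build"
    return f"CUDA not available: PyTorch was built for CUDA {torch.version.cuda}, but no GPU is visible"


print(describe_torch_device())
```

A "CPU-only build" result would point at the installed PyTorch package rather than at PyKEEN.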

As a side note (I know I should actually open a new issue for this), is there a way for me to control the size of the embeddings being created? I see that they all end up being vectors with 50 elements.
According to https://www.aclweb.org/anthology/I17-2006/, the embedding dimension may have an effect, and I would like to test that with PyKEEN too.

mberr commented on May 24, 2024

@ChristopherMarais @cthoyt I think I found a solution to the issue, cf. #98

mberr commented on May 24, 2024

@ChristopherMarais

> As a side note (I know I should actually open a new issue for this), is there a way for me to control the size of the embeddings being created? I see that they all end up being vectors with 50 elements.
> According to https://www.aclweb.org/anthology/I17-2006/, the embedding dimension may have an effect, and I would like to test that with PyKEEN too.

You can change the dimension(s) of the embeddings when creating the model instance. Depending on the interaction model, there may be more than one dimension (e.g. a separate dimension for relation embeddings). You can check the documentation of the individual models for more information, cf. https://pykeen.readthedocs.io/en/latest/reference/models.html

When using the pipeline function, you can pass them via model_kwargs, e.g.

pipeline_result = pipeline(
    training_triples_factory=training,
    testing_triples_factory=testing,
    model='TransE',
    model_kwargs=dict(embedding_dim=64),
)

to have 64-dimensional entity and relation embeddings.
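To test the effect of the embedding dimension raised in the side note, the pipeline call above could be repeated over several dimensions. A minimal sketch: the dimensions (32, 64, 128), the helper names sweep_configs and run_sweep, and the output directory names are illustrative assumptions; training and testing are the factories from the earlier example.

```python
from typing import Dict, Iterable, List


def sweep_configs(dims: Iterable[int], model: str = "TransE") -> List[Dict]:
    # One pipeline configuration per dimension; embedding_dim reaches the
    # model constructor through model_kwargs, as in the call above.
    return [dict(model=model, model_kwargs=dict(embedding_dim=d)) for d in dims]


def run_sweep(training, testing, dims=(32, 64, 128)) -> None:
    # Imported here so sweep_configs can be used without PyKEEN installed.
    from pykeen.pipeline import pipeline

    for config in sweep_configs(dims):
        result = pipeline(
            training_triples_factory=training,
            testing_triples_factory=testing,
            **config,
        )
        dim = config["model_kwargs"]["embedding_dim"]
        # Keep each run's results in its own directory for later comparison.
        result.save_to_directory(f"transe_dim_{dim}")
```

Calling `run_sweep(training, testing)` trains one model per dimension; the saved directories can then be compared.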

cthoyt commented on May 24, 2024

@ChristopherMarais, thanks for the help and motivation! As of #95, we have testing working on Windows.
