The original graph2vec uses both the prev and next extracted features in WL

extracted_features might be different from the original graph2vec (benedekrozemberczki/graph2vec)

Comments (11)

ZiruiYan commented on July 18, 2024

In training, graph2vec uses only `center`, not `nei_list`, so the set of words will be the same. However, I wonder whether the order of `extracted_features`, which differs between the two versions, will hurt the results.
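To make the question concrete, here is a minimal sketch of Weisfeiler-Lehman (WL) feature extraction that emits the same features in two different orders. It is not the code of either implementation: the toy graph, the `wl_features` helper, and the two orderings (`by_iteration`, `by_node`) are assumptions chosen only for illustration.

```python
import hashlib
from collections import Counter


def wl_features(adjacency, labels, iterations):
    """Return one {node: feature} dict per WL round, starting with the raw labels."""
    current = {node: str(label) for node, label in labels.items()}
    rounds = [dict(current)]
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # Own feature first, then the sorted features of the neighbours.
            parts = [current[node]] + sorted(current[n] for n in neighbours)
            updated[node] = hashlib.md5("_".join(parts).encode()).hexdigest()
        current = updated
        rounds.append(dict(current))
    return rounds


# Toy graph (hypothetical); node degree is used as the initial label.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {node: len(neighbours) for node, neighbours in adjacency.items()}
rounds = wl_features(adjacency, labels, iterations=2)

# Ordering A: all nodes of round 0, then all nodes of round 1, and so on.
by_iteration = [features[node] for features in rounds for node in adjacency]
# Ordering B: all rounds of node 0, then all rounds of node 1, and so on.
by_node = [features[node] for node in adjacency for features in rounds]

print("same sequence:", by_iteration == by_node)
print("same multiset:", Counter(by_iteration) == Counter(by_node))
```

If the embedding model treats each graph as a bag of words (an assumption about the training setup), the two orderings carry the same information; order would only matter for a model that uses a context window over the word sequence.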

from graph2vec.

shaulz commented on July 18, 2024

Hi ZiruiYan,
Not sure if this is related, but when I execute on a provided sample, results are not good.
To validate the result, I have added the following code:
`import collections
def test(model, document_collections):
    ranks = []
    second_ranks = []
    for doc_id in range(len(document_collections)):
        inferred_vector = model.infer_vector(document_collections[doc_id].words)
        sims = model.docvecs.most_similar([inferred_vector], topn=len(model.docvecs))
        # Looks up the tag "g_<doc_id>", i.e. assumes document i is tagged "g_i".
        rank = [docid for docid, sim in sims].index("g_" + str(doc_id))
        ranks.append(rank)
        second_ranks.append(sims[1])
    print(sorted(collections.Counter(ranks).items()))`

Basically, each inferred vector should be most similar to the trained vector of the same graph, so a result where every graph has rank 0 would be ideal. What I am getting looks completely random:
[(0, 1), (1, 2), (2, 3), (4, 1), (5, 2), (8, 2), (10, 2), (11, 2), (14, 1), (16, 3), (19, 1), (21, 1), (22, 2), (23, 1), (24, 1), (25, 2), (27, 1), (28, 2), (29, 1), (32, 2), (34, 1), (35, 3), (36, 1), (37, 1), (39, 1), (40, 3), (41, 1), (43, 1), (44, 1), (45, 2), (50, 3)]
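One way to put a number on "random" is to compare the histogram above with what a uniformly random ranking would give. The sketch below only re-reads the posted `(rank, count)` pairs; the variable names are illustrative.

```python
# (rank, count) pairs copied from the output above.
ranks_histogram = [
    (0, 1), (1, 2), (2, 3), (4, 1), (5, 2), (8, 2), (10, 2), (11, 2),
    (14, 1), (16, 3), (19, 1), (21, 1), (22, 2), (23, 1), (24, 1), (25, 2),
    (27, 1), (28, 2), (29, 1), (32, 2), (34, 1), (35, 3), (36, 1), (37, 1),
    (39, 1), (40, 3), (41, 1), (43, 1), (44, 1), (45, 2), (50, 3),
]

total = sum(count for _, count in ranks_histogram)
# Share of graphs whose own trained vector is the nearest neighbour.
top1_rate = dict(ranks_histogram).get(0, 0) / total
mean_rank = sum(rank * count for rank, count in ranks_histogram) / total
# Expected mean rank if ranks 0..total-1 were assigned uniformly at random.
chance_mean_rank = (total - 1) / 2

print(f"graphs: {total}, top-1 rate: {top1_rate:.3f}")
print(f"mean rank: {mean_rank:.2f}, chance level: {chance_mean_rank:.2f}")
```

A mean rank close to the chance level supports the reading that the ranking carries no signal.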

Can you provide a fix for the code so that it works the same as the original?


benedekrozemberczki commented on July 18, 2024

The provided samples are synthetic data. For inference you have to use a large learning rate.


shaulz commented on July 18, 2024

Can you please point me to the dataset and the learning rate to use so that inference shows reasonable results?


benedekrozemberczki commented on July 18, 2024

Synthetic data means ER (Erdős-Rényi) graphs. A learning rate above 0.05 helps.
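For readers unfamiliar with the term, an ER graph in the G(n, p) sense includes each possible edge independently with probability p. The sketch below generates one with the standard library only. The helper name `erdos_renyi_edges` and the `"edges"` key of the resulting dictionary are assumptions made for illustration, not a statement about how the provided samples were produced.

```python
import itertools
import json
import random


def erdos_renyi_edges(num_nodes, edge_probability, seed=None):
    """Sample the edge list of a G(n, p) graph on nodes 0..num_nodes-1."""
    rng = random.Random(seed)
    return [
        [u, v]
        for u, v in itertools.combinations(range(num_nodes), 2)
        if rng.random() < edge_probability  # keep each pair with probability p
    ]


# Assumed layout: one JSON object per graph with an "edges" list.
graph = {"edges": erdos_renyi_edges(num_nodes=10, edge_probability=0.3, seed=42)}
print(json.dumps(graph))
```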


shaulz commented on July 18, 2024

So if I try the nci1 set from the original paper with a learning rate of 0.05, inference should be OK, right?
Do you have any non-synthetic datasets in the JSON format that the software expects?


benedekrozemberczki commented on July 18, 2024

They are on a server in Edinburgh; I will look them up for you, but doing the transformation is not so complicated.
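As a rough idea of what such a transformation could look like, the sketch below converts a dataset given in the TU-style layout (an edge list over global 1-based node ids, a graph indicator per node, and a label per node) into one dictionary per graph. Both the input layout and the output layout (`"edges"` plus a `"features"` mapping) are assumptions here, and `tu_to_graph_dicts` is a hypothetical helper, so check them against the actual files before relying on it.

```python
import json
from collections import defaultdict


def tu_to_graph_dicts(edge_pairs, graph_indicator, node_labels):
    """Split a TU-style dataset into {graph_id: {"edges": ..., "features": ...}}."""
    graphs = defaultdict(lambda: {"edges": [], "features": {}})
    local_id = {}                # global 1-based node id -> 0-based id inside its graph
    next_local = defaultdict(int)
    for node, (graph_id, label) in enumerate(zip(graph_indicator, node_labels), start=1):
        local_id[node] = next_local[graph_id]
        next_local[graph_id] += 1
        graphs[graph_id]["features"][str(local_id[node])] = str(label)
    seen = set()
    for u, v in edge_pairs:
        key = (min(u, v), max(u, v))
        if key in seen:          # such edge lists usually contain both directions
            continue
        seen.add(key)
        graphs[graph_indicator[key[0] - 1]]["edges"].append(
            [local_id[key[0]], local_id[key[1]]]
        )
    return dict(graphs)


# Tiny hypothetical dataset: graph 1 is a path on 3 nodes, graph 2 a single edge.
edge_pairs = [(1, 2), (2, 1), (2, 3), (3, 2), (4, 5), (5, 4)]
graph_indicator = [1, 1, 1, 2, 2]
node_labels = [0, 1, 0, 2, 2]

converted = tu_to_graph_dicts(edge_pairs, graph_indicator, node_labels)
for graph_id, graph in converted.items():
    print(graph_id, json.dumps(graph))
```

Each dictionary could then be written to its own file with `json.dump`.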


shaulz commented on July 18, 2024

Thanks!!! Really appreciate this. I also want to see the inference working. I do not know whether the authors of the original paper checked the inference.


benedekrozemberczki commented on July 18, 2024

I know someone who tried, and for them the main problems were: 1. The learning rate. 2. Making sure that the feature spaces overlap.
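The second point can be checked before inference. The sketch below measures how many of a new graph's WL features already occur in the training vocabulary; words the model has never seen cannot contribute to the inferred vector. The function name and the toy word lists are hypothetical.

```python
def vocabulary_overlap(training_documents, new_document):
    """Fraction of words in new_document that occur in any training document."""
    vocabulary = set()
    for words in training_documents:
        vocabulary.update(words)
    if not new_document:
        return 0.0
    known = sum(1 for word in new_document if word in vocabulary)
    return known / len(new_document)


# Toy word lists standing in for hashed WL features.
training_documents = [["a", "b", "c"], ["b", "d"]]
new_document = ["a", "d", "x", "y"]

print(vocabulary_overlap(training_documents, new_document))
```

A low value suggests the new graph was processed with different labels or a different number of WL iterations than the training graphs.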


shaulz commented on July 18, 2024

Hi Benedek, good news: I found a bug in my test procedure.
Now I am getting almost perfect inference results on the synthetic set: [(0, 50), (1, 1)]
using the parameters --learning-rate 0.05 --down-sampling 0.001 --epochs 500

The correct test code:
`import collections
def test(model, document_collections):
    ranks = []
    second_ranks = []
    for doc_id in range(len(document_collections)):
        inferred_vector = model.infer_vector(document_collections[doc_id].words)
        # gensim < 4.0 API; in gensim 4.x `docvecs` was renamed to `dv`.
        sims = model.docvecs.most_similar(positive=[inferred_vector], topn=len(model.docvecs))
        # Look up the document's own tag instead of building it from the list position.
        rank = [docid for docid, sim in sims].index(document_collections[doc_id].tags[0])
        ranks.append(rank)
        second_ranks.append(sims[1])
    print(sorted(collections.Counter(ranks).items()))`
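The comment does not say what the bug was. Comparing the two versions, the lookup changed from `"g_" + str(doc_id)` to the document's own tag, so one plausible reading (an assumption) is that a document's position in the list did not match the number in its tag. The sketch below reproduces that effect without any model: the tags and the "perfect" rankings are made up for illustration.

```python
# Hypothetical tags in the order the documents were loaded (not sorted by number).
tags = ["g_2", "g_0", "g_1"]

# Pretend the model is perfect: every document's nearest neighbour is itself.
rankings = {tag: [tag] + [other for other in tags if other != tag] for tag in tags}


def self_ranks(tags, rankings, expected_tag):
    """Rank of the expected tag in each document's similarity ranking."""
    return [
        rankings[tag].index(expected_tag(position, tag))
        for position, tag in enumerate(tags)
    ]


# First version: derive the tag from the list position.
by_position = self_ranks(tags, rankings, lambda position, tag: "g_" + str(position))
# Corrected version: use the tag stored with the document.
by_tag = self_ranks(tags, rankings, lambda position, tag: tag)

print("by position:", by_position)
print("by tag:     ", by_tag)
```

Even with a perfect model, the position-based lookup reports non-zero ranks, which would look like poor inference.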


shaulz commented on July 18, 2024

Something unrelated: suppose my node data is multidimensional, i.e. each node has more than one label.
Any idea how to use graph2vec in such a case? Of course I can run it separately on each label and merge the results into a single TaggedDocument before calling Doc2Vec. Any other options?
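The merge option described above could look roughly like this, assuming the WL features have already been extracted once per label type. The helper `merge_label_channels` and the channel names are hypothetical. Prefixing each word with its channel keeps identical strings from different label types apart in the vocabulary.

```python
def merge_label_channels(features_by_channel):
    """Concatenate per-channel word lists into one list, prefixing each word."""
    merged = []
    for channel, words in sorted(features_by_channel.items()):
        merged.extend(f"{channel}:{word}" for word in words)
    return merged


# Hypothetical WL features of one graph, extracted separately per label type.
features_by_channel = {
    "atom": ["C", "N"],
    "charge": ["0", "1"],
}

words = merge_label_channels(features_by_channel)
print(words)
```

The resulting list can serve as the `words` of a single tagged document for that graph. Another option, also only a sketch of an idea, is to combine the labels of each node into one composite label before running WL.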
