
hin2vec's People

Contributors

csiesheep, tabris223

hin2vec's Issues

Memory error

Hi csiesheep, I have only 3569 nodes and my computer has 16 GB of RAM, but I got a memory error. Does that mean the program is not suited to real data? I don't think a random walk should consume that much RAM, so I don't know why this happened.
Traceback (most recent call last):
  File "F:/project/network/hin2vec/main_py.py", line 158, in
    parser.print_help()
  File "F:/project/network/hin2vec/main_py.py", line 32, in main
    print (tmp_walk_fname)
  File "F:\project\network\hin2vec\ds\network.py", line 287, in random_walks
    self.create_node_choices()
  File "F:\project\network\hin2vec\ds\network.py", line 267, in create_node_choices
    node_choices[from_id] += [(to_id, edge_id)] * int(w*10)
MemoryError
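The failing line stores each neighbor `int(w*10)` times, so if edge weights are large, memory grows with the weights rather than with the number of nodes. A minimal sketch of weighted neighbor sampling that stores each edge only once, using cumulative weights and `bisect`. The class `WeightedNeighbors` is hypothetical and not part of hin2vec.

```python
import bisect
import random


class WeightedNeighbors:
    """Hypothetical alternative to a replicated choice list: each
    (to_id, edge_id) pair is stored once, alongside a running weight total."""

    def __init__(self):
        self.targets = []     # (to_id, edge_id) pairs
        self.cumulative = []  # cumulative weight up to and including each pair

    def add(self, to_id, edge_id, weight):
        if weight <= 0:
            return  # a zero or negative weight can never be sampled
        total = self.cumulative[-1] if self.cumulative else 0.0
        self.targets.append((to_id, edge_id))
        self.cumulative.append(total + weight)

    def sample(self, rng=random):
        if not self.targets:
            raise ValueError("node has no outgoing edges with positive weight")
        # Draw a point in [0, total) and find the bucket that contains it.
        point = rng.random() * self.cumulative[-1]
        index = bisect.bisect_right(self.cumulative, point)
        # Guard against floating-point rounding pushing point up to total.
        return self.targets[min(index, len(self.targets) - 1)]
```

Memory is one entry per edge regardless of weight, and sampling costs O(log degree). Unlike `int(w*10)`, this also keeps edges whose weight is below 0.1.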

Modify code in main_py.py

Change the code: on line 99 of main_py.py, change edge = ','.join([id2edge_class[int(id_)] for id_ in ids])
to: edge = ','.join([id2edge_class[int(id_)] for id_ in ids.split(',')])
Then run:
python main_py.py res/karate_club_edges.txt node_vectors.txt metapath_vectors.txt -l 1000 -d 2 -w 2
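Here is why the change is needed. `ids` is a comma-separated string such as `0,0`, so iterating over it directly yields single characters, including the `,` that `int()` rejects with `ValueError: invalid literal for int() with base 10: ','`. The sketch below reproduces both versions. The `id2edge_class` dict and the sample `ids` value are made-up stand-ins for the values main_py.py works with.

```python
# Hypothetical stand-in for the mapping built by main_py.py.
id2edge_class = {0: "U-U"}


def decode_broken(ids):
    # Iterates character by character: '0', ',', '0' -> int(',') fails.
    return ",".join([id2edge_class[int(id_)] for id_ in ids])


def decode_fixed(ids):
    # Splits on commas first: ['0', '0'].
    return ",".join([id2edge_class[int(id_)] for id_ in ids.split(",")])
```

The original version also misreads multi-digit ids: `12` is read as `1` and `2`.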

Multiple metapaths

Hi! Thanks a lot for your code and data. However, I have a problem with multiple metapaths. An HIN usually contains more types of nodes, so what should I do to get an embedding for each node type? For example, authors (A), papers (P) and conferences (C) are all in the DBLP dataset. I tried listing them in one edge_list file as follows:
1 A 2 P A-P
2 P 3 C P-C
But the program gets stuck in an endless loop. Could you tell me how to process this type of dataset? Thanks very much.
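One thing worth checking, offered only as a hypothesis about the loop: in the two-line example above, the conference node `3 C` appears only as an edge target, so a walk that reaches it has no outgoing edge to follow. The sketch below lists such nodes from an edge file in the `node1_name node1_type node2_name node2_type edge_type` layout shown above. The function name is hypothetical.

```python
def dead_end_nodes(lines):
    """Return (name, type) pairs that appear only as edge targets.

    Assumes whitespace-separated lines of the form
    node1_name node1_type node2_name node2_type edge_type.
    """
    sources, targets = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        n1, t1, n2, t2, _edge_type = parts
        sources.add((n1, t1))
        targets.add((n2, t2))
    return targets - sources
```

If the result is non-empty, one possible experiment is to add the reverse edges (for example `3 C 2 P C-P`). We have not confirmed that this fixes the loop.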

KeyError while generating random walks

Hi,
I was trying to run hin2vec on a very large heterogeneous graph and got the following error:

# ......
# ......
31301 2
31302 3
31303 3
31304 2
Generate random walks...
/tmp/tmpDB3NOI
Traceback (most recent call last):
  File "main_py.py", line 160, in <module>
    sys.exit(main(args[0], args[1], args[2], options))
  File "main_py.py", line 34, in main
    for walk in g.random_walks(options.walk_num, options.walk_length):
  File "/home/zjj/Projects/hin2vec/ds/network.py", line 295, in random_walks
    walk = self.a_random_walk(node, length)
  File "/home/zjj/Projects/hin2vec/ds/network.py", line 249, in a_random_walk
    if len(self.graph[node]) == 0:
KeyError: 30583
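The failing line is `if len(self.graph[node]) == 0:`, which means the walk reached a node id (30583) that is not a key of `self.graph`. We have not confirmed why this happens. One possibility is a node that appears only as an edge target. The sketch below is a walk over a dict-of-lists adjacency that stops at such nodes instead of raising. It is hypothetical and is not hin2vec's `a_random_walk`.

```python
import random


def guarded_random_walk(graph, start, length, rng=random):
    """Uniform random walk that stops early at dead ends.

    `graph` maps a node id to a list of neighbor ids; nodes that never
    appear as keys are treated as having no neighbors (via graph.get).
    """
    walk = [start]
    node = start
    for _ in range(length - 1):
        neighbors = graph.get(node, [])
        if not neighbors:
            break  # dead end: no outgoing edges recorded for this node
        node = rng.choice(neighbors)
        walk.append(node)
    return walk
```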

Thank you

About the output

Hello,
I have a question about the output. For example, does the node vector in the output distinguish between the types of nodes?

In a trained node_vector file, why does the dim of the last node vector get shorter?

Following your code and instructions, I can run the program normally, but there is a problem:
In a trained node_vector file, why is the last node vector shorter?
In the output node_vector file, the last node's vector dimension is inconsistent with the other nodes'.

I can't find any documentation in the code explaining what this node's vector means, so I want to know what's wrong with it.
For now I remove the last line of the vector file and the result is fine, but that's not a good way to solve the problem.
Please look into this problem.
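Until the cause is found, a slightly safer version of the manual workaround is to load the file while checking each row's length against the expected dimension, and to report the rows that don't match instead of deleting the last line blindly. The sketch below assumes each row is a node name followed by whitespace-separated floats; that layout and the function name are assumptions. A header line such as a count and a dimension would normally be reported as a mismatch as well.

```python
def load_vectors(lines, dim):
    """Parse 'name v1 ... v_dim' rows; return (vectors, bad_rows).

    Rows whose number of values differs from `dim` are not loaded; their
    1-based line numbers are returned so they can be inspected.
    """
    vectors, bad_rows = {}, []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        name, values = parts[0], parts[1:]
        if len(values) != dim:
            bad_rows.append(lineno)
            continue
        vectors[name] = [float(v) for v in values]
    return vectors, bad_rows
```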

Error in main_py.py

Hi,
I got the following error when I ran "python main_py.py res/karate_club_edges.txt node_vectors.txt metapath_vectors.txt -l 1000 -d 2 -w 2"

Load a HIN...
U 34
{'U-U': 0}
0 16
1 3
2 1
3 2
4 5
5 2
6 3
7 2
8 9
9 6
10 10
11 6
12 3
13 4
14 4
15 4
16 5
17 4
18 2
19 4
20 3
21 12
22 2
23 17
24 2
25 2
26 2
27 2
28 2
29 5
30 3
31 4
32 3
33 2
Generate random walks...
/tmp/tmp7_cLNl
Reading nodes 340000
Reading paths 670000
0 0(count:339660, inverse:False)
1 0,0(count:339320, inverse:False)
training bytes: 1592525
distinct node count: 34
distinct path count: 2
start training
0.000735 330000/340000 (97.06%)
Finished. Total time: 1.18 minutes
Dump vectors...
Traceback (most recent call last):
  File "main_py.py", line 160, in
    sys.exit(main(args[0], args[1], args[2], options))
  File "main_py.py", line 67, in main
    output_path2vec(g, tmp_path_vec_fname, path_vec_fname)
  File "main_py.py", line 99, in output_path2vec
    edge = ','.join([id2edge_class[int(id_)] for id_ in ids])
ValueError: invalid literal for int() with base 10: ','

Thank you.

The main program is not generating vectors for all the nodes

3278 1
3279 1
3281 1
3282 1
3284 1
3285 1
3288 1
3290 1
Generate random walks...
Learn representations...
model_c/bin/hin2vec -size 100 -train /tmp/tmpzIusou -alpha 0.025000 -output /tmp/tmpjma27l -output_mp /tmp/tmpSPiNpI -window 4 -negative 5 -threads 1 -no_circle 1 -sigmoid_reg 0
Starting training using file /tmp/tmpzIusou
Node size: 3266
Nodes in train file: 34889
0 meta-path:0 17799
1 meta-path:00 709
Meta-path size: 2
Meta-paths in train file: 18508
Alpha: 0.010666 Progress(30006/34889): 86.00% Words/thread/sec: 330.22k
save node vectors
save mp vectors
Dump vectors...

As shown in this output from main.py, I have 3921 nodes in my edges file (containing only directed U-U edges), but the program loaded only 3266 nodes and generated only 3266 vectors. The strangest thing is that the number of nodes loaded varies randomly from run to run, but it is never 3291. The progress also stopped at 86% instead of 100%. The parameters I used are -l 1000 -d 100 -w 4.
A few days ago the program worked fine on the same data set, with multiple kinds of edges and parameters -l 1000 -d 2 -w 2. Does anyone know what the problem is? (A segmentation fault did happen once when I tried to produce vectors of dimension 128; could that be the reason?)
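To see which nodes were dropped, and not only how many, one option is to compare the node names in the edge file with the names in the vector file. The sketch below assumes the five-column edge layout `node1_name node1_type node2_name node2_type edge_type` and a vector file whose first token on each row is the node name. Both layouts and the function name are assumptions.

```python
def missing_nodes(edge_lines, vector_lines):
    """Return node names present in the edge list but absent from the vectors."""
    in_edges = set()
    for line in edge_lines:
        parts = line.split()
        if len(parts) == 5:
            in_edges.add(parts[0])  # node1_name
            in_edges.add(parts[2])  # node2_name
    in_vectors = {line.split()[0] for line in vector_lines if line.strip()}
    return sorted(in_edges - in_vectors)
```

Comparing the missing names across runs may show whether the same nodes are dropped each time or a different random subset.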

multiple processes Error in main_py.py

When I use main_py.py to run the example with the option --num_processes = 2, I get the following error:

Load a HIN...
U 34
{'U-U': 0}
0 16
1 3
2 1
3 2
4 5
5 2
6 3
7 2
8 9
9 6
10 10
11 6
12 3
13 4
14 4
15 4
16 5
17 4
18 2
19 4
20 3
21 12
22 2
23 17
24 2
25 2
26 2
27 2
28 2
29 5
30 3
31 4
32 3
33 2
Generate random walks...
c:\users\user\appdata\local\temp\tmpfhxsje
Reading nodes 30000
Reading paths 90000
0 0(count:33660, inverse:False)
1 0,0(count:33320, inverse:False)
2 0,0,0(count:32980, inverse:False)
training bytes: 158853
distinct node count: 34
distinct path count: 3
start training
Traceback (most recent call last):
  File "C:/develop/pythonWorkspace/hin2vec/main_py.py", line 164, in
    sys.exit(main(graph_fname, node_vec_fname, path_vec_fname, options))
  File "C:/develop/pythonWorkspace/hin2vec/main_py.py", line 60, in main
    k_hop_neighbors=neighbors,
  File "C:\develop\pythonWorkspace\hin2vec\model\mp2vec_s.py", line 161, in train
    p.start()
  File "C:\develop\python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\develop\python27\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\develop\python27\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\develop\python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\develop\python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\develop\python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\develop\python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\develop\python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\develop\python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\develop\python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\develop\python27\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\develop\python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\develop\python27\lib\multiprocessing\forking.py", line 67, in dispatcher
    self.save_reduce(obj=obj, *rv)
  File "C:\develop\python27\lib\pickle.py", line 401, in save_reduce
    save(args)
  File "C:\develop\python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\develop\python27\lib\pickle.py", line 554, in save_tuple
    save(element)
  File "C:\develop\python27\lib\pickle.py", line 300, in save
    self.save_global(obj)
  File "C:\develop\python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <class 'c_double_Array_100'>: it's not found as __main__.c_double_Array_100
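On Windows, `multiprocessing` starts workers by pickling the `Process` object. The traceback shows pickle failing on a ctypes array type (`c_double_Array_100`) that it cannot find by name in `__main__`. We have not checked how `mp2vec_s.py` allocates its arrays, so the sketch below only illustrates a pattern that generally works with the spawn start method: allocate the shared memory with `multiprocessing.sharedctypes.RawArray`, pass it to `Process` as an argument, and keep the worker as a module-level function. The names `fill_slice` and `run_parallel` are hypothetical.

```python
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray


def fill_slice(shared, start, stop, value):
    """Worker: write `value` into shared[start:stop].

    Defined at module level so it can be pickled by reference under spawn.
    """
    for i in range(start, stop):
        shared[i] = value


def run_parallel(size, num_processes):
    # RawArray("d", size) is a zero-initialized shared array of C doubles;
    # passed as a Process argument, it is shared with the child, not copied.
    shared = RawArray("d", size)
    chunk = (size + num_processes - 1) // num_processes
    workers = []
    for k in range(num_processes):
        start, stop = k * chunk, min((k + 1) * chunk, size)
        p = Process(target=fill_slice, args=(shared, start, stop, float(k)))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
    return list(shared)
```

On Windows, the call to `run_parallel` must sit under an `if __name__ == "__main__":` guard, because spawned children re-import the main module.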

Please help me deal with it. Thanks for your time!

segmentation fault (core dumped) on Ubuntu

Ubuntu 16.04.3 LTS
I get a segmentation fault when executing this command:

model_c/bin/hin2vec -size 100 -train temp_walk -alpha 0.025000 -output temp_node -output_mp temp_path -window 7 -negative 5 -threads 3 -no_circle 1 -sigmoid_reg 0

temp_walk is the temp file used to store the generated random walks.

The output looks like this:

...
506 meta-path:0230303 63
507 meta-path:0302303 43
Meta-path size: 508
Meta-paths in train file: 63096600
Segmentation fault (core dumped)

About the dataset?

If the meta-path is APA, can the dataset be written directly as 1 A 1 A APA, where the 1-1 pair is obtained by matrix multiplication?
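If APA pairs are precomputed, the author-author counts are the entries of W_AP times its transpose, where W_AP is the author-paper incidence matrix. Entry (a1, a2) counts the papers the two authors share. The pure-Python sketch below computes those pairs from author-paper edges and writes them in the five-column layout. We have not confirmed that hin2vec expects meta-path edges to be supplied this way, and `apa_edges` and `to_lines` are hypothetical helpers.

```python
from collections import Counter, defaultdict


def apa_edges(author_paper_pairs):
    """Return sorted (a1, a2, count) triples with count = (W_AP W_AP^T)[a1, a2].

    Each count is the number of papers shared by a1 and a2; the diagonal
    (a1 == a2) is each author's own paper count.
    """
    authors_of = defaultdict(set)
    for author, paper in author_paper_pairs:
        authors_of[paper].add(author)  # duplicates collapse: binary incidence
    counts = Counter()
    for authors in authors_of.values():
        for a1 in authors:
            for a2 in authors:
                counts[(a1, a2)] += 1
    return sorted((a1, a2, c) for (a1, a2), c in counts.items())


def to_lines(triples):
    # Five-column layout: node1_name node1_type node2_name node2_type edge_type
    return ["%s A %s A APA" % (a1, a2) for a1, a2, _count in triples]
```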

Request for the link-prediction implementation source code

Dear esteemed author,
I have studied your HIN2Vec paper, and now I want to reproduce your experiments with my own data, which would help me really understand the idea of the whole paper. Could you share the source code for link prediction?
Please kindly let me know if this is possible.
Thank you for your consideration.

Got Segmentation fault: 11 when running the sample on macOS 10.13

Starting training using file /var/folders/d0/xcwd2m5911g6jxynzqp224_00000gp/T/tmp1UAB_1
Node size: 34
Nodes in train file: 340000
0 meta-path:0 339660
1 meta-path:00 339320
Meta-path size: 2
Meta-paths in train file: 678980
Segmentation fault: 11

Using hin2vec for link prediction

I don't know how to do it. For example, if I delete 20% of the edges to construct a test set, some nodes might be deleted too. How can we get embeddings for those nodes?

Does anyone have an idea?
Thanks!!!
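A common way to avoid losing nodes is to hold out an edge only if both of its endpoints keep at least one other edge in the training graph, so that every node still gets an embedding. The sketch below shows that split for an undirected edge list without self-loops. The function name, the 20% default, and the greedy shuffled order are illustrative choices, not hin2vec's procedure or the paper's protocol.

```python
import random
from collections import Counter


def split_edges_keep_nodes(edges, test_fraction=0.2, seed=0):
    """Split undirected edges into (train, test) without isolating any node.

    Assumes no self-loops. An edge (u, v) moves to the test set only while
    both u and v still have degree >= 2 among the edges not yet held out.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    degree = Counter()
    for u, v in shuffled:
        degree[u] += 1
        degree[v] += 1
    target = int(len(shuffled) * test_fraction)
    train, test = [], []
    for u, v in shuffled:
        if len(test) < target and degree[u] > 1 and degree[v] > 1:
            test.append((u, v))
            degree[u] -= 1
            degree[v] -= 1
        else:
            train.append((u, v))
    return train, test
```

The test set may end up smaller than requested when the graph has many degree-1 nodes, which is the price of keeping every node in the training graph.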

input format

Hello author, I have some doubts about the input format. In the example you gave, the format is node1_name node2_type node2_name node2_type edge_type. Why is node1_name followed by node2_type?
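For reference, here is a sketch of reading one line of the edge file. It assumes the intended layout is `node1_name node1_type node2_name node2_type edge_type`, with the second column holding the first node's type, as in the `1 A 2 P A-P` example from another issue. The `Edge` tuple and both functions are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one line of the edge file.
Edge = namedtuple("Edge", "node1 type1 node2 type2 edge_type")


def parse_edge_line(line):
    parts = line.split()
    if len(parts) != 5:
        raise ValueError("expected 5 columns, got %d: %r" % (len(parts), line))
    return Edge(*parts)


def edge_type_matches(edge):
    # Convention seen in examples such as U-U and A-P, not a documented rule.
    return edge.edge_type == "%s-%s" % (edge.type1, edge.type2)
```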

Index out of range

If all of a node's edges have weights smaller than 0.1, the node is in HIN.graph but HIN.node_choices[node] is [] (int(w*10) is 0 for every edge), which causes an IndexError.
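Here is the arithmetic behind the report. Each edge contributes `int(w*10)` copies, and `int()` truncates toward zero, so any weight below 0.1 contributes nothing. The sketch below reproduces this and adds one possible guard that falls back to a uniform choice over the node's edges. The guard and both function names are hypothetical, not a change made in hin2vec.

```python
import random


def build_choices(edges, scale=10):
    """Replicate each (to_id, edge_id) int(w * scale) times, as in the report."""
    choices = []
    for to_id, edge_id, w in edges:
        choices += [(to_id, edge_id)] * int(w * scale)
    return choices


def pick(edges, rng=random):
    choices = build_choices(edges)
    if not choices:
        # Hypothetical guard: every weight truncated to zero copies,
        # so fall back to a uniform choice over the raw edges.
        if not edges:
            raise ValueError("node has no outgoing edges")
        to_id, edge_id, _w = rng.choice(edges)
        return (to_id, edge_id)
    return rng.choice(choices)
```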

Datasets

Hi,
Could you please upload the datasets you used in the paper? That would make it easy to reproduce the results and compare them with the baselines.
Thanks.

'model_c' is not recognized as an internal or external command, operable program

I'm sorry to disturb you. I encountered the following error when running your program and I don't know how to fix it. Please help me correct it.

model_c/bin/hin2vec -size 2 -train C:\Users\10736\AppData\Local\Temp\tmpgr594377 -alpha 0.025000 -output C:\Users\10736\AppData\Local\Temp\tmp05wd05ki -output_mp C:\Users\10736\AppData\Local\Temp\tmpijetvs6q -window 2 -negative 5 -threads 1 -no_circle 1 -sigmoid_reg 0
'model_c' is not recognized as an internal or external command,
operable program or batch file.
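The message means `cmd.exe` tried to run a program called `model_c`. With a forward-slash path such as `model_c/bin/hin2vec`, cmd.exe typically treats `/bin/hin2vec` as a switch. We have not checked how `main_py.py` launches the binary. One portable pattern is to build the path with `os.path.join` and pass an argument list to `subprocess` without a shell. The flags below are copied from the command in the log, the helper names are hypothetical, and the binary still has to be compiled for Windows first.

```python
import os
import subprocess


def build_hin2vec_command(train, output, output_mp, size=2, window=2,
                          negative=5, threads=1, alpha=0.025):
    """Return an argument list for the hin2vec binary (no shell involved)."""
    binary = os.path.join("model_c", "bin", "hin2vec")  # backslashes on Windows
    return [
        binary,
        "-size", str(size),
        "-train", train,
        "-alpha", "%f" % alpha,
        "-output", output,
        "-output_mp", output_mp,
        "-window", str(window),
        "-negative", str(negative),
        "-threads", str(threads),
        "-no_circle", "1",
        "-sigmoid_reg", "0",
    ]


def run_hin2vec(*args, **kwargs):
    # A list argument with the default shell=False bypasses cmd.exe parsing.
    return subprocess.call(build_hin2vec_command(*args, **kwargs))
```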

How to calculate the gradient of metapath embedding?

If we use a binary step function, its gradient is always zero.

So I think there is no backpropagation to the metapath embedding vector.

However, the code does update the metapath vector, and I don't understand the flag "is_deepwalk".

What is it?
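The derivation below shows why a step function blocks the gradient. For illustration only, it assumes a score where x, y and r are the two node vectors and the meta-path vector, f is the regularization applied to r, and the loss is the logistic loss on a positive pair. It does not describe what the C code or the `is_deepwalk` flag actually does.

```latex
s = \sum_{k} x_k \, y_k \, f(r_k), \qquad p = \sigma(s), \qquad L = -\log p

\frac{\partial L}{\partial r_k} = -(1 - p)\, x_k \, y_k \, f'(r_k)

f = \sigma:\quad f'(r_k) = \sigma(r_k)\bigl(1 - \sigma(r_k)\bigr) > 0

f = \mathbb{1}[r_k > 0]:\quad f'(r_k) = 0 \ \text{for } r_k \neq 0
```

So with the step function, exact gradients leave r unchanged. An implementation that still updates the meta-path vector must be doing something else. Examples include using the sigmoid variant (the `-sigmoid_reg` flag in the commands above suggests such an option exists) or a straight-through style approximation that replaces f' with 1. Which one hin2vec uses would need to be checked in its source.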
