graphembedding's Introduction

GraphEmbedding

Method

Model | Paper | Note
DeepWalk | [KDD 2014] DeepWalk: Online Learning of Social Representations | [Graph Embedding] DeepWalk: Algorithm Principles, Implementation, and Applications
LINE | [WWW 2015] LINE: Large-scale Information Network Embedding | [Graph Embedding] LINE: Algorithm Principles, Implementation, and Applications
Node2Vec | [KDD 2016] node2vec: Scalable Feature Learning for Networks | [Graph Embedding] Node2Vec: Algorithm Principles, Implementation, and Applications
SDNE | [KDD 2016] Structural Deep Network Embedding | [Graph Embedding] SDNE: Algorithm Principles, Implementation, and Applications
Struc2Vec | [KDD 2017] struc2vec: Learning Node Representations from Structural Identity | [Graph Embedding] Struc2Vec: Algorithm Principles, Implementation, and Applications

How to run examples

  1. Clone the repo and make sure tensorflow or tensorflow-gpu is installed on your local machine.
  2. Run the following commands:
python setup.py install
cd examples
python deepwalk_wiki.py

Discussion Group & Related Projects

WeChat official account: 浅梦学习笔记

WeChat: deepctrbot

Usage

The design and implementation follow one simple principle (graph in, embedding out) as much as possible.

Input format

We use networkx to create graphs. Each line of the input edge list file has the form: node1 node2 <edge_weight>
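As a rough illustration of that format (not the repository's own loader, which goes through networkx), the sketch below parses a few such lines with the standard library only. The sample edges and the helper name parse_edge_lines are made up for this example, and treating a missing weight as 1 is an assumption.

```python
from collections import defaultdict


def parse_edge_lines(lines, default_weight=1):
    """Parse 'node1 node2 [edge_weight]' lines into a weighted adjacency dict.

    Hypothetical helper for illustration only; the Usage examples load
    graphs with nx.read_edgelist instead.
    """
    adjacency = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comment lines
        src, dst = parts[0], parts[1]
        # The weight column is optional; fall back to the default when absent.
        weight = int(parts[2]) if len(parts) > 2 else default_weight
        adjacency[src][dst] = weight  # directed edge src -> dst
    return dict(adjacency)


sample = ["1 2 3", "2 3", "", "3 1 5"]
graph = parse_edge_lines(sample)
print(graph)
```

Node ids stay strings here, which matches what nodetype=None does in the networkx calls used in the examples.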

DeepWalk

import networkx as nx
from ge import DeepWalk

G = nx.read_edgelist('../data/wiki/Wiki_edgelist.txt', create_using=nx.DiGraph(), nodetype=None, data=[('weight', int)])  # read graph

model = DeepWalk(G, walk_length=10, num_walks=80, workers=1)  # init model
model.train(window_size=5, iter=3)  # train model
embeddings = model.get_embeddings()  # get embedding vectors

LINE

import networkx as nx
from ge import LINE

G = nx.read_edgelist('../data/wiki/Wiki_edgelist.txt', create_using=nx.DiGraph(), nodetype=None, data=[('weight', int)])  # read graph

model = LINE(G, embedding_size=128, order='second')  # init model; order can be 'first', 'second', or 'all'
model.train(batch_size=1024, epochs=50, verbose=2)  # train model
embeddings = model.get_embeddings()  # get embedding vectors

Node2Vec

import networkx as nx
from ge import Node2Vec

G = nx.read_edgelist('../data/wiki/Wiki_edgelist.txt', create_using=nx.DiGraph(), nodetype=None, data=[('weight', int)])  # read graph

model = Node2Vec(G, walk_length=10, num_walks=80, p=0.25, q=4, workers=1)  # init model
model.train(window_size=5, iter=3)  # train model
embeddings = model.get_embeddings()  # get embedding vectors

SDNE

import networkx as nx
from ge import SDNE

G = nx.read_edgelist('../data/wiki/Wiki_edgelist.txt', create_using=nx.DiGraph(), nodetype=None, data=[('weight', int)])  # read graph

model = SDNE(G, hidden_size=[256, 128])  # init model
model.train(batch_size=3000, epochs=40, verbose=2)  # train model
embeddings = model.get_embeddings()  # get embedding vectors

Struc2Vec

import networkx as nx
from ge import Struc2Vec

G = nx.read_edgelist('../data/flight/brazil-airports.edgelist', create_using=nx.DiGraph(), nodetype=None, data=[('weight', int)])  # read graph

model = Struc2Vec(G, 10, 80, workers=4, verbose=40)  # init model
model.train(window_size=5, iter=3)  # train model
embeddings = model.get_embeddings()  # get embedding vectors

graphembedding's Issues

ZeroDivisionError

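When applying the node2vec code to a new dataset in which some edge weights are 0, I get a ZeroDivisionError. I modified lines 185 and 186 of walker.py, adding a try/except statement and replacing the original list comprehension with appending items to an empty list, but the error still occurs and I don't know why. Thanks.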
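One possible cause, assuming the failure comes from dividing the edge weights of a node by their sum: when every outgoing weight is 0, that sum is 0. The sketch below guards the normalization step. The helper name normalize_weights and the uniform fallback are assumptions for illustration, not the library's behavior, and this has not been checked against the exact lines of walker.py mentioned above.

```python
def normalize_weights(weights):
    """Turn non-negative edge weights into a probability distribution.

    Falls back to a uniform distribution when all weights are zero.
    That fallback is an assumption made for this sketch.
    """
    if not weights:
        return []
    total = float(sum(weights))
    if total == 0.0:
        # Every weight is zero: avoid dividing by zero.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


print(normalize_weights([2, 0, 2]))
print(normalize_weights([0, 0]))
```

Dropping zero-weight edges before building the graph would be another option, if those edges carry no information in the dataset.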

cannot find random_walks.pkl

in deepwalk.py

`
def train(self, embed_size=128, window_size=5, workers=3, iter=5, **kwargs):

    sentences = pd.read_pickle('random_walks.pkl')
    kwargs["sentences"] = sentences
    kwargs["min_count"] = kwargs.get("min_count", 0)
    kwargs["size"] = embed_size
    kwargs["sg"] = 1  # skip gram
    kwargs["hs"] = 1  # deepwalk use Hierarchical Softmax
    kwargs["workers"] = workers
    kwargs["window"] = window_size
    kwargs["iter"] = iter

`
I cannot find random_walks.pkl.

Could you provide it?
Thanks!

nodevec

node2vec keyerror(' ',' ')???

How to use GPU acceleration

Hello, node2vec runs very slowly when there are many nodes. How can I use a GPU to speed it up? Thanks.

unsupported operand type(s) for +: 'int' and 'str'

I got this issue using struc2vec and node2vec methods
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
1 model_struc2vec = ge.Struc2Vec(G, 10, 80, workers=4, verbose=40, ) #init model
----> 2 model_struc2vec.train(window_size = 5, iter = 3)# train model
3 embeddings_struc2vec = model_struc3vec.get_embeddings()# get embedding vectors

/anaconda3/envs/python36/lib/python3.6/site-packages/ge-0.0.0-py3.6.egg/ge/models/struc2vec.py in train(self, embed_size, window_size, workers, iter)
114 print("Learning representation...")
115 model = Word2Vec(sentences, size=embed_size, window=window_size, min_count=0, hs=1, sg=1, workers=workers,
--> 116 iter=iter)
117 print("Learning representation done!")
118 self.w2v_model = model

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/word2vec.py in init(self, sentences, corpus_file, size, alpha, window, min_count, max_vocab_size, sample, seed, workers, min_alpha, sg, hs, negative, ns_exponent, cbow_mean, hashfxn, iter, null_word, trim_rule, sorted_vocab, batch_words, compute_loss, callbacks, max_final_vocab)
765 callbacks=callbacks, batch_words=batch_words, trim_rule=trim_rule, sg=sg, alpha=alpha, window=window,
766 seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha, compute_loss=compute_loss,
--> 767 fast_version=FAST_VERSION)
768
769 def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/base_any2vec.py in init(self, sentences, corpus_file, workers, vector_size, epochs, callbacks, batch_words, trim_rule, sg, alpha, window, seed, hs, negative, ns_exponent, cbow_mean, min_alpha, compute_loss, fast_version, **kwargs)
757 raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.")
758
--> 759 self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
760 self.train(
761 sentences=sentences, corpus_file=corpus_file, total_examples=self.corpus_count,

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/base_any2vec.py in build_vocab(self, sentences, corpus_file, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
941 trim_rule=trim_rule, **kwargs)
942 report_values['memory'] = self.estimate_memory(vocab_size=report_values['num_retained_words'])
--> 943 self.trainables.prepare_weights(self.hs, self.negative, self.wv, update=update, vocabulary=self.vocabulary)
944
945 def build_vocab_from_freq(self, word_freq, keep_raw_vocab=False, corpus_count=None, trim_rule=None, update=False):

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/word2vec.py in prepare_weights(self, hs, negative, wv, update, vocabulary)
1820 # set initial input/projection and hidden weights
1821 if not update:
-> 1822 self.reset_weights(hs, negative, wv)
1823 else:
1824 self.update_weights(hs, negative, wv)

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/word2vec.py in reset_weights(self, hs, negative, wv)
1837 for i in xrange(len(wv.vocab)):
1838 # construct deterministic seed from word AND seed argument
-> 1839 wv.vectors[i] = self.seeded_vector(wv.index2word[i] + str(self.seed), wv.vector_size)
1840 if hs:
1841 self.syn1 = zeros((len(wv.vocab), self.layer1_size), dtype=REAL)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
`
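The last frame fails on wv.index2word[i] + str(self.seed), which means the vocabulary tokens are integers: the walks passed to Word2Vec contain int node ids. A minimal sketch of one workaround, assuming the walks are plain lists of node ids, is to convert every id to a string before training. The helper name stringify_walks is hypothetical.

```python
def stringify_walks(walks):
    """Convert every node id in every walk to a string token."""
    return [[str(node) for node in walk] for walk in walks]


walks = [[1, 2, 3], [3, 1, 2]]  # made-up walks over integer node ids
sentences = stringify_walks(walks)
print(sentences)
```

Reading the graph with string node ids, as nodetype=None does in the Usage examples, should have a similar effect, although the embeddings are then keyed by strings.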

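About TensorFlow's to_float issue

Hello, when I tried to run your sdne_wiki.py code, I got the following error message:
AttributeError: module 'tensorflow' has no attribute 'to_float'
My Python version is 3.6.7 and tf.version = "2.0.0-alpha0". I suspect my TensorFlow version is too new. How can I get your code to run? Thanks!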

joblib backend issue

Preprocess transition probs...
[Parallel(n_jobs=30)]: Using backend MultiprocessingBackend with 30 concurrent workers.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in call
return self.func(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in call
for func, args, kwargs in self.items]
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in
for func, args, kwargs in self.items]
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/walker.py", line 88, in _simulate_walks
walk_length=walk_length, start_node=v))
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/walker.py", line 56, in node2vec_walk
next_node = cur_nbrs[alias_sample(alias_edges[edge][0],
KeyError: (9836, 7324)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "blog_node.py", line 69, in
model = Node2Vec(G, 10, 80, workers=30,p=0.25,q=2 )
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/models/node2vec.py", line 39, in init
num_walks=num_walks, walk_length=walk_length, workers=workers, verbose=1)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/walker.py", line 72, in simulate_walks
partition_num(num_walks, workers))
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 934, in call
self.retrieve()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get

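After downloading the code I changed the joblib backend to multiprocessing, but whether I use the default backend or any other, I get a KeyError on what looks like a random data point. I am using the dataset at http://socialcomputing.asu.edu/datasets/BlogCatalog3 , and both struc2vec and node2vec hit this problem. Any advice would be appreciated.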

ValueError: Input contains NaN

Hello!

Thank you for providing this wonderful tool for study. I changed the order option of LINE from second to all (line 48 of GraphEmbedding/examples/line_wiki.py) and encountered the following error:

....
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0503
Epoch 48/50
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0480
Epoch 49/50
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0485
Epoch 50/50
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0472
Training classifier using 80.00% nodes...
Traceback (most recent call last):
  File "line_wiki.py", line 52, in <module>
    evaluate_embeddings(embeddings)
  File "line_wiki.py", line 19, in evaluate_embeddings
    clf.split_train_evaluate(X, Y, tr_frac)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/ge-0.0.0-py3.6.egg/ge/classify.py", line 66, in split_train_evaluate
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/ge-0.0.0-py3.6.egg/ge/classify.py", line 34, in train
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/multiclass.py", line 239, in fit
    for i, column in enumerate(columns))
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 917, in __call__
    if self.dispatch_one_batch(iterator):
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/multiclass.py", line 79, in _fit_binary
    estimator.fit(X, y)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/linear_model/_logistic.py", line 1527, in fit
    accept_large_sparse=solver != 'liblinear')
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/utils/validation.py", line 755, in check_X_y
    estimator=estimator)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/utils/validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/utils/validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Could anyone take a look and see if that can be fixed? Thank you very much!

DeepWalk performs poorly

I ran experiments with the network embedding code provided by this author for nearly a month without getting any useful results, and I nearly broke down. After checking my own algorithm countless times, I finally found that the problem was in the DeepWalk code provided by this author. If you want to use DeepWalk, please use https://github.com/phanein/deepwalk.

ImportError

ImportError: No module named 'tensorflow.python.keras'

No module named 'joblib'

What is causing this?
Traceback (most recent call last):
File "deepwalk_wiki.py", line 4, in <module>
from ge.classify import read_node_label, Classifier
File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/__init__.py", line 1, in <module>
File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/models/__init__.py", line 1, in <module>
File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/models/deepwalk.py", line 20, in <module>
File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/walker.py", line 7, in <module>
ImportError: No module named 'joblib'

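Why does SDNE report insufficient memory?

When I run SDNE, it reports that memory is insufficient. My machine has 128 GB of RAM. Why does this happen? I have already made the network smaller.
Is there anything else I can try, and how is the required memory calculated? Thanks.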

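Question about the code logic

if not self.use_rejection_sampling:
    alias_edges = {}
    for edge in G.edges():
        alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
    self.alias_edges = alias_edges
This is the code in walk.py that dynamically computes the transition probabilities from the previous node t and the current node v. Why is it placed under the condition if not self.use_rejection_sampling:? Even without negative sampling, this computation should still be done and alias_edges updated, shouldn't it?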

About random sampling in DeepWalk

    def deepwalk_walk(self, walk_length, start_node):

        walk = [start_node]

        while len(walk) < walk_length:
            cur = walk[-1]
            cur_nbrs = list(self.G.neighbors(cur))
            if len(cur_nbrs) > 0:
                walk.append(random.choice(cur_nbrs))
            else:
                break
        return walk

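I feel the function above may not be quite appropriate: it makes no use at all of the transition probabilities between nodes. Those transition probabilities can actually be computed from the data, so would it be better to use them?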
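For context, the original DeepWalk samples the next node uniformly from the neighbors, which is what random.choice does in the function above. A weight-aware variant could be sketched as follows. The adjacency structure (node -> {neighbor: weight}) and the function name weighted_walk are assumptions for this example, not part of the library.

```python
import random


def weighted_walk(adjacency, walk_length, start_node, rng=random):
    """Random walk that picks each next node in proportion to edge weight.

    `adjacency` maps node -> {neighbor: weight}. Both this structure and
    the function itself are hypothetical, for illustration only.
    """
    walk = [start_node]
    while len(walk) < walk_length:
        nbrs = adjacency.get(walk[-1], {})
        if not nbrs:
            break  # dead end: stop early, as deepwalk_walk does
        nodes = list(nbrs)
        weights = [nbrs[n] for n in nodes]
        # random.choices samples with probability proportional to the weights.
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


adjacency = {"a": {"b": 1.0, "c": 3.0}, "b": {"a": 1.0}, "c": {"a": 3.0}}
print(weighted_walk(adjacency, walk_length=5, start_node="a"))
```

On an unweighted graph this reduces to the uniform choice used above. Note that random.choices needs at least one positive weight per node.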

SDNE

WARNING:tensorflow:From /home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
File "sdne_wiki.py", line 54, in
model = SDNE(G, hidden_size=[256, 128],)
File "/home/ant/researchInstitute/luoxianhao/ge/shenweichen/graphEmbedding/ge/models/sdne.py", line 93, in init
self.reset_model()
File "/home/ant/researchInstitute/luoxianhao/ge/shenweichen/graphEmbedding/ge/models/sdne.py", line 101, in reset_model
self.model.compile(opt, [l_2nd(self.beta), l_1st(self.alpha)])
File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 373, in compile
self._compile_weights_loss_and_weighted_metrics()
File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1652, in _compile_weights_loss_and_weighted_metrics
self.total_loss = self._prepare_total_loss(masks)
File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1712, in _prepare_total_loss
per_sample_losses = loss_fn.call(y_true, y_pred)
File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 216, in call
return self.fn(y_true, y_pred, **self.fn_kwargs)
File "/home/ant/researchInstitute/luoxianhao/ge/shenweichen/graphEmbedding/ge/models/sdne.py", line 37, in loss_2nd
b_[y_true != 0] = beta
TypeError: 'Tensor' object does not support item assignment

Environment: tf1.15

How should the following line of code be modified?
b_[y_true != 0] = beta

Problem with installation

Hello, I have a problem installing on Python 3.7:

error: python-dateutil 2.8.1 is installed but python-dateutil<2.8.1,>=2.1 is required by {'botocore'}

Full log

eurvanov@eurvanov-HP-ProBook-430-G5:~/python/adeo/market-radar/synonyms-service/research/GraphEmbedding$ python setup.py install
running install
running bdist_egg
running egg_info
writing ge.egg-info/PKG-INFO
writing dependency_links to ge.egg-info/dependency_links.txt
writing requirements to ge.egg-info/requires.txt
writing top-level names to ge.egg-info/top_level.txt
reading manifest file 'ge.egg-info/SOURCES.txt'
writing manifest file 'ge.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/utils.py -> build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/classify.py -> build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/alias.py -> build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/init.py -> build/bdist.linux-x86_64/egg/ge
creating build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/node2vec.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/deepwalk.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/struc2vec.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/init.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/line.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/sdne.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/walker.py -> build/bdist.linux-x86_64/egg/ge
byte-compiling build/bdist.linux-x86_64/egg/ge/utils.py to utils.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/classify.py to classify.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/alias.py to alias.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/init.py to init.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/node2vec.py to node2vec.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/deepwalk.py to deepwalk.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/struc2vec.py to struc2vec.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/init.py to init.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/line.py to line.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/sdne.py to sdne.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/walker.py to walker.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/ge-0.0.0-py3.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing ge-0.0.0-py3.7.egg
Removing /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg
Copying ge-0.0.0-py3.7.egg to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages
ge 0.0.0 is already the active version in easy-install.pth

Installed /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg
Processing dependencies for ge==0.0.0
Searching for python-dateutil>=2.1
Reading https://pypi.org/simple/python-dateutil/
Downloading https://files.pythonhosted.org/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl#sha256=75bb3f31ea686f1197762692a9ee6a7550b59fc6ca3a1f4b5d7e32fb98e2da2a
Best match: python-dateutil 2.8.1
Processing python_dateutil-2.8.1-py2.py3-none-any.whl
Installing python_dateutil-2.8.1-py2.py3-none-any.whl to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages
writing requirements to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/python_dateutil-2.8.1-py3.7.egg/EGG-INFO/requires.txt
Adding python-dateutil 2.8.1 to easy-install.pth file

Installed /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/python_dateutil-2.8.1-py3.7.egg
Searching for botocore<1.14.0,>=1.13.26
Reading https://pypi.org/simple/botocore/
Downloading https://files.pythonhosted.org/packages/8a/93/ea2ec042794dfda186348df02c6057223a8bbc21c055124fbe3e16925441/botocore-1.13.26-py2.py3-none-any.whl#sha256=9fefb42c6d4fa0079a52b49e5491fa0738cca63649f68be180b3ed6c253d2622
Best match: botocore 1.13.26
Processing botocore-1.13.26-py2.py3-none-any.whl
Installing botocore-1.13.26-py2.py3-none-any.whl to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages
writing requirements to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/botocore-1.13.26-py3.7.egg/EGG-INFO/requires.txt
Adding botocore 1.13.26 to easy-install.pth file

Installed /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/botocore-1.13.26-py3.7.egg
error: python-dateutil 2.8.1 is installed but python-dateutil<2.8.1,>=2.1 is required by {'botocore'}

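Some questions about the Struc2vec similarity-graph construction code

Hello, I found some problems while using the struc2vec code that builds the structural similarity graph.

Specifically, when opt2_reduce_sim_calc is enabled, the get_vertices function returns, for each node, the neighbors that are similar to that node, and this similarity is one-directional. That is, if a is similar to b, then b is among a's neighbors; if b is also similar to a, then a is among b's neighbors as well (as in a directed graph). However, the later _get_layer_rep method treats this similarity as undirected, so it only handles the case where opt2_reduce_sim_calc is False.
When opt2 is True, a and b each appear among the other's similar neighbors, and since both an "in" edge and an "out" edge are stored for every node when the edges are built, duplicate edges result. In other words, I believe _get_layer_rep behaves incorrectly when opt2_reduce_sim_calc is True.

I look forward to your reply.
Finally, thank you for open-sourcing this code. It has made my work much easier and saved me time.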

The use of alias sampling seems to be problematic

In node2vec, when sampling neighbors, you normalize the weights (probabilities):
`
    alias_nodes = {}
    for node in G.nodes():
        unnormalized_probs = [G[node][nbr].get('weight', 1.0)
                              for nbr in G.neighbors(node)]
        norm_const = sum(unnormalized_probs)
        normalized_probs = [
            float(u_prob)/norm_const for u_prob in unnormalized_probs]
        alias_nodes[node] = create_alias_table(normalized_probs)
`

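Then all the probabilities are less than 1, so during alias sampling every probability ends up in the small list, which would make it no different from not using alias sampling at all. Doesn't that amount to uniform sampling?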
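A sketch of the alias method may help here. It assumes, as the LINE sampling issue on this page describes, that create_alias_table rescales its input so that the mean is 1 before splitting entries into small and large. Under that assumption, normalized probabilities do not all fall into small. The code below is a generic standard-library implementation written for illustration, not the repository's function; all names in it are hypothetical.

```python
import random


def build_alias_table(probs):
    """Build accept/alias arrays for the alias method.

    The probabilities are first scaled by N so that their mean is 1;
    entries below 1 go to `small`, the rest to `large`.
    """
    n = len(probs)
    scaled = [p * n for p in probs]
    accept, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]  # the large entry donates the missing mass
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers equal 1 up to rounding error
        accept[i] = 1.0
        alias[i] = i
    return accept, alias


def alias_draw(accept, alias, rng=random):
    """Draw one index in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


def implied_probs(accept, alias):
    """Recover the distribution that the table encodes (for checking)."""
    n = len(accept)
    out = [0.0] * n
    for i in range(n):
        out[i] += accept[i] / n
        out[alias[i]] += (1.0 - accept[i]) / n
    return out


accept, alias = build_alias_table([0.5, 0.3, 0.2])
print(implied_probs(accept, alias))
```

For the input [0.5, 0.3, 0.2] the scaled values are [1.5, 0.9, 0.6], so one entry lands in large and two in small, and the table reproduces the original non-uniform distribution.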

Confusing code

Hi, I am trying to re-implement the SDNE code but got stuck while reading it. Could you explain the line below?

edge_weight = graph[v1][v2].get('weight', 1)

What does get('weight', 1) mean? Why does it have the value 1 here?

Thanks for your help!
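In networkx, graph[v1][v2] is the attribute dictionary of the edge between v1 and v2, so get('weight', 1) is the ordinary dict.get: it returns the stored weight if the edge has one and 1 otherwise, which treats an unweighted edge as having weight 1. The two dictionaries below are made-up stand-ins for such edge attributes.

```python
# Made-up stand-ins for graph[v1][v2] on a weighted and an unweighted edge.
weighted_edge = {"weight": 5}
unweighted_edge = {}

print(weighted_edge.get("weight", 1))    # prints 5: the stored weight
print(unweighted_edge.get("weight", 1))  # prints 1: the default is used
```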

About top_k_list

A question:
classify.py defines top_k_list = [len(l) for l in Y].
So each element of top_k_list is the length of the corresponding element of the test set Y?
What is top_k_list used for?
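One common reading, assuming classify.py follows the multi-label evaluation used in other DeepWalk-style code (this has not been confirmed against the repository): top_k_list holds the number of true labels of each test sample, and the classifier then predicts that many highest-probability classes for the sample. The sketch below illustrates the idea with made-up data; top_k_labels is a hypothetical helper.

```python
def top_k_labels(class_probs, classes, k):
    """Return the k classes with the highest predicted probability."""
    ranked = sorted(range(len(classes)), key=lambda i: class_probs[i], reverse=True)
    return [classes[i] for i in ranked[:k]]


Y = [["a"], ["b", "c"]]                      # true label sets (made up)
top_k_list = [len(labels) for labels in Y]   # number of labels per sample
classes = ["a", "b", "c"]
probs = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]   # hypothetical classifier output

predictions = [top_k_labels(p, classes, k) for p, k in zip(probs, top_k_list)]
print(predictions)
```

Under this reading, the evaluation assumes the number of labels per node is known and only tests which labels are chosen.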

SDNE

The _create_A_L function

In this step, A_data + A_data has length twice node_size, while shape=(node_size, node_size); this raised an error when I tested it locally.
    A_ = sp.csr_matrix((A_data + A_data, (A_row_index + A_col_index, A_col_index + A_row_index)),
                       shape=(node_size, node_size))

    D = sp.diags(A_.sum(axis=1).flatten().tolist()[0])
    L = D - A_
    return A, L

report the results on all datasets

Results of node2vec, deepwalk, line, sdne, and struc2vec on all datasets. I hope this helps anyone who is interested in this project.

wiki

Alg micro macro samples weighted acc NMI
node2vec 0.7447 0.6771 0.7193 0.7450 0.6279 0.3536
deepwalk 0.7307 0.6579 0.7058 0.7296 0.6091 0.3416
line 0.5059 0.2461 0.4536 0.4523 0.3160 0.0798
sdne 0.6916 0.5119 0.6528 0.6718 0.5530 0.1801
struc2vec 0.4512 0.1249 0.3933 0.3383 0.2308 0.0516

brazil

Alg micro macro samples weighted acc NMI
node2vec 0.1481 0.1579 0.1481 0.1648 0.1481 0.0442
deepwalk 0.1852 0.1694 0.1852 0.2004 0.1852 0.0471
line 0.4444 0.4167 0.4444 0.4753 0.4444 0.2822
sdne 0.5926 0.5814 0.5926 0.5928 0.5926 0.4041
struc2vec 0.7778 0.7739 0.7778 0.7762 0.7778 0.3906

europe

Alg micro macro samples weighted acc NMI
node2vec 0.4125 0.4156 0.4125 0.4209 0.4125 0.0155
deepwalk 0.4375 0.4358 0.4375 0.4347 0.4375 0.0180
line 0.5000 0.4983 0.5000 0.5016 0.5000 0.1186
sdne 0.5000 0.4818 0.5000 0.4916 0.5000 0.1714
struc2vec 0.5375 0.5247 0.5375 0.5294 0.5375 0.0783

usa

Alg micro macro samples weighted acc NMI
node2vec 0.5420 0.5278 0.5420 0.5351 0.5420 0.0822
deepwalk 0.5504 0.5394 0.5504 0.5472 0.5504 0.0910
line 0.4160 0.4032 0.4160 0.4175 0.4160 0.1660
sdne 0.6092 0.5819 0.6092 0.5971 0.6092 0.2028
struc2vec 0.5210 0.5040 0.5210 0.5211 0.5210 0.0702

SDNE Examples unable to run

I've tried tensorflow versions 1.15 and >2 and get this error. Were there breaking changes to this repo? If you or anyone else doesn't get these errors, could you share your environment configuration?

(py3SDNE3) mac0632:examples patrick.mullen$ python sdne_wiki.py
WARNING:tensorflow:From /Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
  File "sdne_wiki.py", line 49, in <module>
    model = SDNE(G, hidden_size=[256, 128],)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg/ge/models/sdne.py", line 93, in __init__
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg/ge/models/sdne.py", line 101, in reset_model
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 373, in compile
    self._compile_weights_loss_and_weighted_metrics()
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1652, in _compile_weights_loss_and_weighted_metrics
    self.total_loss = self._prepare_total_loss(masks)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1712, in _prepare_total_loss
    per_sample_losses = loss_fn.call(y_true, y_pred)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 216, in call
    return self.fn(y_true, y_pred, **self._fn_kwargs)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg/ge/models/sdne.py", line 36, in loss_2nd
  File "<__array_function__ internals>", line 6, in ones_like
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/numpy-1.18.2-py3.7-macosx-10.14-x86_64.egg/numpy/core/numeric.py", line 278, in ones_like
    res = empty_like(a, dtype=dtype, order=order, subok=subok, shape=shape)
  File "<__array_function__ internals>", line 6, in empty_like
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 736, in __array__
    " array.".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (2nd_target:0) to a numpy array.

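LINE sampling question

In lines 111 to 137 of line.py, when the alias table for nodes is built, norm_prob sums to 1, and the create_alias_table function then converts norm_prob so that its mean is 1. Why, when the alias table for edges is built, does norm_prob already have a mean of 1?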

    def _gen_sampling_table(self):

        # create sampling table for vertex
        power = 0.75
        numNodes = self.node_size
        node_degree = np.zeros(numNodes)  # out degree
        node2idx = self.node2idx

        for edge in self.graph.edges():
            node_degree[node2idx[edge[0]]
                        ] += self.graph[edge[0]][edge[1]].get('weight', 1.0)

        total_sum = sum([math.pow(node_degree[i], power)
                         for i in range(numNodes)])
        norm_prob = [float(math.pow(node_degree[j], power)) /
                     total_sum for j in range(numNodes)]

        self.node_accept, self.node_alias = create_alias_table(norm_prob)

        # create sampling table for edge
        numEdges = self.graph.number_of_edges()
        total_sum = sum([self.graph[edge[0]][edge[1]].get('weight', 1.0)
                         for edge in self.graph.edges()])
        norm_prob = [self.graph[edge[0]][edge[1]].get('weight', 1.0) *
                     numEdges / total_sum for edge in self.graph.edges()]

        self.edge_accept, self.edge_alias = create_alias_table(norm_prob)

classify

Hello, what does the TopKRanker class do?

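Fails with many nodes and uses too much memory

With 480,000 nodes and 32 GB of RAM it does not run; _create_A_L reports insufficient memory.
Can the way _create_A_L constructs the matrices be optimized? Storing sparse matrices this way is too wasteful.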

Question about fastdtw

Traceback (most recent call last):

File "/Users/yangfengyu/Desktop/GraphEmbedding-master/examples/line_wiki.py", line 4, in <module>
from ge.classify import read_node_label, Classifier
File "/Users/yangfengyu/Desktop/GraphEmbedding-master/ge/__init__.py", line 1, in <module>
from .models import *
File "/Users/yangfengyu/Desktop/GraphEmbedding-master/ge/models/__init__.py", line 5, in <module>
from .struc2vec import Struc2Vec
File "/Users/yangfengyu/Desktop/GraphEmbedding-master/ge/models/struc2vec.py", line 28, in <module>
from fastdtw import fastdtw
ModuleNotFoundError: No module named 'fastdtw'

I ran the experiment locally but keep getting the error above.
Is the fastdtw module part of this project or a third-party library? I searched the whole project and could not find it.
Looking forward to your reply!
