benedekrozemberczki / gemsec

The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).

License: GNU General Public License v3.0

Python 100.00%

clustering m-nmf deepwalk node2vec word2vec tensorflow gemsec facebook deezer community-detection

gemsec's Issues

Can't get converged result

I am using the following parameters for a huge network of 240k+ nodes:

--dimensions 3 \
--num-of-walks 20 \
--random-walk-length 160 \
--cluster-number 10

I can't get training to converge, judging by the diverging loss:

+-------+-------+------------+
| Epoch | Loss  | Modularity |
+=======+=======+============+
| 1     | 3.065 | 0.004      |
| 2     | 3.399 | 0.012      |
| 3     | 3.613 | 0.015      |
| 4     | 3.818 | 0.018      |
| 5     | 4.004 | 0.018      |
| 6     | 4.172 | 0.025      |
| 7     | 4.31  | 0.024      |
| 8     | 4.444 | 0.026      |
| 9     | 4.56  | 0.025      |
| 10    | 4.663 | 0.024      |
| 11    | 4.766 | 0.026      |
| 12    | 4.855 | 0.028      |
| 13    | 4.925 | 0.029      |
| 14    | 4.992 | 0.032      |
| 15    | 5.043 | 0.035      |
| 16    | 5.082 | 0.035      |
| 17    | 5.114 | 0.04       |
| 18    | 5.138 | 0.051      |
| 19    | 5.155 | 0.059      |
| 20    | 5.161 | 0.073      |
+-------+-------+------------+

The resulting embedding also looks odd: a big blob of nodes in the middle, with a few nodes surrounding it.

How should I tune the parameters in this case?

License clarification

What is the intended license for this software?

TypeError: 'NodeView' object does not support item assignment

mldl@ub1604:~/ub16_prj/GEMSEC$ python src/embedding_clustering.py
Model initialization started.
Traceback (most recent call last):
  File "src/embedding_clustering.py", line 22, in <module>
    create_and_run_model(args)
  File "src/embedding_clustering.py", line 11, in create_and_run_model
    model = GEMSECWithRegularization(args, graph)
  File "/home/mldl/ub16_prj/GEMSEC/src/model.py", line 32, in __init__
    self.degrees, self.walks = self.walker.do_walks()
  File "/home/mldl/ub16_prj/GEMSEC/src/calculation_helper.py", line 155, in do_walks
    random.shuffle(self.nodes)
  File "/usr/lib/python2.7/random.py", line 291, in shuffle
    x[i], x[j] = x[j], x[i]
TypeError: 'NodeView' object does not support item assignment
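
The traceback ends at `random.shuffle(self.nodes)`. `random.shuffle` swaps elements in place, while NetworkX 2.x returns a read-only `NodeView` from `graph.nodes()` rather than a list. A common workaround is to copy the nodes into a list before shuffling. The sketch below uses a hypothetical stand-in class (`ReadOnlyNodes`) so that it runs without NetworkX; in the repository the equivalent change would presumably be something like `self.nodes = list(graph.nodes())`, which is an assumption about the surrounding code, not a confirmed fix from the author.

```python
import random


class ReadOnlyNodes:
    """Hypothetical stand-in for a read-only node view: indexable, not assignable."""

    def __init__(self, nodes):
        self._nodes = tuple(nodes)

    def __len__(self):
        return len(self._nodes)

    def __getitem__(self, index):
        return self._nodes[index]

    def __iter__(self):
        return iter(self._nodes)


def shuffled_nodes(node_view, seed=None):
    """Copy the view into a list, then shuffle the copy in place."""
    nodes = list(node_view)
    random.Random(seed).shuffle(nodes)
    return nodes


view = ReadOnlyNodes(range(5))

try:
    random.shuffle(view)  # fails: the view defines no __setitem__
    shuffle_failed = False
except TypeError:
    shuffle_failed = True

nodes = shuffled_nodes(view, seed=42)  # works: a plain list supports item assignment
```

Pinning an older NetworkX release in which `nodes()` still returned a list would likely avoid the error as well, but copying to a list works across versions.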

Some package requirements cannot be met.

Hello,
I tried to install tensorflow-gpu version 1.12.0, but pip offers no versions below 2.2.0.
Is 1.12.0 required, or do newer versions of the package work?
Which version of tensorflow-gpu does the project currently use?

How do I know the relationship of node name and node ID?

Very interesting work.
As I understand it, the graph is represented by edges such as (node_id1, node_id2). How do we find the entity names behind node_id1 and node_id2? I didn't find a file that describes the relation between entity names and entity IDs.
Thanks!
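
When the raw data carries entity names, the name-to-ID mapping usually has to be built and saved by whoever prepares the edge list. A minimal sketch, assuming the input is a list of (name, name) pairs; `encode_edges` and the sample names are hypothetical and not part of the repository:

```python
def encode_edges(named_edges):
    """Assign 0-based integer IDs to names in order of first appearance."""
    name_to_id = {}
    encoded = []
    for source, target in named_edges:
        for name in (source, target):
            if name not in name_to_id:
                name_to_id[name] = len(name_to_id)
        encoded.append((name_to_id[source], name_to_id[target]))
    # Reverse lookup, used to translate embedding rows back to entity names.
    id_to_name = {node_id: name for name, node_id in name_to_id.items()}
    return encoded, name_to_id, id_to_name


named_edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
edges, name_to_id, id_to_name = encode_edges(named_edges)
# edges → [(0, 1), (1, 2), (0, 2)]
```

Saving `id_to_name` next to the encoded edge list makes it possible to map rows of the embedding output back to entities later.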

How to configure the project

I have a problem when I run the project.
Following readme.md, I set the project up with Python 3.5, but pip install pandas fails with the error: Module NotFound "Pandas"
Please let me know which environment is needed to run this project.
We are students at the University of Information Technology - Ho Chi Minh National University. We are studying your paper for our final project.
I hope you can reply soon. Thank you for your help.

Reorganize repository and add setup.py

I'm keen to try out your package, and I'm really interested in reproducibility. Would you mind a pull request that reorganizes the src directory to follow the standard Python package layout (just adding a subfolder called src/gemsec and a setup.py) so this code can be pip installed?

FileNotFoundError: [Errno 2] No such file or directory: './output/logs/politician.json'

Excuse me, do you know why this message appears after I run the code, even after I created a politician.json file?
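
One possible cause (an assumption, since the issue includes no traceback) is that the `./output/logs/` directory does not exist relative to the working directory from which the script is launched. Opening a file for writing does not create missing parent directories. The sketch below creates the parent directory before writing; `write_log` is a hypothetical helper, not a function from the repository.

```python
import json
import os
import tempfile


def write_log(path, payload):
    """Create the parent directory if needed, then dump payload as JSON."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as handle:
        json.dump(payload, handle)


# Demonstrated in a throwaway directory so nothing is written to the project tree.
base = tempfile.mkdtemp()
log_path = os.path.join(base, "output", "logs", "politician.json")
write_log(log_path, {"example": True})
```

Because the path in the error is relative, it may also help to run the script from the repository root so that `./output/logs/` resolves to the expected folder.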

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[289] = 236440 is not in [0, 228922)

Hi, I have a huge network of 240k+ nodes and I am using the following commands to try embedding it:

python3 ../GEMSEC/src/embedding_clustering.py --input in.csv --embedding-output out.csv --dimensions 3

and I got the following error:

100%|██████████| 228922/228922 [02:20<00:00, 1626.28it/s]
100%|██████████| 228922/228922 [02:06<00:00, 1803.60it/s]
100%|██████████| 228922/228922 [02:20<00:00, 1628.21it/s]
100%|██████████| 228922/228922 [02:33<00:00, 1493.90it/s]
100%|██████████| 228922/228922 [02:28<00:00, 1544.23it/s]

100%|██████████| 10921592/10921592 [06:13<00:00, 29266.91it/s]
WARNING:tensorflow:From /home/aznb/GEMSEC/src/model.py:118: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-10-13 20:08:10.947046: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-10-13 20:08:10.981722: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3199290000 Hz
2019-10-13 20:08:10.984370: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b6c170 executing computations on platform Host. Devices:
2019-10-13 20:08:10.984438: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-13 20:08:11.116679: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
  0%|          | 1/228922 [00:00<47:01:06,  1.35it/s]
Model initialization started.
  
Random walk series 1. initiated.
  
  
Random walk series 2. initiated.
  
  
Random walk series 3. initiated.
  
  
Random walk series 4. initiated.
  
  
Random walk series 5. initiated.

Weight calculation started.
  
  
Model Initialized.

Epoch 1. initiated.

Traceback (most recent call last):
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[289] = 236440 is not in [0, 228922)
         [[{{node sampled_softmax_loss/embedding_lookup_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../GEMSEC/src/embedding_clustering.py", line 22, in <module>
    create_and_run_model(args)
  File "../GEMSEC/src/embedding_clustering.py", line 18, in create_and_run_model
    model.train()
  File "/home/aznb/GEMSEC/src/model.py", line 134, in train
    _, loss = session.run([self.train_op , self.loss], feed_dict=feed_dict)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[289] = 236440 is not in [0, 228922)
         [[node sampled_softmax_loss/embedding_lookup_1 (defined at /home/aznb/GEMSEC/src/layers.py:55) ]]

Errors may have originated from an input operation.
Input Source operations connected to node sampled_softmax_loss/embedding_lookup_1:
 Variable_2/read (defined at /home/aznb/GEMSEC/src/layers.py:28)

Original stack trace for 'sampled_softmax_loss/embedding_lookup_1':
  File "../GEMSEC/src/embedding_clustering.py", line 22, in <module>
    create_and_run_model(args)
  File "../GEMSEC/src/embedding_clustering.py", line 11, in create_and_run_model
    model = GEMSECWithRegularization(args, graph)
  File "/home/aznb/GEMSEC/src/model.py", line 37, in __init__
    self.build()
  File "/home/aznb/GEMSEC/src/model.py", line 73, in build
    self.loss = self.walker_layer()+self.gamma*self.cluster_layer(self.walker_layer)+self.regularizer_layer(self.walker_layer)
  File "/home/aznb/GEMSEC/src/layers.py", line 55, in __call__
    sampled_values = self.sampler)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/ops/nn_impl.py", line 2024, in sampled_softmax_loss
    seed=seed)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/ops/nn_impl.py", line 1557, in _compute_sampled_logits
    biases, all_ids, partition_strategy=partition_strategy)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/ops/embedding_ops.py", line 315, in embedding_lookup
    transform_fn=None)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/ops/embedding_ops.py", line 133, in _embedding_lookup_and_transform
    array_ops.gather(params[0], ids, name=name), ids, max_norm)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 3475, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4097, in gather_v2
    batch_dims=batch_dims, name=name)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/aznb/.linuxbrew/Cellar/python/3.7.4_1/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Any clue about what happened?
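
The error reports a lookup of index 236440 in a table with 228922 rows, and 228922 matches the node count shown in the progress bars. That pattern is consistent with node IDs that do not form a contiguous 0..n-1 range, although this is an assumption because the input file is not shown. A minimal sketch for checking an edge list and relabeling it; `check_node_ids` and `compact_ids` are hypothetical helpers, not part of the repository:

```python
def check_node_ids(edges):
    """Report whether node IDs form the contiguous range 0..n-1."""
    ids = {node for edge in edges for node in edge}
    n_nodes = len(ids)
    return {
        "n_nodes": n_nodes,
        "min_id": min(ids),
        "max_id": max(ids),
        "contiguous": ids == set(range(n_nodes)),
    }


def compact_ids(edges):
    """Relabel nodes to 0..n-1, preserving the sorted order of the old IDs."""
    ids = sorted({node for edge in edges for node in edge})
    new_id = {old: new for new, old in enumerate(ids)}
    return [(new_id[u], new_id[v]) for u, v in edges], new_id


edges = [(0, 1), (1, 7), (7, 3)]  # toy data: 4 nodes, but the largest ID is 7
report = check_node_ids(edges)
fixed_edges, new_id = compact_ids(edges)
```

If `max_id` is greater than or equal to `n_nodes`, an embedding table sized by the node count cannot hold every ID, so relabeling the edge list before training is worth trying. Keep `new_id` to translate the output back to the original IDs.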

about node id of dataset

Hello, thanks for your great work. I have one question about the datasets used in several of your related code implementations. You say that nodes should be indexed starting with 0. Does this mean that the node IDs in the dataset are indices assigned by encoding the original nodes with serial numbers?

error on model DeepWalk

python src/embedding_clustering.py --model DeepWalk
.....

be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
2018-12-30 14:51:33.065695: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-30 14:51:33.149835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-30 14:51:33.150255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.38GiB
2018-12-30 14:51:33.150270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-30 14:51:33.339045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-30 14:51:33.339081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-30 14:51:33.339089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-30 14:51:33.339301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10039 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Model Initialized.

Epoch 1. initiated.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5908/5908 [00:16<00:00, 361.82it/s]

Traceback (most recent call last):
  File "src/embedding_clustering.py", line 22, in <module>
    create_and_run_model(args)
  File "src/embedding_clustering.py", line 18, in create_and_run_model
    model.train()
  File "/home/ub16c9/ub16_prj/GEMSEC/src/model.py", line 150, in train
    self.modularity_score, assignments = classical_modularity_calculator(self.graph, self.final_embeddings, self.args)
  File "/home/ub16c9/ub16_prj/GEMSEC/src/calculation_helper.py", line 125, in classical_modularity_calculator
    modularity = community.modularity(assignments,graph)
  File "/home/ub16c9/ub16_prj/GEMSEC/.venv/lib/python3.5/site-packages/community/community_louvain.py", line 119, in modularity
    com = partition[node]
KeyError: 0
(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/GEMSEC$

how to solve this problem?

When I run the example script from the README, it raises this error:

Model initialization started.
Traceback (most recent call last):
  File "src/embedding_clustering.py", line 22, in <module>
    create_and_run_model(args)
  File "src/embedding_clustering.py", line 11, in create_and_run_model
    model = GEMSECWithRegularization(args, graph)
  File "/data1/huangzp/GEMSEC/src/model.py", line 32, in __init__
    self.degrees, self.walks = self.walker.do_walks()
  File "/data1/huangzp/GEMSEC/src/calculation_helper.py", line 167, in do_walks
    random.shuffle(self.nodes)
  File "/opt/anaconda2/envs/python3/lib/python3.6/random.py", line 275, in shuffle
    x[i], x[j] = x[j], x[i]
TypeError: 'NodeView' object does not support item assignment

Excuse me, do you have a PyTorch version of GEMSEC?

I tried to port it from TensorFlow to PyTorch but failed. T_T
Thank you for your kind consideration of this request.

GEMSEC in Torch

This is not an error report, just a question: is there a Torch implementation of GEMSEC? If not, I would like to work on one and open a pull request to PyTorch Geometric to add GEMSEC as a community detection feature.

Hope to hear from you soon.

Best
