twjiang / graphsage-pytorch

A PyTorch implementation of GraphSAGE.

Languages: Python 99.63%, Shell 0.37%

graphsage-pytorch's Introduction

A PyTorch implementation of GraphSAGE

This package contains a PyTorch implementation of GraphSAGE.

Authors of this code package:

Tianwen Jiang, Tong Zhao, Daheng Wang.

Environment settings

python==3.6.8
pytorch==1.0.0

Basic Usage

Main Parameters:

--dataSet     The input graph dataset. (default: cora)
--agg_func    The aggregation function. (default: Mean aggregator)
--epochs      Number of epochs. (default: 50)
--b_sz        Batch size. (default: 20)
--seed        Random seed. (default: 824)
--unsup_loss  The loss function for unsupervised learning. ('margin' or 'normal', default: normal)
--config      Config file. (default: ./src/experiments.conf)
--cuda        Use the GPU if this flag is set.

Learning Method

Specify the learning method with --learn_method: 'sup' for supervised learning, 'unsup' for unsupervised learning, and 'plus_unsup' for jointly optimizing the supervised and unsupervised losses.
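For reference, the documented flags can be mirrored with Python's argparse. This is a hypothetical sketch, not the repository's src/main.py: the string 'MEAN' for the aggregator and the 'sup' default for --learn_method are assumptions, since the README states neither.

```python
import argparse


def build_parser():
    """Hypothetical argparse setup mirroring the flags documented above."""
    parser = argparse.ArgumentParser(description="GraphSAGE in PyTorch (sketch)")
    parser.add_argument("--dataSet", type=str, default="cora")
    # Assumed string value; the README only says "Mean aggregator".
    parser.add_argument("--agg_func", type=str, default="MEAN")
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--b_sz", type=int, default=20)
    parser.add_argument("--seed", type=int, default=824)
    parser.add_argument("--unsup_loss", type=str, default="normal",
                        choices=["margin", "normal"])
    parser.add_argument("--config", type=str, default="./src/experiments.conf")
    # Boolean switch: present means True, absent means False.
    parser.add_argument("--cuda", action="store_true")
    # Assumed default; the README does not state one.
    parser.add_argument("--learn_method", type=str, default="sup",
                        choices=["sup", "unsup", "plus_unsup"])
    return parser
```

For example, build_parser().parse_args(["--cuda", "--learn_method", "unsup"]) gives cuda=True, learn_method='unsup' and the default epochs=50.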

Example Usage

To run the unsupervised model with CUDA:

python -m src.main --epochs 50 --cuda --learn_method unsup
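The 'unsup' method trains without labels. As a hedged illustration, this sketch computes the unsupervised loss from the GraphSAGE paper (Hamilton et al., 2017) for one node u, one co-occurring node v and Q sampled negatives, J(z_u) = -log σ(z_u^T z_v) - Q · E[log σ(-z_u^T z_vn)], estimating Q times the expectation by summing over the Q samples. Whether --unsup_loss normal matches this exactly is an assumption, and the 'margin' variant is not covered.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unsup_loss(z_u, z_pos, z_negs):
    """GraphSAGE-paper-style unsupervised loss for a single node.

    z_u:    embedding of the target node
    z_pos:  embedding of a node that co-occurs with u (e.g. on a random walk)
    z_negs: embeddings of the Q negative samples
    """
    # Positive term: reward high similarity between u and v.
    positive = -log_sigmoid(dot(z_u, z_pos))
    # Negative term: Q * mean over negatives equals the sum over the Q negatives.
    negative = -sum(log_sigmoid(-dot(z_u, z_n)) for z_n in z_negs)
    return positive + negative
```

With an all-zero z_u every dot product is 0, so each term equals log 2 and unsup_loss([0.0, 0.0], [1.0, 1.0], [[1.0, 1.0]]) is 2·log 2, about 1.386.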

graphsage-pytorch's Issues

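A question for the experts: when training the network with supervised learning, the training samples in a batch also seem to include nodes that are not nearby. Looking at the supervised loss, it is computed directly from the logits produced by the activation function. Isn't that a bit off? Shouldn't only positive samples be used (in supervised training)?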

As in the title. Since you are at HIT (Harbin Institute of Technology), I did not write this in English.

How to adapt the code for a weighted graph

@twjiang I read the code, and it seems to be designed for an unweighted graph. Could you explain how to change it to support a weighted graph?
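One common approach, sketched here as an assumption rather than something the repository supports, is to replace the plain neighbor mean with a weighted mean: each neighbor's features are scaled by its edge weight and the sum is divided by the total weight. The adjacency layout below (a dict of {neighbor: weight} per node) is hypothetical; the repository keeps unweighted adjacency sets.

```python
def weighted_mean_aggregate(node, adj, features):
    """Weighted mean of a node's neighbor features.

    adj:      dict node -> {neighbor: edge_weight}  (hypothetical layout)
    features: dict node -> list of floats
    """
    neighbors = adj.get(node, {})
    total = sum(neighbors.values())
    dim = len(features[node])
    if total == 0:
        # No neighbors (or all-zero weights): nothing to aggregate.
        return [0.0] * dim
    agg = [0.0] * dim
    for neighbor, weight in neighbors.items():
        for i, x in enumerate(features[neighbor]):
            agg[i] += weight * x
    return [v / total for v in agg]
```

With every weight equal to 1 this reduces to the ordinary mean aggregator; in PyTorch the same computation is usually a row-normalized weighted adjacency matrix multiplied by the feature matrix.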

Some questions about models.py

Hi! I really like your work and am trying to use it on my own data.
However, after converting my data into the same format as the Cora data in your original code, I got the following error:

DEVICE: cpu
GraphSage with Supervised Learning
----------------------EPOCH 0----------------------
Step [1/14], Loss: 2.0212, Dealed Nodes [77/274]
Step [2/14], Loss: 1.0991, Dealed Nodes [132/274]
Step [3/14], Loss: 0.6850, Dealed Nodes [175/274]
Step [4/14], Loss: 0.6396, Dealed Nodes [203/274]
Step [5/14], Loss: 0.5019, Dealed Nodes [217/274]
Step [6/14], Loss: 0.4753, Dealed Nodes [238/274]
Traceback (most recent call last):

File "", line 1, in <module>
runfile('/Users/jishilun/Desktop/graphSAGE-portable/src/main2.py', wdir='/Users/jishilun/Desktop/graphSAGE-portable/src')

File "/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 704, in runfile
execfile(filename, namespace)

File "/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/Users/jishilun/Desktop/graphSAGE-portable/src/main2.py", line 85, in <module>
graphsage.classification = apply_model(data,ds,graphsage,classification,unsupervised_loss,args.b_sz,args.unsup_loss,device,args.learn_method)

File "/Users/jishilun/Desktop/graphSAGE-portable/src/utils.py", line 127, in apply_model
nodes_batch = np.asarray(list(unsupervised_loss.extend_nodes(nodes_batch, num_neg=num_neg)))

File "/Users/jishilun/Desktop/graphSAGE-portable/src/models.py", line 147, in extend_nodes
assert set(self.target_nodes) <= set(self.unique_nodes_batch)

AssertionError

My data is smaller than Cora: my network has fewer than 800 nodes.
When I made the test and validation sets smaller by changing
def _split_data(self, num_nodes, test_split = 3, val_split = 6):
to
def _split_data(self, num_nodes, test_split = 8, val_split = 10):
training ran for longer, but after a few epochs the same error appeared again.
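One possible cause, offered as a hypothesis rather than a confirmed diagnosis: the assertion requires every target node in the batch to also appear in unique_nodes_batch, the nodes gathered while sampling positive and negative pairs. A node with no neighbors (degree 0) may have no positive pairs and drop out, which is more likely in a small graph. A quick check for such nodes, assuming the edges are available as (u, v) index pairs:

```python
def find_isolated_nodes(num_nodes, edges):
    """Return the ids of nodes that appear in no edge (degree 0)."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [n for n in range(num_nodes) if degree[n] == 0]
```

If this returns a non-empty list, removing those nodes or giving them self-loops before training would be one way to test the hypothesis.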

The implementation may be incorrect because the two embedding layers are very sparse.

I have recently been using your pytorch-graphSage, running "python -m src.main --epochs 50 --learn_method unsup" with the default settings on the default dataset. I also tried removing lines 73-76 of main.py and running the same command.

In both settings, I found that cur_hidden_embs at line 267 of models.py is very sparse, so the resulting embedding features are also very sparse. I would expect cur_hidden_embs to be dense. Is something wrong?
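Some sparsity is expected if the layers apply a ReLU (an assumption about the layer design), because ReLU maps every negative pre-activation to exactly zero. A small plain-Python helper that reports the fraction of zero (or near-zero) entries makes "very sparse" measurable, so it can be compared across layers and epochs:

```python
def relu(row):
    """Element-wise ReLU: negative values become 0.0."""
    return [max(0.0, x) for x in row]


def zero_fraction(matrix, eps=0.0):
    """Fraction of entries in a list-of-rows matrix with |x| <= eps."""
    total = zeros = 0
    for row in matrix:
        for x in row:
            total += 1
            if abs(x) <= eps:
                zeros += 1
    return zeros / total if total else 0.0
```

For a PyTorch tensor emb, the equivalent measurement is (emb == 0).float().mean().item().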

Where is the embedding matrix?

The whole 2708 × 1433 feature matrix should not be passed into the forward pass of the GraphSage model
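The idea behind this issue can be sketched as follows: a mini-batch only needs the feature rows of its own nodes plus the neighbors it aggregates from, so those rows can be gathered and re-indexed instead of handing the full Cora matrix to forward. The helper names are hypothetical, and this covers one layer (one hop); a K-layer model would expand the node set K times.

```python
def nodes_for_batch(batch, adj):
    """Batch nodes plus their one-hop neighbors (the receptive field of one layer)."""
    needed = set(batch)
    for node in batch:
        needed.update(adj.get(node, ()))
    return needed


def gather_features(features, nodes):
    """Return a {node: row position} map and only the feature rows for `nodes`."""
    ordered = sorted(nodes)
    index = {node: i for i, node in enumerate(ordered)}
    rows = [features[node] for node in ordered]
    return index, rows
```

With a PyTorch feature tensor, the gather step corresponds to indexing the tensor with a LongTensor of the selected node ids.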

File not found error.

Hi!
I just cloned the repository and ran it. After one full epoch, I received the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'models/model_best_debug_ep0_0.8758.torch'
Here is the command I ran:
python -m src.main --epochs 50 --cuda --learn_method unsup
And here is the full error output:
Step [68/68], Loss: 0.0432, Dealed Nodes [1355/1355]
Validation F1: 0.8913525498891351
Test F1: 0.8758314855875832
Traceback (most recent call last):
File "/home/huangmd/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/huangmd/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/huangmd/graphSAGE-pytorch/src/main.py", line 76, in <module>
args.max_vali_f1 = evaluate(dataCenter, ds, graphSage, classification, device, args.max_vali_f1, args.name, epoch)
File "/home/huangmd/graphSAGE-pytorch/src/utils.py", line 52, in evaluate
torch.save(models, 'models/model_best_{}_ep{}_{:.4f}.torch'.format(name, cur_epoch, test_f1))
File "/home/huangmd/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 327, in save
with _open_file_like(f, 'wb') as opened_file:
File "/home/huangmd/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 212, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/huangmd/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 193, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/model_best_debug_ep0_0.8758.torch'.

I ran this repo with torch 1.4.0 and Python 3.7 on Ubuntu 18.04 with a 2080 Ti.
Hoping for a solution to this issue, thanks!

How can we use it on the MR or R8 datasets?

Could you add the MR or R8 datasets, or support any dataset in a plain .txt format?
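A hedged sketch of parsing a plain-text dataset into node features, labels and adjacency, assuming a Cora-style layout: one line per node with the node id, its feature values and a class label, plus one line per edge with two node ids. MR and R8 are text-classification corpora without a given graph, so a graph and node features would first have to be built from the documents; that step is not shown.

```python
def load_content(lines):
    """Parse 'node_id f1 ... fn label' lines (Cora .content style)."""
    node_index, features, labels, label_index = {}, [], [], {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        node_index[parts[0]] = len(features)
        features.append([float(x) for x in parts[1:-1]])
        # Map each label string to a consecutive integer class id.
        labels.append(label_index.setdefault(parts[-1], len(label_index)))
    return node_index, features, labels


def load_edges(lines, node_index):
    """Parse 'src dst' lines into undirected adjacency sets keyed by row index."""
    adj = {i: set() for i in node_index.values()}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        u, v = node_index[parts[0]], node_index[parts[1]]
        adj[u].add(v)
        adj[v].add(u)
    return adj
```

Both functions accept any iterable of lines, so an open file object can be passed directly.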

No model-saving function?

I am trying to run your example command:
python -m src.main --epochs 50 --cuda --learn_method unsup
However, I get the following error:

...
Step [66/68], Loss: 7.4922, Dealed Nodes [1355/1355]
Step [67/68], Loss: 7.5110, Dealed Nodes [1355/1355]
Step [68/68], Loss: 7.4139, Dealed Nodes [1355/1355]
Training Classification ...
Loading embeddings from trained GraphSAGE model.
Embeddings loaded.
Validation F1: 0.5720620842572062
Test F1: 0.5753880266075388
Traceback (most recent call last):
File "/home/xxx/anaconda2/envs/xxx/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/xxx/anaconda2/envs/xxx/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/xxx/xxx/xxx/graphSAGE-pytorch/src/main.py", line 74, in <module>
classification, args.max_vali_f1 = train_classification(dataCenter, graphSage, classification, ds, device, args.max_vali_f1, args.name)
File "/home/xxx/xxx/xxx/graphSAGE-pytorch/src/utils.py", line 110, in train_classification
max_vali_f1 = evaluate(dataCenter, ds, graphSage, classification, device, max_vali_f1, name, epoch)
File "/home/xxx/xxx/xxx/graphSAGE-pytorch/src/utils.py", line 52, in evaluate
torch.save(models, 'models/model_best_{}_ep{}_{:.4f}.torch'.format(name, cur_epoch, test_f1))
File "/home/xxx/anaconda2/envs/xxx/lib/python3.7/site-packages/torch/serialization.py", line 224, in save
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/home/xxx/anaconda2/envs/xxx/lib/python3.7/site-packages/torch/serialization.py", line 147, in _with_file_like
f = open(f, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'models/model_best_debug_ep0_0.5754.torch'

It seems that there is no function that saves the model to the models directory. Am I right?

How can embeddings be generated and obtained for new nodes?

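For a new node whose relationships to other nodes are known, how can its embedding vector be generated and obtained? Is there any related code or documentation for this?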
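Because GraphSAGE learns aggregator weights rather than a per-node embedding table, a new node can in principle be embedded from its own features and its neighbors' features using the trained weights. The sketch below shows one layer in the style of Algorithm 1 of the GraphSAGE paper, h = ReLU(W · concat(x, mean of neighbor features)), in plain Python. The weight values are hypothetical stand-ins for trained parameters; with this repository one would presumably also have to add the node to adj_lists and to the feature matrix, which is an assumption not covered by the README.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_new_node(x_new, neighbor_ids, features, weight):
    """One GraphSAGE-style layer for an unseen node.

    x_new:        feature vector of the new node
    neighbor_ids: ids of its known neighbors in `features`
    weight:       rows of length 2 * len(x_new) (hypothetical trained weights)
    """
    if neighbor_ids:
        agg = mean([features[n] for n in neighbor_ids])
    else:
        agg = [0.0] * len(x_new)
    z = x_new + agg  # list concatenation = CONCAT(self, aggregated neighbors)
    # Linear map followed by ReLU.
    return [max(0.0, sum(w * v for w, v in zip(row, z))) for row in weight]
```

For a K-layer model the step is repeated layer by layer, with each neighbor's representation taken from the previous layer.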

Not converging.

Question about unsup

Hello, when running unsupervised GraphSAGE I ran into this problem, but runs with sup work fine. What could be going on?

Edges not directed

Hi, I noticed that during the data-loading stage, the edges are treated as bidirectional.

How can I use one-directional edges only?
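Adding each edge in both directions is what makes the graph undirected. A minimal sketch of making this optional, using a hypothetical build_adj helper: here adj[node] holds the nodes that node aggregates from, so a directed edge (src, dst) lets dst receive information from src only. Which direction a node should aggregate from is a modeling choice.

```python
from collections import defaultdict


def build_adj(edges, directed=False):
    """Build adjacency sets from (src, dst) pairs.

    adj[node] is the set of nodes that `node` aggregates from.
    """
    adj = defaultdict(set)
    for src, dst in edges:
        adj[dst].add(src)      # dst receives information from src
        if not directed:
            adj[src].add(dst)  # undirected: information also flows dst -> src
    return dict(adj)
```

In the directed case, nodes with no incoming edges get no entry and have nothing to aggregate, so the model needs a fallback for them (for example, using only the node's own features).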

Low memory utilization

N/A 29C P0 30W / 250W | 1525MiB / 16280MiB | 0% Default

RuntimeError: Parent directory models does not exist.

Traceback (most recent call last):
File "D:\lunwen2\githubprogram--gnn\graphsage\graphSAGE-pytorch-master\main.py", line 76, in <module>
args.max_vali_f1 = evaluate(dataCenter, ds, graphSage, classification, device, args.max_vali_f1, args.name, epoch)
File "D:\lunwen2\githubprogram--gnn\graphsage\graphSAGE-pytorch-master\src\utils.py", line 52, in evaluate
torch.save(models, 'models/model_best_{}_ep{}_{:.4f}.torch'.format(name, cur_epoch, test_f1))
File "D:\lunwen2\githubprogram--gnn\graphsage\graphSAGE-pytorch-master\venv\lib\site-packages\torch\serialization.py", line 440, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "D:\lunwen2\githubprogram--gnn\graphsage\graphSAGE-pytorch-master\venv\lib\site-packages\torch\serialization.py", line 315, in _open_zipfile_writer
return container(name_or_buffer)
File "D:\lunwen2\githubprogram--gnn\graphsage\graphSAGE-pytorch-master\venv\lib\site-packages\torch\serialization.py", line 288, in __init__
super().__init__(torch._C.PyTorchFileWriter(str(name)))
RuntimeError: Parent directory models does not exist.

deleted

Not converging.

I ran the code, but it did not converge. The GraphSAGE loss stays between 7.3 and 7.5 and does not decrease.

Could not find a version that satisfies the requirement torch==1.0.1.post2

Environment: Python 3.6.8
OS: Windows 10
Is torch==1.0.1.post2 not available on Windows?
Sorry, I cannot find torch==1.0.1.post2, and trying to install 1.0.0, 1.0.1, and 1.1.0 all fail with errors.
The error message is as follows:
DEVICE: cpu
Traceback (most recent call last):
File "C:/Users/14656/Desktop/Program/GNN/Github/graphSAGE-pytorch-master/src/main.py", line 46, in <module>
config = pyhocon.ConfigFactory.parse_file(args.config)
File "C:\Users\14656\miniconda3\envs\graphSAGE-pytorch-master\lib\site-packages\pyhocon\config_parser.py", line 102, in parse_file
raise e
File "C:\Users\14656\miniconda3\envs\graphSAGE-pytorch-master\lib\site-packages\pyhocon\config_parser.py", line 97, in parse_file
with codecs.open(filename, 'r', encoding=encoding) as fd:
File "C:\Users\14656\miniconda3\envs\graphSAGE-pytorch-master\lib\codecs.py", line 897, in open
file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: './src/experiments.conf'
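The traceback shows main.py being launched directly by its file path, so the default --config value './src/experiments.conf' is resolved against whatever the current working directory happens to be. The README's command, python -m src.main run from the repository root, is where that relative path is expected to exist. As an alternative, the default could be resolved relative to the script itself; a minimal sketch with a hypothetical default_config_path helper:

```python
from pathlib import Path


def default_config_path(script_file):
    """Locate experiments.conf next to the given script, independent of the cwd."""
    return Path(script_file).resolve().parent / "experiments.conf"
```

In main.py this could be used as parser.add_argument('--config', type=str, default=str(default_config_path(__file__))).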

assert len(embs) == len(nodes) AssertionError

assert len(embs) == len(nodes)
AssertionError

How can inductive learning be performed?

This implementation seems to perform transductive learning, as reflected in line 279 of models.py: self.adj_lists includes all nodes and edges of the graph. How can inductive learning be performed? Thank you!
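In the inductive setting described in the GraphSAGE paper, evaluation nodes are held out during training. One way to approximate that with this code base, sketched as an assumption rather than a feature of the repository, is to sample neighbors during training only from the subgraph induced by the training nodes, then switch to the full adj_lists at inference time. The helper name is hypothetical:

```python
def induced_subgraph(adj, keep):
    """Adjacency restricted to the nodes in `keep`; edges to other nodes are dropped."""
    keep = set(keep)
    return {node: {m for m in nbrs if m in keep}
            for node, nbrs in adj.items() if node in keep}
```

Training would then use induced_subgraph(adj_lists, train_nodes), so no test node or test edge influences the learned weights.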

RuntimeError: stack expects a non-empty TensorList

cmd: python -m src.main --epochs 50 --learn_method unsup

Output:

DEVICE: cpu
GraphSage with Net Unsupervised Learning
----------------------EPOCH 0-----------------------
Step [1/68], Loss: 10.8708, Dealed Nodes [1024/1355]
Traceback (most recent call last):
File "/data/webdev/xiaokaichen/anaconda3/envs/py37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/data/webdev/xiaokaichen/anaconda3/envs/py37/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/webdev/xiaokaichen/code/graphSAGE-pytorch/src/main.py", line 72, in <module>
graphSage, classification = apply_model(dataCenter, ds, graphSage, classification, unsupervised_loss, args.b_sz, args.unsup_loss, device, args.learn_method)
File "/data/webdev/xiaokaichen/code/graphSAGE-pytorch/src/utils.py", line 186, in apply_model
nn.utils.clip_grad_norm(model.parameters(), 5)
File "/data/webdev/xiaokaichen/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

Typo in line 73 of main.py?

Do you really intend to check == 'unsup' to enable classification training, as the code does now?

Or should it be != 'unsup'?

It does not make sense to me that classification is still trained when we specify 'unsup'. But maybe I am missing something?

Thanks