
cogdl's Introduction

CogDL


Homepage | Paper | Documentation | Discussion Forum | Dataset | 中文

CogDL is a graph deep learning toolkit that allows researchers and developers to easily train and compare baseline or customized models for node classification, graph classification, and other important tasks in the graph domain.

We summarize the contributions of CogDL as follows:

  • Efficiency: CogDL utilizes well-optimized operators to speed up training and reduce the GPU memory usage of GNN models.
  • Ease of Use: CogDL provides easy-to-use APIs for running experiments with the given models and datasets using hyper-parameter search.
  • Extensibility: The design of CogDL makes it easy to apply GNN models to new scenarios based on our framework.

❗ News

  • The CogDL paper was accepted by WWW 2023. Find us at WWW 2023! We also release the new v0.6 release which adds more examples of graph self-supervised learning, including GraphMAE, GraphMAE2, and BGRL.

  • A free GNN course provided by CogDL Team is present at this link. We also provide a discussion forum for Chinese users.

  • The new v0.5.3 release supports mixed-precision training by setting fp16=True and provides a basic example written with Jittor. It also updates the tutorial in the documentation, fixes download links for some datasets, and fixes potential bugs in operators.

News History
  • The new v0.5.2 release adds a GNN example for ogbn-products and updates geom datasets. It also fixes some potential bugs including setting devices, using cpu for inference, etc.

  • The new v0.5.1 release adds fast operators including SpMM (cpu version) and scatter_max (cuda version). It also adds lots of datasets for node classification which can be found in this link. 🎉

  • The new v0.5.0 release designs and implements a unified training loop for GNN. It introduces DataWrapper to help prepare the training/validation/test data and ModelWrapper to define the training/validation/test steps. 🎉

  • The new v0.4.1 release adds the implementation of Deep GNNs and the recommendation task. It also supports new pipelines for generating embeddings and for recommendation. Welcome to join our tutorial at KDD 2021, 10:30 am - 12:00 pm, Aug. 14th (Singapore Time). More details can be found at https://kdd2021graph.github.io/. 🎉

  • The new v0.4.0 release refactors the data storage (from Data to Graph) and provides more fast operators to speed up GNN training. It also includes many self-supervised learning methods on graphs. We are also glad to announce that we will give a tutorial at KDD 2021 in August. Please see this link for more details. 🎉

  • CogDL supports GNN models with Mixture of Experts (MoE). You can install FastMoE and try MoE GCN in CogDL now!

  • The new v0.3.0 release provides a fast spmm operator to speed up GNN training. We also release the first version of CogDL paper in arXiv. You can join our slack for discussion. 🎉🎉🎉

  • The new v0.2.0 release includes easy-to-use experiment and pipeline APIs for all experiments and applications. The experiment API supports AutoML features for hyper-parameter search. This release also provides an OAGBert API for model inference (OAGBert is trained on a large-scale academic corpus by our lab). Some features and models were added by the open-source community (thanks to all the contributors 🎉).

  • The new v0.1.2 release includes a pre-training task, many examples, OGB datasets, some knowledge graph embedding methods, and some graph neural network models. The test coverage of CogDL has been increased to 80%. Some new APIs, such as Trainer and Sampler, have been developed and are being tested.

  • The new v0.1.1 release includes the knowledge link prediction task, many state-of-the-art models, and optuna support. We also have a Chinese WeChat post about the CogDL release.

Getting Started

Requirements and Installation

  • Python version >= 3.7
  • PyTorch version >= 1.7.1

Please follow the instructions here to install PyTorch (https://github.com/pytorch/pytorch#installation).

When PyTorch has been installed, cogdl can be installed using pip as follows:

pip install cogdl

Install from source via:

pip install git+https://github.com/thudm/cogdl.git

Or clone the repository and install with the following commands:

git clone git@github.com:THUDM/cogdl.git
cd cogdl
pip install -e .

Usage

API Usage

You can run all kinds of experiments through CogDL APIs, especially experiment(). You can also use your own datasets and models for experiments. A quickstart example can be found in quick_start.py, and more examples are provided in examples/.

from cogdl import experiment

# basic usage
experiment(dataset="cora", model="gcn")

# set other hyper-parameters
experiment(dataset="cora", model="gcn", hidden_size=32, epochs=200)

# run over multiple models on different seeds
experiment(dataset="cora", model=["gcn", "gat"], seed=[1, 2])

# automl usage
def search_space(trial):
    return {
        "lr": trial.suggest_categorical("lr", [1e-3, 5e-3, 1e-2]),
        "hidden_size": trial.suggest_categorical("hidden_size", [32, 64, 128]),
        "dropout": trial.suggest_uniform("dropout", 0.5, 0.8),
    }

experiment(dataset="cora", model="gcn", seed=[1, 2], search_space=search_space)

Command-Line Usage

You can also use python scripts/train.py --dataset example_dataset --model example_model to run example_model on example_dataset.

  • --dataset, the dataset name to run; can be a list of datasets separated by spaces, like cora citeseer. Supported datasets include 'cora', 'citeseer', 'pubmed', 'ppi', 'wikipedia', 'blogcatalog', 'flickr'. More datasets can be found in cogdl/datasets.
  • --model, the model name to run; can be a list of models, like gcn gat. Supported models include 'gcn', 'gat', 'graphsage', 'deepwalk', 'node2vec', 'hope', 'grarep', 'netmf', 'netsmf', 'prone'. More models can be found in cogdl/models.

For example, if you want to run GCN and GAT on the Cora dataset, with 5 different seeds:

python scripts/train.py --dataset cora --model gcn gat --seed 0 1 2 3 4

Expected output:

Variant test_acc val_acc
('cora', 'gcn') 0.8050±0.0047 0.7940±0.0063
('cora', 'gat') 0.8234±0.0042 0.8088±0.0016

If you have any difficulty getting things working in the above steps, feel free to open an issue. You can expect a reply within 24 hours.

❗ FAQ

How to contribute to CogDL?

If you have a well-performing algorithm and are willing to implement it in our toolkit to help more people, you can first open an issue and then create a pull request. Detailed information can be found here.

Before committing your modification, please first run pre-commit install to set up the git hooks that check code format and style using black and flake8. Then pre-commit will run automatically on git commit! Detailed information about pre-commit can be found here.
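
For reference, a typical setup looks like the following (assuming pre-commit is installed from PyPI and the repository ships its pre-commit configuration):

pip install pre-commit
pre-commit install
# optionally run all hooks once over the whole codebase
pre-commit run --all-files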

How to enable fast GNN training?

CogDL provides a fast sparse matrix-matrix multiplication operator called GE-SpMM (https://arxiv.org/abs/2007.03179) to speed up GNN training on the GPU. This feature is used automatically when it is available. Note that it is still in testing and may not work under some versions of CUDA.

How to run parallel experiments with GPUs on several models?

If you want to run parallel experiments on your server with multiple GPUs across multiple models, e.g., GCN and GAT on the Cora dataset:

$ python scripts/train.py --dataset cora --model gcn gat --hidden-size 64 --devices 0 1 --seed 0 1 2 3 4

Expected output:

Variant Acc
('cora', 'gcn') 0.8236±0.0033
('cora', 'gat') 0.8262±0.0032

How to use models from other libraries?
If you are familiar with other popular graph libraries, you can implement your own model in CogDL using modules from PyTorch Geometric (PyG). For installing PyG, you can follow the instructions from PyG (https://github.com/rusty1s/pytorch_geometric/#installation). For a quick start on how to use PyG layers, see the examples in examples/pyg (https://github.com/THUDM/cogdl/tree/master/examples/pyg/). A minimal sketch is given below.
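
A minimal sketch (not CogDL's official wrapper): a plain two-layer GCN built from PyG's GCNConv layers as an ordinary torch.nn.Module; how to plug such a module into CogDL's training loop is demonstrated by the scripts in examples/pyg.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class PyGGCN(torch.nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes, dropout=0.5):
        super().__init__()
        self.conv1 = GCNConv(in_feats, hidden_size)
        self.conv2 = GCNConv(hidden_size, num_classes)
        self.dropout = dropout

    def forward(self, x, edge_index):
        # standard two-layer GCN forward pass over node features and the edge index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        return self.conv2(x, edge_index)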

How to make a successful pull request with a unit test?

To have a successful pull request, you need at least (1) your model implementation and (2) a unit test.

You might be confused why your pull request was rejected with a 'Coverage decreased ...' message even though your model works fine locally. This happens because you have not included a unit test, which essentially runs through the extra lines of code you added. The Travis CI service used by GitHub runs all unit tests on the code you committed and checks how many lines of code the unit tests exercise; if a significant portion of your code has not been checked (insufficient coverage), the pull request is rejected.

So how do you do a unit test?

  • Let's say you implement a GNN model in a script models/nn/abcgnn.py that performs node classification. Then you need to add a unit test inside the script tests/tasks/test_node_classification.py (or whichever task your model addresses).
  • To add the unit test, simply add a function test_abcgnn_cora() (just follow the format of the other unit tests already in the script), fill it with the required arguments, and make the last line of the function assert 0 <= ret["Acc"] <= 1, which is the basic sanity check conducted by the unit test. A minimal sketch is shown after this list.
  • After modifying tests/tasks/test_node_classification.py, commit it together with your models/nn/abcgnn.py, and your pull request should pass.
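
A minimal sketch of such a test, placed in tests/tasks/test_node_classification.py. Note that get_default_args() below is a hypothetical placeholder for however the existing tests in that file build their argument namespace; copy their format exactly.

from cogdl.tasks import build_task

def test_abcgnn_cora():
    args = get_default_args()   # hypothetical helper; follow the existing tests
    args.task = "node_classification"
    args.dataset = "cora"
    args.model = "abcgnn"       # the newly added model
    task = build_task(args)     # build the node classification task
    ret = task.train()          # run training and evaluation
    # basic sanity check: the reported accuracy must be a valid fraction
    assert 0 <= ret["Acc"] <= 1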

CogDL Team

CogDL is developed and maintained by Tsinghua, ZJU, DAMO Academy, and ZHIPU.AI.

The core development team can be reached at [email protected].

Citing CogDL

Please cite our paper if you find our code or results useful for your research:

@inproceedings{cen2023cogdl,
    title={CogDL: A Comprehensive Library for Graph Deep Learning},
    author={Yukuo Cen and Zhenyu Hou and Yan Wang and Qibin Chen and Yizhen Luo and Zhongming Yu and Hengrui Zhang and Xingcheng Yao and Aohan Zeng and Shiguang Guo and Yuxiao Dong and Yang Yang and Peng Zhang and Guohao Dai and Yu Wang and Chang Zhou and Hongxia Yang and Jie Tang},
    booktitle={Proceedings of the ACM Web Conference 2023 (WWW'23)},
    year={2023}
}

cogdl's People

Contributors

1049451037, aviczhl2, cenyk1230, fishmingyu, huangtinglin, hwangyeong, icycookies, jasmine-yu, jkx19, july11, khtee, kinseys, kwyoke, li-ziang, lykeven, malin2223, qibinc, qingfei1, sahandfer, sengxian, shiguang-guo, sofyc, somefive, spkgyk, think2try, tiagomantunes, wzfhaha, xll2001, yaofeng1998, yaoxingcheng

cogdl's Issues

cogdl on linux tesla K40c

Hi, my environment is Linux, a Tesla K40c, PyTorch 1.4, CUDA 10.1, and Python 3.7.
When I run python gcn.py, the error information is:
Traceback (most recent call last):
File "gcn.py", line 29, in
task = build_task(args, dataset=dataset, model=model)
File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/init.py", line 54, in build_task
return TASK_REGISTRY[args.task](args, dataset=dataset, model=model)
File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/node_classification.py", line 35, in init
args.num_features = dataset.num_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 117, in num_features
return self.num_node_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 112, in num_node_features
return self[0].num_node_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 189, in getitem
data = data if self.transform is None else self.transform(data)
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/transforms/target_indegree.py", line 27, in call
deg = degree(col, data.num_nodes)
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/utils/degree.py", line 20, in degree
out = torch.zeros((num_nodes), dtype=dtype, device=index.device)
RuntimeError: CUDA error: no kernel image is available for execution on the device

The K40c has a compute capability of 3.5. Does PyTorch 1.4 not support it?
Thanks!

Add GraphSAGE (sample) on Reddit

The current GraphSAGE implementation samples neighbors for every node in one batch. This only works for toy datasets (e.g., Cora), not for larger datasets (e.g., Reddit).
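
A generic sketch of what per-batch neighbor sampling looks like (illustrative Python over an adjacency dict, not CogDL's implementation): sample a fixed number of neighbors layer by layer, but only for the nodes of the current mini-batch.

import random

def sample_neighbors(adj, batch_nodes, fanouts):
    # adj: dict mapping node -> list of neighbors; fanouts: neighbors to sample per layer
    layers = [list(batch_nodes)]
    frontier = set(batch_nodes)
    for fanout in fanouts:
        sampled = set()
        for node in frontier:
            neighbors = adj.get(node, [])
            sampled.update(random.sample(neighbors, min(fanout, len(neighbors))))
        layers.append(list(sampled))
        frontier = sampled
    return layers

adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(sample_neighbors(adj, batch_nodes=[0], fanouts=[2, 2]))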

random.seed

When I try to use display_data.py, I get the following error:

Traceback (most recent call last):
  File "display_data.py", line 75, in <module>
    random.seed(args.seed)
  File "/Users/wangzhikai/.conda/envs/cogdl/lib/python3.7/random.py", line 126, in seed
    super().seed(a)
TypeError: unhashable type: 'list'

About the performance

Hi, thanks for the great job!
I simplified this project, removed the torch dependency, and plan to reimplement it under TensorFlow.
I have now finished the algorithms in the emb folder. These are pure Python implementations, but my test results are poor. I noticed that the default parameters in the code differ from the README. Could you please clarify the parameters behind the results displayed on your website?
Thanks a lot.

python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model gcn gat deepwalk node2vec hope netmf netsmf prone

python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model gcn gat deepwalk node2vec hope netmf netsmf prone
I ran the command above and got the following error. The environment is set up correctly: macOS + torch 1.4.0, no CUDA, running locally on a laptop.
`Traceback (most recent call last):
File "scripts/train.py", line 75, in
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 75, in
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 26, in main
task = build_task(args)
File "/Users/XXX/cogdl-master/cogdl/tasks/init.py", line 48, in build_task
return TASK_REGISTRY[args.task](args)
File "/Users/XXX/cogdl-master/cogdl/tasks/unsupervised_node_classification.py", line 56, in init
self.model = build_model(args)
File "/Users/XXX/cogdl-master/cogdl/models/init.py", line 108, in build_model
return MODEL_REGISTRY[args.model].build_model_from_args(args)
File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 71, in build_model_from_args
return cls(args.num_features, args.hidden_size, args.num_classes, args.dropout)
File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 76, in init
self.gc1 = GraphConvolution(nfeat, nhid)
File "/Users/zengyujian/cogdl-master/cogdl/models/nn/gcn.py", line 20, in init
self.weight = Parameter(torch.FloatTensor(in_features, out_features))
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)
  • (object data, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)`

Problem with testing graph classification models

Hello,
I have been trying out different graph classification models on the available datasets. Running some models on a number of datasets for this task raises 'RuntimeError: There were no tensor arguments to this function' at the beginning or in the middle of the training phase. In addition, I have not been able to reproduce the accuracy reported in the README for the given models. I am wondering what the problem could be, whether it is a bug or a dependency problem, since I am also trying to get results for my newly added graph classification model. Thank you.

Refactor Data, Dataset

Hi,

Our current implementations of data preprocessing and data loading are borrowed from PyG. This part needs refactoring before release.

Possible label leakage problem with the link prediction task

This happens for undirected networks like PPI. The link prediction task class

https://github.com/THUDM/cogdl/blob/4ed7838018400377dae9da30017399f56585208f/cogdl/tasks/link_prediction.py#L116

reads the adjacency matrix directly without removing duplicates, which means the edge list contains both (x, y) and (y, x) for every edge.

(x, y) and (y, x) refer to the same edge, so they produce the same cosine similarity for x and y in the later evaluation. However, the training/test splitting process treats them as independent edges. As a result, a large portion of the generated test edges are also in the training set, just with the two ends in reversed order.

I have tried resolving this label leakage issue, and I can only get about 0.8 ROC-AUC on PPI instead of the over 0.9 reported on your leaderboard.
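
For illustration, deduplicating the undirected edge list before splitting (a generic sketch, not CogDL's code) prevents (x, y) and (y, x) from landing on different sides of the split:

def dedup_undirected_edges(edge_list):
    # keep one canonical copy of each undirected edge, ordered as (min, max)
    seen = set()
    unique_edges = []
    for x, y in edge_list:
        key = (min(x, y), max(x, y))
        if key not in seen:
            seen.add(key)
            unique_edges.append(key)
    return unique_edges

# both directions of the same edge collapse into a single entry
print(dedup_undirected_edges([(0, 1), (1, 0), (2, 3)]))  # [(0, 1), (2, 3)]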

Computer configuration

What are the computer configuration requirements and the required PyTorch version for this library?

Issue about unsupervised_graph_classification

❓ Questions & Help

Hi, I just installed cogdl and tried to run a demo, but it seems that the unsupervised_graph_classification models and datasets are missing, for example gin and infograph.
So maybe something is wrong with my code?
My code is:
experiment(task="unsupervised_graph_classification", dataset="proteins", model="infograph")

Split package requirements

Not all packages in cogdl's setup.py are actually needed by end-users, and setuptools supports splitting requirements into install_requires, setup_requires, and tests_require.

It would be great to move packages like pytest and sphinx out of install_requires: the former belongs in tests_require, and the latter should be removed because the doc folder has its own requirements.txt.

"missing 1 required positional argument: num_nodes" when running graphsage model

I am sorry to bother you, could you help me? When I run the command
python scripts/train.py -dt cora --model graphsage -t node_classification, I get this error:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/xxx/cogdl/scripts/parallel_train.py", line 34, in main
    result = task.train()
  File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 52, in train
    self._train_step()
  File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 80, in _train_step
    self.model(self.data.x, self.data.edge_index)[self.data.train_mask],
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/cogdl/cogdl/models/nn/graphsage.py", line 86, in forward
    x = self.convs[i](x, edge_index_sp)
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'num_nodes'

documents

@neozhangthe1
Hi, I'm very curious about your project. I would like to ask what the difference is compared to PyTorch Geometric.
I also found that the documents are incomplete and lack an introduction on how to use the library. I hope the documentation will be updated.
Thank you!

metapath2vec --schema

Hi,
When I choose the schema "0-1-0,0-1-2-1-0", not all nodes are included in the walks.
It throws the error:
KeyError: "word '6931' not in vocabulary"
How can I solve this problem? Thanks!

ModuleNotFoundError

Traceback (most recent call last):
File "train.py", line 14, in
from cogdl import options
ModuleNotFoundError: No module named 'cogdl'

What is the problem, please?

oagbert model Killed

🐛 Bug

I'm using oagbert for sentence embedding.

To Reproduce

I want to obtain the embeddings of a list of sentences (len(corpus) = 124), so I use the following code (as in the example):

tokenizer, bert_model = oagbert()
tokens = tokenizer(corpus, return_tensors="pt", padding=True)
batch_embeddings = bert_model(**tokens)
# take the pooled output of the model as the sentence embeddings
embeddings = batch_embeddings[1]

Expected behavior

embeddings should be a tensor of size torch.Size([124, 768]), but instead the call to bert_model(**tokens) says:

Killed

Some Advice

1. Add docs for preparing data

Like what Euler does: Preparing-Data

2. Support exporting node embeddings

3. Add a description of the graph structure

  • Does it support bipartite graphs?
  • Supervised or unsupervised?

wrong negative sampling in LINE? and some other suggestions

In LINE's code, there might be a minor error in the negative sampling part (if I understand it correctly).
https://github.com/THUDM/cogdl/blob/a69a969020b8aa41cfcd8ac54511984bc5b32d62/cogdl/models/emb/line.py#L133-L137

If the index j for negative samples starts at 1, then the number of negative samples is actually self.negative - 1. For example, if you set self.negative=5, then 0 is not a negative sample (since it is skipped in the for loop) and only 1, 2, 3, 4 are negative samples drawn by the alias algorithm. I also checked the original implementation by Jian Tang, where the range of negative sampling is set to negative+1 (please see below).
https://github.com/tangjianpku/LINE/blob/d5f840941e0f4026090d1b1feeaf15da38e2b24b/linux/line.cpp#L332-L348
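
As an illustration of the intended loop structure (a plain-Python sketch, not CogDL's or LINE's actual code): j == 0 produces the positive pair and j = 1..negative draws exactly `negative` negative samples, which is what the range of negative + 1 achieves.

import random

def make_training_pairs(u, v, negative, num_nodes):
    pairs = []
    for j in range(negative + 1):
        if j == 0:
            target, label = v, 1   # positive sample
        else:
            # negative sample; LINE draws these from an alias table over the noise distribution
            target, label = random.randrange(num_nodes), 0
        pairs.append((u, target, label))
    return pairs

print(make_training_pairs(u=0, v=7, negative=5, num_nodes=100))  # 1 positive + 5 negatives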

Some other suggestions:

  1. It seems that cora, citeseer, and pubmed are not supported in the current version. I tried to run on these datasets, but an error occurs saying that the datasets are not supported. It would be better to provide documentation on how to run tasks on these datasets, or on how to add new datasets (e.g., data formats, paths, naming conventions) myself.
  2. It would be better to give more details on the statistics of the datasets (e.g., labeled or not) and on which datasets are supported for which tasks.
  3. It would be better to provide some docstrings/comments in the source code indicating the meaning of key variables (e.g., inputs and outputs).

how to run unsupervised_node_classification task on graphsage model

Dear authors, when I tried to run the unsupervised_node_classification task with the graphsage model, the following error was shown:

Traceback (most recent call last):
File "train.py", line 76, in
results = pool.map(main, variant_args_generator())
File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)
  • (object data, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)

python scripts/train.py --task node_classification --dataset cora --model gcn

python scripts/train.py --task node_classification --dataset cora --model gcn

Using backend: pytorch
Namespace(cpu=False, dataset=['cora'], device_id=[0], dropout=0.5, enhance=None, hidden_size=64, lr=0.01, max_epoch=500, model=['gcn'], num_classes=None, num_features=None, patience=100, save_dir='.', seed=[1], task='node_classification', weight_decay=0.0005)
Traceback (most recent call last):
File "scripts/train.py", line 101, in
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 101, in
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 29, in main
task = build_task(args)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/init.py", line 49, in build_task
return TASK_REGISTRY[args.task](args)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in init
self.data.apply(lambda x: x.to(self.device))
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 308, in apply
self[key] = self.apply(item, func)
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 287, in apply
return func(item)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in
self.data.apply(lambda x: x.to(self.device))
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/init.py", line 186, in _lazy_init
_check_driver()
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/init.py", line 61, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

My environment is based on the CPU-only build of PyTorch 1.6. Is it only possible to run with CUDA configured?

Issues loading TU dataset

❓ Questions & Help

@THINK2TRY

You were the last person to make significant edits to the TU dataloader. I am getting an error when loading what I believe to be correctly formatted data, and I have been trying to debug it for a while now. Any idea what is happening? I've attached the error along with my dataset. No worries if you don't have time to investigate; I figured I'd ask in case I am missing something straightforward :)

tu-format-gh.zip

unsupervised_node_classification evaluation

Hi. When evaluating the performance of node classification, why do LINE, NetMF, and ProNE give the same result every time? For example, using the Wikipedia dataset with NetMF, the result is always:
| ('wikipedia', 'netmf') | 0.4373±0.0000 | 0.4747±0.0000 | 0.4883±0.0000 | 0.4953±0.0000 | 0.5022±0.0000 |

Looking forward to your reply, Thanks!

How to set up cogdl on a cluster

Hi,
I am trying to set up cogdl in a virtual environment on a cluster. Can you provide setup instructions for doing so?
Thanks,
Ajay Madhavan

Advice: provide a script to download all datasets for offline usage

Currently, CogDL downloads missing datasets on the fly. However, some servers are installed in an environment isolated from the Internet, and it is inconvenient to use CogDL in such an environment.

A script to download all the needed datasets into a local directory would be very helpful. Users could then upload the local directory to the remote server just once.

Thanks for considering the advice.

Optimize Exception Message

Loading CogDL reports "Ninja is required to load C++ extensions", which can be traced back to https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/sample.py#L14

Similarly, the following code also has the same problem.
https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/spmm.py#L26

It might be better to add an extra message to the print(e) call so users can tell that the "Ninja" message is thrown here. Since this line is reached as soon as cogdl is imported, it might confuse users who do not use this module.

Running on new dataset

Hi,
I am wondering if this is limited to the datasets you have made available. Is there any documentation for how to format a new graph dataset to test?

I apologize, as this is likely not a codebase issue, but the Slack invite link is broken.

Thanks,
Kayla

How to get the model outputs instead of the accuracy?

❓ Questions & Help

Hi,
CogDL is a great toolkit. How can I obtain the node classification model's prediction for each node? For example, when running semi-supervised node classification with GAT on Cora, I would like to get the prediction matrix of shape (2708, 7) rather than just the test accuracy.

Possible bug in cogdl/tasks/node_classification.py

cogdl/tasks/node_classification.py, line 93 may be wrong: it seems that args.missing_rate is not used (a literal 0 is passed instead).

if args.missing_rate >= 0:
    if args.model == "sgcpn":
        assert args.dataset in ["cora", "citeseer", "pubmed"]
        dataset.data = preprocess_data_sgcpn(dataset.data, normalize_feature=True, missing_rate=0)
        adj_slice = torch.tensor(dataset.data.adj.size())
        adj_slice[0] = 0
        dataset.slices["adj"] = adj_slice
