tianxiangzhao / graphsmote
PyTorch implementation of the paper 'GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural Networks', to appear at WSDM 2021.
From my literature research, GraphSMOTE is probably the only toolkit that can train graph neural networks on imbalanced data, so it is a great privilege to use it.
My learning task is analogous to the Cora dataset: a simple node-level binary classification (0 = negative, 1 = positive). I have ~13000 nodes (~550 positive, the rest negative), each with 350 features. I rewrote the Cora data_loader function for my dataset, and it works successfully. Then I trained the model with the GraphSMOTE main.py:
nohup python3 -u main.py --imbalance --no-cuda --dataset=cora --setting='embed_up' --epochs=1000 --model=GAT --up_scale=0.95 --nhid=8 --batch_size=64 --lr=0.001 --dropout=0.5 &
It works! But the accuracy of the model on both the training set and the validation set is very low (around 0.5) and fluctuates, as shown below:
Loading cora dataset...
edges between class 0 and class 0: 63682080.000000
edges between class 0 and class 1: 2235402.000000
edges between class 1 and class 0: 2235402.000000
edges between class 1 and class 1: 99624.000000
0-th class sample number: 12353
1-th class sample number: 500
valid current auc-roc score: 0.643200, current macro_F score: 0.333333
Epoch: 00001 loss_train: 0.7087 loss_rec: 0.7087 acc_train: 0.5000 loss_val: 0.7085 acc_val: 0.5000 time: 125.3990s
Test set results: loss= 0.7085 accuracy= 0.5000
test current auc-roc score: 0.452893, current macro_F score: 0.333333
valid current auc-roc score: 0.459200, current macro_F score: 0.333333
Epoch: 00002 loss_train: 0.7085 loss_rec: 0.7085 acc_train: 0.5000 loss_val: 0.7087 acc_val: 0.5000 time: 125.1620s
valid current auc-roc score: 0.481600, current macro_F score: 0.333333
Epoch: 00003 loss_train: 0.7083 loss_rec: 0.7083 acc_train: 0.5000 loss_val: 0.7083 acc_val: 0.5000 time: 124.7956s
valid current auc-roc score: 0.561600, current macro_F score: 0.333333
......
Epoch: 00820 loss_train: 0.6924 loss_rec: 0.6924 acc_train: 0.5500 loss_val: 0.6953 acc_val: 0.3400 time: 174.9777s
the detailed results:
nohup.txt
I have tried resetting nhid, dropout, and the learning rate, but both acc_train and acc_val remain very low and fluctuate. Do you have any suggestions for optimization? Looking forward to your reply! Thanks.
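For what it's worth, a minimal sketch (not from the repo, using scikit-learn's metric conventions): an accuracy of 0.5000 combined with a macro_F of 0.333333 on a class-balanced split is exactly the signature of a degenerate model that always predicts one class, which may explain the flat scores in the log above.

```python
# Sketch (hypothetical data): a model that always predicts the majority
# class scores accuracy 0.5 on a balanced split and macro-F of 1/3,
# matching the acc/macro_F pattern reported in the training log.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 50 + [1] * 50   # balanced evaluation split
y_pred = [0] * 100             # degenerate model: always predict negative

print(accuracy_score(y_true, y_pred))                       # 0.5
print(round(f1_score(y_true, y_pred, average="macro"), 6))  # 0.333333
```

If the metrics stay pinned at these values across epochs, it usually means the classifier is not learning the minority class at all.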
Hello.
I get this memory error when using my dataset. My dataset is attached.
numpy.core._exceptions.MemoryError: Unable to allocate 39.6 GiB for an array with shape (103106, 103106) and data type float32
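As a quick sanity check (plain Python, nothing beyond the traceback): the 39.6 GiB figure is simply the cost of materializing a dense float32 matrix of that shape, which suggests the adjacency needs to stay in a sparse representation for a graph this large.

```python
# Back-of-the-envelope check of the reported allocation: a dense
# float32 array of shape (103106, 103106) costs n * n * 4 bytes.
n = 103106
gib = n * n * 4 / 2**30   # bytes -> GiB
print(round(gib, 1))      # 39.6
```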
hoppity.zip
Hello! Can I use this technique on a DGL graph with my own dataset?
Would there be a lot of modification I need to do?
I am now doing an edge classification task and have encountered a data imbalance problem.
Graphs with only one edge cannot be correctly detected, so I am also wondering if GraphSMOTE can help me deal with this problem.
Hello, could you please share your pretrained weights?
If you want to train the model on CUDA, modify these lines as follows:
{
"Modification1":
{
"file": "models.py",
"lines": [193],
"origin": "if not isinstance(adj, torch.sparse.FloatTensor): ",
"new": "if adj.layout != torch.sparse_coo:"
},
"Modification2":
{
"file": "utils.py",
"lines": [275, 294, 349],
"origin": "distance = squareform(pdist(chosen_embed.detach()))",
"new": "distance = squareform(pdist(chosen_embed.cpu().detach()))"
}
}
Though some other lines still report errors, they can be modified following the pattern of Modification2 (i.e., xx.detach() => xx.cpu().detach()).
@TianxiangZhao Please check this when convenient. If it is correct, a fix in the repository would be good.
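The Modification2 pattern can be sketched in isolation (chosen_embed below is a stand-in tensor; on a CUDA machine it would live on the GPU): scipy's pdist only operates on host-side NumPy arrays, so the tensor must be detached and moved to the CPU first.

```python
import torch
from scipy.spatial.distance import pdist, squareform

# Stand-in for chosen_embed in utils.py; on CUDA this would be a GPU tensor.
chosen_embed = torch.randn(6, 4)

# scipy works on CPU/NumPy data, hence .cpu().detach() before the call;
# squareform turns the condensed distances into a full symmetric matrix.
distance = squareform(pdist(chosen_embed.cpu().detach().numpy()))
print(distance.shape)  # (6, 6)
```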
Hi there,
Thanks for putting GraphSMOTE together, really cool code! I was wondering if there is support for heterogeneous graph neural networks as well. I'm currently training a heterogeneous GNN (using SageCONV) on a dataset for a multi-class classification problem. I didn't see anything in the paper/repo about support for heterogeneous GNNs, and wanted to check here. Thanks again!
Hi, I'd like to reproduce the results of the GraphSMOTE(T) experiment. I typed this command line:
python main.py --imbalance --no-cuda --dataset=cora --setting='recon'
The results are as follows:
ACC = 0.271, AUC-ROC = 0.5000, F Score = 0.1843
I have tried many times but couldn't get the GraphSMOTE(T) results from the paper. Is there something wrong with my command line, or with something else in my setup? Thank you.
Hello, thank you so much for sharing! The method is very nice!
Could you provide a license for the code? I may use it in my project.
Thank you very much!
First of all, great paper; I enjoyed reading it!
However, I do wonder how we can apply this technique to our own custom dataset, as I don't really see much comprehensive documentation on generalisability. I noted that your utility functions are mostly written with respect to the pre-specified dataset. Was kindly wondering if I could get some help with using this with custom datasets, thanks!
When running on cuda, the error is as follows:
RuntimeError: Could not run 'aten::sum.dim_IntList' with arguments from the 'SparseCUDA' backend. 'aten::sum.dim_IntList' is only available for these backends: [CPU, CUDA, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
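For reference, a minimal CPU sketch of the likely workaround (the adjacency here is a stand-in): in PyTorch 1.7, dimension-wise sums over sparse tensors go through torch.sparse.sum rather than Tensor.sum(dim), which is the call that raises the backend error above.

```python
import torch

# Stand-in sparse COO adjacency; the issue's tensor is a sparse CUDA tensor.
adj = torch.eye(3).to_sparse()

# torch.sparse.sum supports dimension-wise reduction on sparse tensors;
# the reduced result is converted to dense for inspection.
row_sums = torch.sparse.sum(adj, dim=1).to_dense()
print(row_sums)  # tensor([1., 1., 1.])
```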
Using PyTorch 1.7.0 with CUDA 10.2 on Windows.
Hi, I have a few queries: where is the "twitter.embeddings_64" dataset, and how are sub_twitter_edges, sub_twitter_labels, and sub_node_embedding_64 divided?