tianxiangzhao / graphsmote
PyTorch implementation of the paper 'GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural Networks', to appear at WSDM 2021.
From my literature research, GraphSMOTE is probably the only toolkit that can train graph neural networks on imbalanced data, so it is a great privilege to use it.
My learning task is analogous to the Cora dataset: a simple node-level binary classification (0 = negative, 1 = positive). I have ~13000 nodes (~550 positive, the rest negative), each with 350 features. I rewrote the Cora data_loader function for my dataset, and it works successfully. Then I trained the model with the GraphSMOTE main.py:
nohup python3 -u main.py --imbalance --no-cuda --dataset=cora --setting='embed_up' --epochs=1000 --model=GAT --up_scale=0.95 --nhid=8 --batch_size=64 --lr=0.001 --dropout=0.5 &
It works! But the accuracy of the model on both the training set and the validation set is very low (around 0.5) and fluctuates, as shown below:
Loading cora dataset...
edges between class 0 and class 0: 63682080.000000
edges between class 0 and class 1: 2235402.000000
edges between class 1 and class 0: 2235402.000000
edges between class 1 and class 1: 99624.000000
0-th class sample number: 12353
1-th class sample number: 500
valid current auc-roc score: 0.643200, current macro_F score: 0.333333
Epoch: 00001 loss_train: 0.7087 loss_rec: 0.7087 acc_train: 0.5000 loss_val: 0.7085 acc_val: 0.5000 time: 125.3990s
Test set results: loss= 0.7085 accuracy= 0.5000
test current auc-roc score: 0.452893, current macro_F score: 0.333333
valid current auc-roc score: 0.459200, current macro_F score: 0.333333
Epoch: 00002 loss_train: 0.7085 loss_rec: 0.7085 acc_train: 0.5000 loss_val: 0.7087 acc_val: 0.5000 time: 125.1620s
valid current auc-roc score: 0.481600, current macro_F score: 0.333333
Epoch: 00003 loss_train: 0.7083 loss_rec: 0.7083 acc_train: 0.5000 loss_val: 0.7083 acc_val: 0.5000 time: 124.7956s
valid current auc-roc score: 0.561600, current macro_F score: 0.333333
......
Epoch: 00820 loss_train: 0.6924 loss_rec: 0.6924 acc_train: 0.5500 loss_val: 0.6953 acc_val: 0.3400 time: 174.9777s
the detailed results:
nohup.txt
I have tried resetting nhid, dropout, and the learning rate, but both acc_train and acc_val remain very low and fluctuate. Do you have any suggestions for optimization? Looking forward to your reply! Thanks.
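For what it's worth, a minimal sketch (not from the repo, using scikit-learn's metric conventions): an accuracy of 0.5000 combined with a macro_F of 0.333333 on a class-balanced split is exactly the signature of a degenerate model that always predicts one class, which may explain the flat scores in the log above.

```python
# Sketch (hypothetical data): a model that always predicts the majority
# class scores accuracy 0.5 on a balanced split and macro-F of 1/3,
# matching the acc/macro_F pattern reported in the training log.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 50 + [1] * 50   # balanced evaluation split
y_pred = [0] * 100             # degenerate model: always predict negative

print(accuracy_score(y_true, y_pred))                       # 0.5
print(round(f1_score(y_true, y_pred, average="macro"), 6))  # 0.333333
```

If the metrics stay pinned at these values across epochs, it usually means the classifier is not learning the minority class at all.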
Hello.
I get this memory error when using my dataset. My dataset is attached.
numpy.core._exceptions.MemoryError: Unable to allocate 39.6 GiB for an array with shape (103106, 103106) and data type float32
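As a quick sanity check (plain Python, nothing beyond the traceback): the 39.6 GiB figure is simply the cost of materializing a dense float32 matrix of that shape, which suggests the adjacency needs to stay in a sparse representation for a graph this large.

```python
# Back-of-the-envelope check of the reported allocation: a dense
# float32 array of shape (103106, 103106) costs n * n * 4 bytes.
n = 103106
gib = n * n * 4 / 2**30   # bytes -> GiB
print(round(gib, 1))      # 39.6
```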
hoppity.zip
Hello! Can I use this technique on a DGL graph with my own dataset?
Would there be a lot of modification I need to do?
I am now doing an edge classification task and have encountered a data imbalance problem.
Graphs with only one edge cannot be correctly detected, so I am also wondering if GraphSMOTE can help me deal with this problem.
Hello, could you please share your pretrained weights?
If you want to train the model on CUDA, modify these lines as follows:
{
"Modification1":
{
"file": "models.py",
"lines": [193],
"origin": "if not isinstance(adj, torch.sparse.FloatTensor): ",
"new": "if adj.layout != torch.sparse_coo:"
},
"Modification2":
{
"file": "utils.py",
"lines": [275, 294, 349],
"origin": "distance = squareform(pdist(chosen_embed.detach()))",
"new": "distance = squareform(pdist(chosen_embed.cpu().detach()))"
}
}
Though some other lines still report errors, they can be modified following the pattern of Modification2 (i.e., xx.detach() => xx.cpu().detach()).
@TianxiangZhao Please check this when convenient. If it is correct, a fix in the repository would be good.
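The Modification2 pattern can be sketched in isolation (chosen_embed below is a stand-in tensor; on a CUDA machine it would live on the GPU): scipy's pdist only operates on host-side NumPy arrays, so the tensor must be detached and moved to the CPU first.

```python
import torch
from scipy.spatial.distance import pdist, squareform

# Stand-in for chosen_embed in utils.py; on CUDA this would be a GPU tensor.
chosen_embed = torch.randn(6, 4)

# scipy works on CPU/NumPy data, hence .cpu().detach() before the call;
# squareform turns the condensed distances into a full symmetric matrix.
distance = squareform(pdist(chosen_embed.cpu().detach().numpy()))
print(distance.shape)  # (6, 6)
```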
Hi there,
Thanks for putting GraphSMOTE together, really cool code! I was wondering if there is support for heterogeneous graph neural networks as well. I'm currently training a heterogeneous GNN (using SageCONV) on a dataset for a multi-class classification problem. I didn't see anything in the paper/repo about support for heterogeneous GNNs, and wanted to check here. Thanks again!
Hi, I'd like to reproduce the results of the GraphSMOTE(T) experiment. I typed this command line:
python main.py --imbalance --no-cuda --dataset=cora --setting='recon'
The results are as follows:
ACC = 0.271, AUC-ROC = 0.5000, F Score = 0.1843
I have tried many times but couldn't get the GraphSMOTE(T) results from the paper. Is there something wrong with my command line, or with something else in my setup? Thank you.
Hello, thank you so much for sharing! The method is very nice!
Could you provide a license for the code? I may use it in my project.
Thank you very much!
First of all, great paper; I enjoyed reading it!
However, I do wonder how we can apply this technique to our own custom dataset, as I don't really see much comprehensive documentation on generalisability. I noted that your utility functions are mostly written with respect to the pre-specified dataset. Was kindly wondering if I could get some help with using this with custom datasets, thanks!
When running on cuda, the error is as follows:
RuntimeError: Could not run 'aten::sum.dim_IntList' with arguments from the 'SparseCUDA' backend. 'aten::sum.dim_IntList' is only available for these backends: [CPU, CUDA, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
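For reference, a minimal CPU sketch of the likely workaround (the adjacency here is a stand-in): in PyTorch 1.7, dimension-wise sums over sparse tensors go through torch.sparse.sum rather than Tensor.sum(dim), which is the call that raises the backend error above.

```python
import torch

# Stand-in sparse COO adjacency; the issue's tensor is a sparse CUDA tensor.
adj = torch.eye(3).to_sparse()

# torch.sparse.sum supports dimension-wise reduction on sparse tensors;
# the reduced result is converted to dense for inspection.
row_sums = torch.sparse.sum(adj, dim=1).to_dense()
print(row_sums)  # tensor([1., 1., 1.])
```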
Using PyTorch 1.7.0 with CUDA 10.2 on Windows.
Hi, I have a few queries: where is the "twitter.embeddings_64" dataset, and how are sub_twitter_edges, sub_twitter_labels, and sub_node_embedding_64 divided?