graphsmote's People

Contributors

tianxiangzhao

graphsmote's Issues

model optimization

From my literature research, GraphSMOTE appears to be the only toolkit for training graph neural networks on imbalanced data, and it is a privilege to use it.
My learning task is similar to the Cora one: a simple binary node-level classification (0 = negative, 1 = positive). I have ~13000 nodes (~550 positive, the rest negative), and each node has 350 features. I rewrote the Cora data loader function for my dataset, and it works. I then trained the model with GraphSMOTE's main.py:
nohup python3 -u main.py --imbalance --no-cuda --dataset=cora --setting='embed_up' --epochs=1000 --model=GAT --up_scale=0.95 --nhid=8 --batch_size=64 --lr=0.001 --dropout=0.5 &
It runs, but the accuracy on both the training set and the validation set is very low and fluctuates below 0.6, as shown below:
Loading cora dataset...
edges between class 0 and class 0: 63682080.000000
edges between class 0 and class 1: 2235402.000000
edges between class 1 and class 0: 2235402.000000
edges between class 1 and class 1: 99624.000000
0-th class sample number: 12353
1-th class sample number: 500
valid current auc-roc score: 0.643200, current macro_F score: 0.333333
Epoch: 00001 loss_train: 0.7087 loss_rec: 0.7087 acc_train: 0.5000 loss_val: 0.7085 acc_val: 0.5000 time: 125.3990s
Test set results: loss= 0.7085 accuracy= 0.5000
test current auc-roc score: 0.452893, current macro_F score: 0.333333
valid current auc-roc score: 0.459200, current macro_F score: 0.333333
Epoch: 00002 loss_train: 0.7085 loss_rec: 0.7085 acc_train: 0.5000 loss_val: 0.7087 acc_val: 0.5000 time: 125.1620s
valid current auc-roc score: 0.481600, current macro_F score: 0.333333
Epoch: 00003 loss_train: 0.7083 loss_rec: 0.7083 acc_train: 0.5000 loss_val: 0.7083 acc_val: 0.5000 time: 124.7956s
valid current auc-roc score: 0.561600, current macro_F score: 0.333333
......
Epoch: 00820 loss_train: 0.6924 loss_rec: 0.6924 acc_train: 0.5500 loss_val: 0.6953 acc_val: 0.3400 time: 174.9777s
Detailed results:
nohup.txt

I have tried changing nhid, dropout, and the learning rate, but acc_train and acc_val both stay very low and keep fluctuating. Do you have any suggestions for optimization? Looking forward to your reply. Thanks!
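One possible reading of the log, offered as a sketch rather than a diagnosis: an accuracy of 0.5000 together with a macro F score of 0.333333 is exactly what a binary classifier scores when it assigns every sample to a single class on a class-balanced evaluation set. The set size (50 samples per class) and the helper names `f1_for_class`, `macro_f1`, and `accuracy` are assumptions made for illustration; they are not taken from the GraphSMOTE code.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for one class; returns 0.0 when the class is never predicted correctly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # common convention when precision or recall is undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical balanced evaluation set: 50 negatives and 50 positives.
y_true = [0] * 50 + [1] * 50
# A collapsed model that predicts class 0 for every node.
y_pred = [0] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")   # accuracy = 0.5000
print(f"macro F  = {macro_f1(y_true, y_pred):.6f}")   # macro F  = 0.333333
```

If the numbers in a run match this pattern, printing the distribution of predicted labels is a cheap way to check whether the model has collapsed to one class before tuning any hyperparameters.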

memory error

Hello.
I get this memory error when using my dataset (attached):
numpy.core._exceptions.MemoryError: Unable to allocate 39.6 GiB for an array with shape (103106, 103106) and data type float32
hoppity.zip
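The reported size follows directly from the array shape: a dense float32 matrix with one row and one column per node needs 4 bytes per entry. The sketch below reproduces that arithmetic and compares it with a rough estimate for a coordinate-list (COO) sparse layout. The function names and the edge count are hypothetical, since the issue does not state how many edges the dataset has, and the sketch does not show where in the code the dense array is allocated.

```python
def dense_gib(num_nodes, bytes_per_value=4):
    """Memory in GiB for a dense num_nodes x num_nodes matrix (float32 by default)."""
    return num_nodes * num_nodes * bytes_per_value / 2**30


def coo_gib(num_edges, bytes_per_index=8, bytes_per_value=4):
    """Rough memory in GiB for a COO sparse matrix: two indices plus one value per edge."""
    return num_edges * (2 * bytes_per_index + bytes_per_value) / 2**30


num_nodes = 103106  # shape taken from the error message
print(f"dense:  {dense_gib(num_nodes):.1f} GiB")  # dense:  39.6 GiB

# Hypothetical edge count, for comparison only.
hypothetical_edges = 1_000_000
print(f"sparse: {coo_gib(hypothetical_edges):.3f} GiB")
```

Dense memory grows with the square of the node count, while sparse memory grows with the number of edges, so for a graph of this size any step that materializes a full node-by-node matrix is likely to be the bottleneck.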

GraphSMOTE on dgl graph

Hello! Can I use this technique on a DGL graph with my own dataset?
Would I need to make a lot of modifications?
I am working on an edge classification task and have run into a data imbalance problem.
Graphs with only one edge are not detected correctly, so I am also wondering whether GraphSMOTE can help with this problem.

Solution for CUDA error

If you want to train the model on CUDA, modify these lines as follows:

{
    "Modification1":
    {
        "file": "models.py",
        "lines": [193],
        "origin": "if not isinstance(adj, torch.sparse.FloatTensor): ",
        "new": "if adj.layout != torch.sparse_coo:"
    },
    "Modification2":
    {
        "file": "utils.py",
        "lines": [275, 294, 349],
        "origin": "distance = squareform(pdist(chosen_embed.detach()))",
        "new": "distance = squareform(pdist(chosen_embed.cpu().detach()))"
    }
}

Some other lines may still report errors; fix them the same way as Modification2 (i.e., xx.detach() => xx.cpu().detach()).

@TianxiangZhao Please check this when convenient. If it is correct, applying the modification to the repo would be good.

Heterogeneous Graph Models

Hi there,

Thanks for putting GraphSMOTE together, really cool code! Was wondering if there was support for heterogeneous graph neural networks as well. I'm currently training a heterogeneous GNN (using SageCONV) on a dataset for a multi class classification problem. I didn't see anything in the paper/repo about support for heterogeneous GNNs, and wanted to check here. Thanks again!

The GraphSMOTE(T) results were not satisfactory

Hi, I'd like to reproduce the results of the GraphSMOTE(T) experiment. I ran this command line:

python main.py --imbalance --no-cuda --dataset=cora --setting='recon'

The results are as follows:

ACC = 0.271, AUC-ROC = 0.5000, F Score = 0.1843

I have tried many times but could not get the GraphSMOTE(T) results reported in the paper. Is there something wrong with my command line, or with something else in my setup? Thank you.

License

Hello, thank you so much for sharing! The method is very nice!
Could you provide a license to the code? I may use it in my project!
Thank you very much!

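Reproduced GraphSMOTE(T) results are not satisfactory

Hello, I would like to reproduce the experimental results of GraphSMOTE(T). The command line I entered is:
python main.py --imbalance --no-cuda --dataset=cora --setting='recon'
The results I got are as follows:
ACC = 0.271, AUC-ROC = 0.5000, F Score = 0.1843
I have tried many times but cannot get the GraphSMOTE(T) results from the paper. Is my command line wrong, or is there some other problem with what I am doing? Thank you.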

Documentation on how to apply this to new datasets

First of all, great paper; I enjoyed reading it!

However, I do wonder how we can apply this technique to our own custom dataset, as I don't really see much comprehensive documentation on generalisability. I noted that your utility functions are mostly written with respect to the pre-specified dataset. Was kindly wondering if I could get some help with using this with custom datasets, thanks!

cuda running error

When running on cuda, the error is as follows:
RuntimeError: Could not run 'aten::sum.dim_IntList' with arguments from the 'SparseCUDA' backend. 'aten::sum.dim_IntList' is only available for these backends: [CPU, CUDA, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
I am using PyTorch 1.7.0 with CUDA 10.2 on Windows.

dataset

Hi, I have a couple of questions: where can I find the "twitter.embeddings_64" dataset, and how are sub_twitter_edges, sub_twitter_labels, and sub_node_embedding_64 divided?
