
FastGAE: Scalable Graph Autoencoders with Stochastic Subgraph Decoding

This repository provides a TensorFlow implementation of the FastGAE framework, introduced in the article FastGAE: Scalable Graph Autoencoders with Stochastic Subgraph Decoding. This framework aims to speed up Graph Autoencoders (AE) and Graph Variational Autoencoders (VAE), and to scale these models to large graphs with millions of nodes and edges.

This work has been accepted for publication in Elsevier's Neural Networks journal.

We provide the FastGAE and Variational FastGAE models from our paper, together with standard Graph Autoencoders and standard Variational Graph Autoencoders from the original article of Kipf and Welling (2016).

We evaluate our framework on the link prediction and node clustering tasks introduced in the paper. We provide the Cora, Citeseer, Pubmed and Google datasets in the data folder. The three additional graphs used in the paper (SBM, Patent and Youtube) will later be provided via external links due to size constraints. We refer to section 4 of the paper for more information about datasets.

Our code builds upon Thomas Kipf's original implementation of graph AE and VAE, and upon previous research works from our team.

Installation

python setup.py install

Requirements: tensorflow (1.X), networkx, numpy, scikit-learn, scipy

Run Experiments

cd fastgae
python train.py --model=gcn_ae --dataset=pubmed --task=link_prediction --fastgae=True --measure=degree --alpha=1.0 --nb_node_samples=1000
python train.py --model=gcn_ae --dataset=pubmed --task=link_prediction

The above commands will train a graph AE on the Pubmed dataset, with (first train.py command) and without (second train.py command) the FastGAE framework. The FastGAE run decodes random subgraphs of 1000 nodes, drawn via degree-based sampling with alpha = 1.0. Models are evaluated on the Link Prediction task, with all other parameters set to their default values.

As detailed in Table 2 of the paper, the FastGAE model achieves average performances competitive with those of the standard graph AE, while being significantly faster. We recommend GPU usage for faster learning. Use the --nb_run option to average results over multiple runs.
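For intuition, here is a minimal sketch of the degree-based node sampling behind --measure=degree and --alpha (illustrative only, not the repository's code; the function name and toy graph are made up): each node is drawn with probability proportional to its degree raised to the power alpha.

```python
import numpy as np
import scipy.sparse as sp

def sample_subgraph_nodes(adj, nb_node_samples=1000, alpha=1.0, replace=False):
    """Draw nodes with probability proportional to degree**alpha."""
    degrees = np.asarray(adj.sum(axis=1)).flatten()
    probs = degrees ** alpha
    probs = probs / probs.sum()
    return np.random.choice(adj.shape[0], size=nb_node_samples,
                            replace=replace, p=probs)

# Toy usage on a random sparse graph
adj = sp.random(5000, 5000, density=0.002, format="csr", random_state=42)
adj = adj + adj.T  # symmetrize
sampled = sample_subgraph_nodes(adj, nb_node_samples=1000, alpha=1.0)
print(sampled.shape)  # (1000,)
```

With alpha = 0 this degenerates to uniform sampling, and larger alpha values concentrate the subgraphs on high-degree hubs.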

Complete list of parameters

| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| `model` | string | Name of the model, among:<br>- `gcn_ae`: Graph AE from Kipf and Welling (2016), with 2-layer GCN encoder and inner product decoder<br>- `gcn_vae`: Graph VAE from Kipf and Welling (2016), with Gaussian distributions, 2-layer GCN encoders for mu and sigma, and inner product decoder | `gcn_ae` |
| `dataset` | string | Name of the dataset, among:<br>- `cora`: scientific publications citation network<br>- `citeseer`: scientific publications citation network<br>- `pubmed`: scientific publications citation network<br>- `google`: google.com web graph<br>You can specify any additional graph dataset, in edgelist format, by editing `input_data.py` | `cora` |
| `task` | string | Name of the Machine Learning evaluation task, among:<br>- `link_prediction`: Link Prediction<br>- `node_clustering`: Node Clustering<br>See section 4 of the paper for details about tasks | `link_prediction` |
| `fastgae` | boolean | Whether to use the FastGAE framework | `False` |
| `nb_node_samples` | int | For FastGAE: number of nodes to sample at each iteration, i.e. sampled subgraph size | `1000` |
| `measure` | string | For FastGAE: node importance measure used in sampling, among `degree`, `core` and `uniform` | `degree` |
| `alpha` | float | For FastGAE: alpha hyperparameter for core/degree-based sampling | `2.0` |
| `replace` | boolean | For FastGAE: whether to sample nodes with (`True`) or without (`False`) replacement | `False` |
| `dropout` | float | Dropout rate | `0.` |
| `iterations` | int | Number of training iterations | `200` |
| `features` | boolean | Whether to include node features in the encoder | `False` |
| `learning_rate` | float | Initial learning rate (with Adam optimizer) | `0.01` |
| `hidden` | int | Number of units in the GCN encoder's hidden layer | `32` |
| `dimension` | int | Dimension of the encoder output, i.e. embedding dimension | `16` |
| `nb_run` | integer | Number of model runs + tests | `1` |
| `prop_val` | float | Proportion of edges in the validation set (for Link Prediction) | `5.` |
| `prop_test` | float | Proportion of edges in the test set (for Link Prediction) | `10.` |
| `validation` | boolean | Whether to report validation results at each epoch (implemented for the Link Prediction task) | `False` |
| `verbose` | boolean | Whether to print full comments and details | `True` |
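Custom edgelist datasets can be plugged in by editing input_data.py, as noted above. A minimal sketch of what such a loader might look like (illustrative only; load_edgelist is a hypothetical helper, not a function from this repository):

```python
import networkx as nx
import scipy.sparse as sp

def load_edgelist(path):
    """Read a whitespace-separated edgelist into a sparse adjacency matrix."""
    graph = nx.read_edgelist(path, nodetype=int)
    # Fix a deterministic node ordering so embedding rows are reproducible
    nodelist = sorted(graph.nodes())
    return sp.csr_matrix(nx.adjacency_matrix(graph, nodelist=nodelist))

# Toy usage: a 3-node triangle graph
with open("toy.edgelist", "w") as f:
    f.write("0 1\n1 2\n2 0\n")
adj = load_edgelist("toy.edgelist")
print(adj.shape)  # (3, 3)
```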

Cite

Please cite our paper if you use this code in your own work:

@article{salha2021fastgae,
  title={FastGAE: Scalable graph autoencoders with stochastic subgraph decoding},
  author={Salha, Guillaume and Hennequin, Romain and Remy, Jean-Baptiste and Moussallam, Manuel and Vazirgiannis, Michalis},
  journal={Neural Networks},
  volume={142},
  pages={1--19},
  year={2021},
  publisher={Elsevier}
}

fastgae's People

Contributors

guillaumesalhagalvan


fastgae's Issues

Request for implementation

Hi authors,

I enjoyed reading the pre-print. Do you have a timeline in mind for releasing the implementation?

Thanks,
Satyaki

Loss in datasets with high variance in node degree.

Hello! Again, big fan of your work.

I just thought I'd bring this to your attention: the pos_weight variable used in the loss can have a very high variance across subgraphs in some datasets. In your implementation, you calculate it once from the sample at the start and use it for the weighted cross-entropy throughout. In a local fork, I turned pos_weight into a placeholder and updated it at each training iteration, which increased performance in my case.

I'm sorry that this is a passing comment in an issue rather than a pull request; I don't have time to write a PR right now, but perhaps I will when the semester ends and I have some spare time.
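The per-iteration recomputation described above could be sketched as follows (illustrative; subgraph_pos_weight is a hypothetical helper, not code from the repository — the value would then be fed through a TF1 placeholder at each iteration):

```python
import numpy as np
import scipy.sparse as sp

def subgraph_pos_weight(adj, node_idx):
    """Negative/positive edge ratio for the sampled subgraph's adjacency."""
    sub = adj[node_idx][:, node_idx]  # restrict adjacency to sampled nodes
    n = len(node_idx)
    pos = sub.nnz  # number of positive (nonzero) entries
    return float(n * n - pos) / pos

# Toy usage: a 3-node path graph, restricted to nodes {0, 1}
adj = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
print(subgraph_pos_weight(adj, np.array([0, 1])))  # 1.0

# At each training iteration (TF1 sketch, assuming a pos_weight placeholder):
#   feed_dict[placeholders['pos_weight']] = subgraph_pos_weight(adj, sampled)
```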

Can FastGAE be used on weighted graphs?

I hope to use graph variational autoencoders on weighted biological networks with hundreds of thousands of nodes and millions of edges. FastGAE could help handle such a large network, but we noticed that $A_{i,j} \in \{0, 1\}$ in Proposition 4.

Could you please tell me whether FastGAE can be used for weighted networks?

