tufts-ml / graph-generation-edge

EDGE: Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling


graph-generation-edge's Introduction

EDGE

Official PyTorch implementation of "Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling". Our code is developed based on https://github.com/ehoogeboom/multinomial_diffusion.

We use the evaluation modules provided by https://github.com/uoguelph-mlrg/GGM-metrics and https://github.com/hheidrich/CELL.

Environment requirements

dgl
prettytable
scikit-learn
tensorboard
tensorflow
tensorflow-gan
torch
torch-geometric
tqdm
wandb

and dependencies from https://github.com/uoguelph-mlrg/GGM-metrics, https://github.com/hheidrich/CELL and https://github.com/ehoogeboom/multinomial_diffusion.
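
Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch and is not part of the repository: the helper name `missing_modules` is hypothetical, and the import names (for example `sklearn` for scikit-learn and `tensorflow_gan` for tensorflow-gan) are the usual import names for these packages. It does not check versions or the dependencies of the three linked repositories.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
# Note: the import name can differ from the pip package name.
REQUIRED_MODULES = [
    "dgl", "prettytable", "sklearn", "tensorboard", "tensorflow",
    "tensorflow_gan", "torch", "torch_geometric", "tqdm", "wandb",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```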

Training your degree sequence model

See node.ipynb. Once the model is trained, it is saved to the "./graphs" directory.

Training script

🌟 IMPORTANT note on running EDGE on your own datasets: do not use a large number of diffusion steps for small graphs with fewer than 100 nodes. For such small-graph datasets, please try --diffusion_steps values from {8, 16, 32, 64}.
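
The note above can be expressed as a small helper for setting up a sweep. This is only an illustrative sketch: the function name `suggested_diffusion_steps` is hypothetical, and the README gives no recommendation for graphs with 100 or more nodes, so the helper returns an empty tuple in that case rather than guessing.

```python
def suggested_diffusion_steps(max_num_nodes):
    """Return the step counts the note suggests trying for small graphs.

    An empty tuple means the note gives no guidance for graphs of this size.
    """
    if max_num_nodes < 100:
        return (8, 16, 32, 64)
    return ()


# Example: print the flag values to try for a dataset whose largest graph has 40 nodes.
for steps in suggested_diffusion_steps(40):
    print(f"--diffusion_steps {steps}")
```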

1. Training template for generic graph datasets

By default, we use an empirical degree sampler, which randomly takes a degree sequence from the training data as $d^0$ to perform degree guidance. You can replace the keyword empirical with neural in the option --empty_graph_sampler if you have trained your neural degree sampler.

#!/bin/bash

python train.py \
        --epochs 50000 \
        --num_generation 64 \
        --diffusion_dim 64 \
        --diffusion_steps 128 \
        --device cuda:1 \
        --dataset Ego \
        --batch_size 8 \
        --clip_value 1 \
        --lr 1e-4 \
        --optimizer adam \
        --final_prob_edge 1 0 \
        --sample_time_method importance \
        --check_every 500 \
        --eval_every 500 \
        --noise_schedule linear \
        --dp_rate 0.1 \
        --loss_type vb_ce_xt_prescribred_st \
        --arch TGNN_degree_guided \
        --parametrization xt_prescribed_st \
        --empty_graph_sampler empirical \
        --degree \
        --num_heads 8 8 8 8 1
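
The empirical degree sampler described above (take the degree sequence of a randomly chosen training graph as $d^0$) can be sketched in a few lines. This is a simplified illustration and not the repository's implementation: the function names and the toy graph representation (a node count plus an undirected edge list) are assumptions made for the example.

```python
import random


def degree_sequence(num_nodes, edges):
    """Compute each node's degree from an undirected edge list of (u, v) pairs."""
    degrees = [0] * num_nodes
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def sample_empirical_degrees(training_graphs, rng=random):
    """Pick one training graph uniformly at random and return its degree sequence."""
    num_nodes, edges = rng.choice(training_graphs)
    return degree_sequence(num_nodes, edges)


# Toy training set: each graph is (num_nodes, undirected edge list).
training_graphs = [
    (3, [(0, 1), (1, 2)]),          # path on 3 nodes
    (3, [(0, 1), (1, 2), (0, 2)]),  # triangle
]
d0 = sample_empirical_degrees(training_graphs, random.Random(0))
print(d0)
```

A neural degree sampler would replace `sample_empirical_degrees` with a learned model, which is what the `neural` option selects.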

2. Training template for large network datasets

#!/bin/bash

python train.py \
        --epochs 50000 \
        --num_generation 64 \
        --num_iter 256 \
        --diffusion_dim 64 \
        --diffusion_steps 512 \
        --device cuda:0 \
        --dataset polblogs \
        --batch_size 4 \
        --clip_value 1 \
        --lr 1e-4 \
        --optimizer adam \
        --final_prob_edge 1 0 \
        --sample_time_method importance \
        --check_every 50 \
        --eval_every 50 \
        --noise_schedule linear \
        --dp_rate 0.1 \
        --loss_type vb_ce_xt_prescribred_st \
        --arch TGNN_degree_guided \
        --parametrization xt_prescribed_st \
        --degree \
        --num_heads 8 8 8 8 1

Evaluation is done every eval_every epochs. You can also re-evaluate a specific checkpoint using the script below.

Evaluation script

python evaluate.py \
        --run_name 2023-05-29_18-29-35 \
        --dataset polblogs \
        --num_samples 8 \
        --checkpoints 5500

Results

Training results can be found in wandb/{dataset_name}/multinomial_diffusion/multistep/{run_name}
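
For scripting, the results path pattern above can be assembled with `pathlib`. This is a small sketch, assuming the directory layout is exactly as stated; the helper name `results_dir` is hypothetical and not part of the repository.

```python
from pathlib import Path


def results_dir(dataset_name, run_name, root="wandb"):
    """Build the results directory path following the pattern given above."""
    return Path(root) / dataset_name / "multinomial_diffusion" / "multistep" / run_name


# Example using the run name and dataset from the evaluation script.
print(results_dir("polblogs", "2023-05-29_18-29-35"))
```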

Work in progress

We are still working on integrating the following two features into our code:

Even faster sampling by incrementally modifying the graph.
Attributed graph generation.

graph-generation-edge's Issues

The question related to computing the active node

Hello! Thank you very much for your excellent work.

I have a few questions regarding the computation of the active nodes, and I would greatly appreciate your guidance if you have a moment.

The active node $s^t$ is calculated during the training phase based on $A^t$ and $A^{t-1}$. How is $s^t$ computed during the testing phase? Is it because $s^t$ is no longer required to be computed during testing?
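
For readers following this question, the training-time quantity it refers to can be sketched as below. This is an illustration only, under the assumption that a node is active at step $t$ when its degree differs between $A^{t-1}$ and $A^t$; that definition and the function names are assumptions for this example and are not taken from the repository code. It does not address how $s^t$ is obtained at test time.

```python
def node_degrees(adj):
    """Row sums of a symmetric 0/1 adjacency matrix given as a list of lists."""
    return [sum(row) for row in adj]


def active_nodes(adj_prev, adj_curr):
    """Return a 0/1 indicator per node: 1 if its degree changed between the two graphs.

    Assumed definition for illustration: s^t[i] = 1 when degree_i(A^{t-1}) != degree_i(A^t).
    """
    d_prev = node_degrees(adj_prev)
    d_curr = node_degrees(adj_curr)
    return [int(a != b) for a, b in zip(d_prev, d_curr)]


# A^{t-1}: path 0-1-2.  A^t: the edge (0, 1) has been removed by the forward process.
A_prev = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_curr = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
print(active_nodes(A_prev, A_curr))
```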

Question about environment

Could you provide some details about the environment?
Which version of Python do you use? I cannot install the eden module when I try to run the code.
Thank you for your help.

object has no attribute 'degree'

Thank you for your outstanding work!
However, when I am running the "2. training template for large network datasets" for training, there is an error indicating that 'degree' does not exist in 'pyg_data' ("layers/layers.py", line 331).

Traceback (most recent call last):
  File "train.py", line 88, in <module>
    exp.run()
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/experiment.py", line 155, in run
    super(DiffusionExperiment, self).run(epochs=self.args.epochs)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/base.py", line 181, in run
    train_dict = self.train_fn(epoch)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/experiment.py", line 32, in train_fn
    loss = elbo_bpd(self.model, pyg_data)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/loss.py", line 29, in elbo_bpd
    return loglik_bpd(model, x)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/loss.py", line 12, in loglik_bpd
    return -model.log_prob(x).sum() / (math.log(2) * x.num_entries)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/diffusion_base.py", line 185, in log_prob
    return self._train_loss(batched_graph)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/diffusion_binomial_active.py", line 248, in _train_loss
    kl = self._compute_MC_KL_joint(batched_graph, t, t_node, t_edge)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/diffusion_binomial_active.py", line 158, in _compute_MC_KL_joint
    log_model_prob_node, log_model_prob_edge = self._p_pred(batched_graph=batched_graph, t_node=t_node, t_edge=t_edge)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/diffusion_binomial_active.py", line 187, in _p_pred
    log_model_pred_node, log_model_pred_edge = self._predict_xtmin1_given_xt_st(batched_graph, t_node=t_node, t_edge=t_edge)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/diffusion/diffusion_binomial_active.py", line 148, in _predict_xtmin1_given_xt_st
    out_node, out_edge = self._denoise_fn(batched_graph, t_node, t_edge)
  File "/home/lyk/miniconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/data1/lyk/experiment/graph-generation-EDGE-main/layers/layers.py", line 331, in forward
    nodes_0 = pyg_data.degree[..., None] / self.max_degree
  File "/home/lyk/miniconda3/envs/py37/lib/python3.7/site-packages/torch_geometric/data/data.py", line 428, in __getattr__
    return getattr(self._store, key)
  File "/home/lyk/miniconda3/envs/py37/lib/python3.7/site-packages/torch_geometric/data/storage.py", line 65, in __getattr__
    f"'{self.__class__.__name__}' object has no attribute '{key}'")
AttributeError: 'GlobalStorage' object has no attribute 'degree'

nice!

nice！

About the setting of epoch

Hello, I noticed that you set the epoch to 50,000 in your code, which will take a long time to train. Is it possible to set up an early stopping mechanism to reduce the training time? If so, do you have any experience with how many steps should be set for early stopping? I look forward to your reply, thank you.

How to install eden

When I was setting up the environment, I encountered the following error. How can I resolve it?
ModuleNotFoundError: No module named 'eden'

env problem

Is there a requirements.txt? Some of the dependency versions seem to cause errors.
Thank you.

Missing QM9 data set and running code for EDGE methods

Hello.

I tried to use your EDGE method, but found that the repository seemed to be missing the QM9 dataset and the code to run the EDGE method on it. Can you provide these resources, or provide guidance on how to access and use them?

Also, I noticed that the data set needed to be converted to PKL format. Can you explain the specific steps on how to process the data into PKL format?

Insufficient information to build the project

I wonder if you can provide additional information on the versions of the libraries and Python used for the project.
I am having a difficult time installing compatible dependencies to successfully run the project.

About unconnected nodes

Hello, thanks for your great work!
I am trying to use EDGE on my own dataset, which has many nodes. It contains only one connected graph, and some nodes are not connected to any other node. Do you think EDGE can deal with this case?
Thanks a lot if you can give me some advice!