Comments (14)

Astronaut-diode commented on August 18, 2024

Thank you. I'll give it a try and hope for the best. :)

from ge-sc.

erichoang commented on August 18, 2024

The authors seem to have modified their repository a bit since I forked it. You can check our train function in the code below. Note that to reuse the code, you need to adjust the path to your GCN models and install a suitable TensorFlow version.

from __future__ import division
from __future__ import print_function

import time
import os
import sys

# find path to root directory of the project so as to import from other packages
tokens = os.path.abspath(__file__).split('/')
# print('tokens = ', tokens)
path2root = '/'.join(tokens[:-4])
# print('gae', 'path2root = ', path2root)
if path2root not in sys.path:
    sys.path.append(path2root)

# Train on CPU (hide GPU) due to memory constraints
# os.environ['CUDA_VISIBLE_DEVICES'] = ""

import tensorflow.compat.v1 as tf
import numpy as np
import scipy.sparse as sp

from sklearn.metrics import roc_auc_score
from sklearn.metrics import average_precision_score
import networkx as nx

# from gae.optimizer import OptimizerAE, OptimizerVAE
# from gae.input_data import load_data
# from gae.model import GCNModelAE, GCNModelVAE
# from gae.preprocessing import preprocess_graph, construct_feed_dict, sparse_to_tuple, mask_test_edges

from auto_encoders.vgae.gae.optimizer import OptimizerAE, OptimizerVAE
from auto_encoders.vgae.gae.model import GCNModelAE, GCNModelVAE
from auto_encoders.vgae.gae.preprocessing import preprocess_graph, construct_feed_dict, sparse_to_tuple, mask_test_edges

tf.disable_eager_execution()

def train(input_network, model_name='gcn_ae', emb_dim=16):
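    """
    Train a graph auto-encoder on a network and return its node embeddings.

    :param input_network: networkx network
    :param model_name: 'gcn_vae' or 'gcn_ae'
    :param emb_dim: number of units in the second hidden layer (embedding size)
    :return: node embedding matrix obtained from model.z_mean
    """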
    adj = nx.adjacency_matrix(input_network)

    # Settings
    flags = tf.app.flags
    FLAGS = flags.FLAGS
    FLAGS.remove_flag_values(FLAGS.flag_values_dict())

    flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
    flags.DEFINE_integer('epochs', 500, 'Number of epochs to train.')
    # flags.DEFINE_integer('epochs', 2000, 'Number of epochs to train.')
    flags.DEFINE_integer('hidden1', 32, 'Number of units in hidden layer 1.')
    flags.DEFINE_integer('hidden2', emb_dim, 'Number of units in hidden layer 2.')
    flags.DEFINE_float('weight_decay', 0., 'Weight for L2 loss on embedding matrix.')
    flags.DEFINE_float('dropout', 0., 'Dropout rate (1 - keep probability).')

    flags.DEFINE_string('model', model_name, 'Model string.')
    # flags.DEFINE_string('dataset', 'cora', 'Dataset string.')
    # flags.DEFINE_integer('features', 1, 'Whether to use features (1) or not (0).')

    model_str = FLAGS.model
    # dataset_str = FLAGS.dataset

    # Load data
    # adj, features = load_data(dataset_str)

    # Store original adjacency matrix (without diagonal entries) for later
    adj_orig = adj
    adj_orig = adj_orig - sp.dia_matrix((adj_orig.diagonal()[np.newaxis, :], [0]), shape=adj_orig.shape)
    adj_orig.eliminate_zeros()

    adj_train, train_edges, val_edges, val_edges_false, test_edges, test_edges_false = mask_test_edges(adj)
    adj = adj_train

    # if FLAGS.features == 0:
    #    features = sp.identity(features.shape[0])  # featureless

    features = sp.identity(adj.shape[0])  # featureless

    # Some preprocessing
    adj_norm = preprocess_graph(adj)

    # Define placeholders
    placeholders = {
        'features': tf.sparse_placeholder(tf.float32),
        'adj': tf.sparse_placeholder(tf.float32),
        'adj_orig': tf.sparse_placeholder(tf.float32),
        'dropout': tf.placeholder_with_default(0., shape=())
    }

    num_nodes = adj.shape[0]

    features = sparse_to_tuple(features.tocoo())
    num_features = features[2][1]
    features_nonzero = features[1].shape[0]

    # Create model
    model = None
    if model_str == 'gcn_ae':
        model = GCNModelAE(placeholders, num_features, features_nonzero)
    elif model_str == 'gcn_vae':
        model = GCNModelVAE(placeholders, num_features, num_nodes, features_nonzero)

    pos_weight = float(adj.shape[0] * adj.shape[0] - adj.sum()) / adj.sum()
    norm = adj.shape[0] * adj.shape[0] / float((adj.shape[0] * adj.shape[0] - adj.sum()) * 2)

    # Optimizer
    with tf.name_scope('optimizer'):
        if model_str == 'gcn_ae':
            opt = OptimizerAE(preds=model.reconstructions,
                              labels=tf.reshape(tf.sparse_tensor_to_dense(placeholders['adj_orig'],
                                                                          validate_indices=False), [-1]),
                              pos_weight=pos_weight,
                              norm=norm)
        elif model_str == 'gcn_vae':
            opt = OptimizerVAE(preds=model.reconstructions,
                               labels=tf.reshape(tf.sparse_tensor_to_dense(placeholders['adj_orig'],
                                                                           validate_indices=False), [-1]),
                               model=model, num_nodes=num_nodes,
                               pos_weight=pos_weight,
                               norm=norm)

    # Initialize session
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    cost_val = []
    acc_val = []

    def get_roc_score(edges_pos, edges_neg, emb=None):
        if emb is None:
            feed_dict.update({placeholders['dropout']: 0})
            emb = sess.run(model.z_mean, feed_dict=feed_dict)

        def sigmoid(x):
            return 1 / (1 + np.exp(-x))

        # Predict on test set of edges
        adj_rec = np.dot(emb, emb.T)
        preds = []
        pos = []
        for e in edges_pos:
            preds.append(sigmoid(adj_rec[e[0], e[1]]))
            pos.append(adj_orig[e[0], e[1]])

        preds_neg = []
        neg = []
        for e in edges_neg:
            preds_neg.append(sigmoid(adj_rec[e[0], e[1]]))
            neg.append(adj_orig[e[0], e[1]])

        preds_all = np.hstack([preds, preds_neg])
        labels_all = np.hstack([np.ones(len(preds)), np.zeros(len(preds_neg))])
        roc_score = roc_auc_score(labels_all, preds_all)
        ap_score = average_precision_score(labels_all, preds_all)

        return roc_score, ap_score

    val_roc_score = []

    adj_label = adj_train + sp.eye(adj_train.shape[0])
    adj_label = sparse_to_tuple(adj_label)

    # Train model
    for epoch in range(FLAGS.epochs):
        t = time.time()
        # Construct feed dictionary
        feed_dict = construct_feed_dict(adj_norm, adj_label, features, placeholders)
        feed_dict.update({placeholders['dropout']: FLAGS.dropout})
        # Run single weight update
        outs = sess.run([opt.opt_op, opt.cost, opt.accuracy], feed_dict=feed_dict)

        # Compute average loss
        avg_cost = outs[1]
        avg_accuracy = outs[2]

        roc_curr, ap_curr = get_roc_score(val_edges, val_edges_false)
        val_roc_score.append(roc_curr)

        print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(avg_cost),
              "train_acc=", "{:.5f}".format(avg_accuracy), "val_roc=", "{:.5f}".format(val_roc_score[-1]),
              "val_ap=", "{:.5f}".format(ap_curr),
              "time=", "{:.5f}".format(time.time() - t))

    print("Optimization Finished!")

    roc_score, ap_score = get_roc_score(test_edges, test_edges_false)
    print('Test ROC score: ' + str(roc_score))
    print('Test AP score: ' + str(ap_score))

    feed_dict.update({placeholders['dropout']: 0})
    emb = sess.run(model.z_mean, feed_dict=feed_dict)
    # print('type(emb) = ', type(emb))
    # print('emb.shape = ', emb.shape)
    return emb

# emb = train(nx.karate_club_graph(), emb_dim=32)
# emb = np.asmatrix(emb)
# print('type(emb) = ', type(emb))
# print('emb.shape = ', emb.shape)
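
The pos_weight and norm values in the train function above depend only on the number of nodes and the sum of the training adjacency matrix. A minimal sketch of that arithmetic in plain Python, assuming a 0/1 adjacency matrix; the helper name class_balance_terms and the inputs (34 nodes, adjacency sum 156) are illustrative and not taken from the contract graphs:

```python
def class_balance_terms(num_nodes, adj_sum):
    """Recompute pos_weight and norm the same way train() does.

    num_nodes: number of nodes N (adj.shape[0])
    adj_sum:   sum of all entries of the training adjacency matrix (adj.sum())
    """
    total = num_nodes * num_nodes      # number of entries in the N x N matrix
    negatives = total - adj_sum        # zero entries, assuming a 0/1 matrix
    pos_weight = float(negatives) / adj_sum
    norm = total / float(negatives * 2)
    return pos_weight, norm


# Illustrative values only (assumption, not measured on the ge-sc dataset).
pos_weight, norm = class_balance_terms(34, 156)
print(round(pos_weight, 4), norm)
```

The sparser the graph, the larger pos_weight becomes, which is how the loss compensates for the small share of existing edges among all node pairs.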

Astronaut-diode commented on August 18, 2024

OK, I ran your source code, but it exceeds the memory limit. I tried many ways to reduce the consumption, but all of them failed, so I have to give up for now.
It is worth mentioning that your model also consumes quite a lot of memory. No criticism, hah! :)

erichoang commented on August 18, 2024

Yes, running the GCN model on our contract graphs requires a powerful GPU (we used an Nvidia A100-16GB). However, if you only want quick results, I suggest focusing on the node features generated by node-type one-hot vectors, LINE, and node2vec. In our experiments, the results from these settings were often better than those from the GCN model. Besides, these settings can run with limited GPU resources, especially the LINE model, which is designed for very large graphs.
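
One likely contributor to the memory pressure, judging from the train function posted earlier in this thread, is that the loss is computed against a dense N x N adjacency matrix (tf.sparse_tensor_to_dense on adj_orig), and the evaluation step builds np.dot(emb, emb.T) of the same shape. A rough back-of-the-envelope sketch, assuming float32 entries and counting only one such matrix; the helper name dense_adjacency_gib and the node counts are illustrative, not measurements:

```python
def dense_adjacency_gib(num_nodes, bytes_per_entry=4):
    """Approximate size in GiB of one dense N x N matrix (float32 by default)."""
    return num_nodes * num_nodes * bytes_per_entry / float(1024 ** 3)


# Hypothetical graph sizes, chosen only to show the quadratic growth.
for n in (10000, 50000, 100000):
    print(n, "nodes ->", round(dense_adjacency_gib(n), 2), "GiB per dense matrix")
```

Training holds several tensors of this shape at once (predictions, labels, gradients), so the real footprint is probably a multiple of this figure.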

Astronaut-diode commented on August 18, 2024

Thank you, I've successfully reproduced everything except the GAE part, and it's great.

minhnn-tiny commented on August 18, 2024

Hello, the source_path parameter was removed in this commit. Please make sure you pull the latest version. By the way, this command still works well on my side.

Astronaut-diode commented on August 18, 2024

Hmm, I did not really understand that issue, but now I have a new problem: how do I import data from other papers? I already have tags and source files, but I do not know how to convert them into your format, and I could not find this in your commit history or README. 👍

minhnn-tiny commented on August 18, 2024

First, thank you for your comments. We are missing some of the input preprocessing and will update it.
The currently required input is a compressed_graph, which can be generated by the scripts in the process_graphs folder. In addition, when node_feature is set to "GAE", "LINE", or "Node2vec", you have to refer to those papers and generate the node features from the compressed_graph yourself. We did not include the "GAE", "LINE", or "Node2vec" tools in this repo; we only dumped their outputs for our current dataset to .pkl files in the gesc_matrices_node_embedding folder.

Astronaut-diode commented on August 18, 2024

Yes, I have already discovered that when creating a new dataset with "GAE", "LINE", or "Node2vec", the file that needs to be read does not actually exist. Also, it seems to me that your paper is about GCN, not GAE.

Astronaut-diode commented on August 18, 2024

So I am very curious about how to recreate the GAE, LINE, and Node2vec embeddings.

erichoang commented on August 18, 2024

Hi, we reused the following GitHub repository, with minor modifications, to generate the node embeddings of the LINE and Node2vec models.
https://github.com/shenweichen/GraphEmbedding

For the GAE (or GCN) model, we used the authors' repository.
https://github.com/tkipf/gae

However, please note that both GitHub repositories are quite old. You need to set up a dedicated environment with older settings to run them.

Astronaut-diode commented on August 18, 2024

Do you mean to feed those cfg_cg_compressed_graphs.gpickle files into these two libraries to generate the corresponding pre-trained embedded files? Which is the model that you read in when the option is "GAE" or "LINE" or "Node2vec"?

erichoang commented on August 18, 2024

Do you mean to feed those cfg_cg_compressed_graphs.gpickle files into these two libraries to generate the corresponding pre-trained embedded files?

Yes. You feed the cfg_cg_compressed_graphs.gpickle files (using NetworkX format) to the two libraries to generate the corresponding embedded files.
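
If an embedding tool expects a plain edge list file rather than a pickled graph, one possible preparation step is sketched below. It assumes the edges are available as pairs of hashable node labels, maps the labels to consecutive integer ids, and writes a whitespace-separated edge list. The function name write_edge_list, the output file name, and the sample edges are hypothetical and not part of ge-sc or the linked repositories.

```python
import os
import tempfile


def write_edge_list(edges, path):
    """Write edges as 'src dst' lines using consecutive integer node ids.

    Returns the label -> id mapping so embeddings can be matched back to nodes.
    """
    node_ids = {}
    with open(path, "w") as handle:
        for src, dst in edges:
            for label in (src, dst):
                if label not in node_ids:
                    node_ids[label] = len(node_ids)  # ids in first-seen order
            handle.write("%d %d\n" % (node_ids[src], node_ids[dst]))
    return node_ids


# Illustrative edges; in practice they could come from G.edges() of a loaded graph.
sample_edges = [("entry", "transfer"), ("transfer", "exit"), ("entry", "exit")]
out_path = os.path.join(tempfile.gettempdir(), "sample_edgelist.txt")
mapping = write_edge_list(sample_edges, out_path)
print(mapping)
```

Keeping the returned mapping matters because the embedding rows then have to be matched back to the original node labels. Nodes without any edge do not appear in the file and would need separate handling.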

Astronaut-diode commented on August 18, 2024

I tried it, and it worked for LINE and Node2vec, but I couldn't find a suitable interface for the conversion with GAE and GCN.
