
Comments (2)

palash1992 commented on August 18, 2024

Each of the methods has an edge_f argument, which can be used to load the graph from a file containing edges, one edge per line. That is actually pretty fast and should work for your case.


dhimmel commented on August 18, 2024

It looks like .learn_embedding methods with edge_f provided call graph_util.loadGraphFromEdgeListTxt:

GEM/gem/utils/graph_util.py

Lines 145 to 160 in 79be7fd

def loadGraphFromEdgeListTxt(file_name, directed=True):
    with open(file_name, 'r') as f:
        # n_nodes = f.readline()
        # f.readline() # Discard the number of edges
        if directed:
            G = nx.DiGraph()
        else:
            G = nx.Graph()
        for line in f:
            edge = line.strip().split()
            if len(edge) == 3:
                w = float(edge[2])
            else:
                w = 1.0
            G.add_edge(int(edge[0]), int(edge[1]), weight=w)
    return G

This function builds the complete network in memory as a networkx object. Constructing that networkx graph is the bottleneck: it is slow and memory-intensive for the STRING network.
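As a rough sketch of what loading without networkx could look like, the same "u v [w]" file format can be read straight into a scipy.sparse matrix. The function name load_sparse_from_edge_list is hypothetical and not part of GEM. It assumes node IDs are non-negative integers used directly as row and column indices, and unlike add_edge (which overwrites a repeated edge), the COO-to-CSR conversion sums the weights of duplicate edges.

```python
import numpy as np
import scipy.sparse as sp


def load_sparse_from_edge_list(file_name, directed=True):
    """Hypothetical helper (not in GEM): read 'u v [w]' lines into a CSR matrix."""
    rows, cols, weights = [], [], []
    with open(file_name, 'r') as f:
        for line in f:
            edge = line.strip().split()
            if len(edge) < 2:
                continue  # skip blank or malformed lines
            rows.append(int(edge[0]))
            cols.append(int(edge[1]))
            # Same default as loadGraphFromEdgeListTxt: weight 1.0 when absent
            weights.append(float(edge[2]) if len(edge) == 3 else 1.0)
    rows = np.asarray(rows, dtype=np.int64)
    cols = np.asarray(cols, dtype=np.int64)
    weights = np.asarray(weights, dtype=np.float64)
    # Assumption: node IDs double as matrix indices, so size is max ID + 1
    n = int(max(rows.max(), cols.max())) + 1 if rows.size else 0
    A = sp.coo_matrix((weights, (rows, cols)), shape=(n, n)).tocsr()
    if not directed:
        # Assumption: mirror each edge by taking the larger of A[i, j] and A[j, i]
        A = A.maximum(A.T)
    return A
```

The row order here follows the integer node IDs, whereas a matrix derived from a networkx graph follows node insertion order, so the two are not guaranteed to line up row for row.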

For example, node2vec.learn_embedding ends up reading edge_f into a networkx graph, then writing it back out to an edgelist. From the code below, it doesn't seem the networkx graph is actually used for anything besides rewriting the edgelist file:

def learn_embedding(self, graph=None, edge_f=None,
                    is_weighted=False, no_python=False):
    args = ["node2vec"]
    if not graph and not edge_f:
        raise Exception('graph/edge_f needed')
    if edge_f:
        graph = graph_util.loadGraphFromEdgeListTxt(edge_f)
    graph_util.saveGraphToEdgeListTxtn2v(graph, 'tempGraph.graph')
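If the graph really is only used to rewrite the file, the round trip could in principle be replaced by a line-by-line copy. The sketch below is an illustration, not GEM code: copy_edge_list_streaming is a hypothetical name, and the "src dst weight" output format is an assumption about what the node2vec step expects rather than something confirmed from saveGraphToEdgeListTxtn2v.

```python
def copy_edge_list_streaming(src_path, dst_path):
    """Hypothetical helper: rewrite 'u v [w]' lines as 'u v w' without building a graph."""
    n_edges = 0
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            edge = line.split()
            if len(edge) < 2:
                continue  # skip blank or malformed lines
            # Default weight of 1.0 mirrors loadGraphFromEdgeListTxt
            w = float(edge[2]) if len(edge) == 3 else 1.0
            # Assumed output format: integer IDs followed by a float weight
            dst.write('%d %d %f\n' % (int(edge[0]), int(edge[1]), w))
            n_edges += 1
    return n_edges
```

Memory use stays constant regardless of file size, because only one line is held at a time. Duplicate edges are passed through unchanged, whereas a networkx graph would collapse them.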

For lle.learn_embedding, it looks like the computation actually requires a scipy.sparse matrix, but a networkx graph is first loaded from the edgelist file just to create that sparse matrix:

GEM/gem/embedding/lle.py

Lines 52 to 68 in 5da6632

def learn_embedding(self, graph=None, edge_f=None,
                    is_weighted=False, no_python=False):
    if not graph and not edge_f:
        raise Exception('graph/edge_f needed')
    if not graph:
        graph = graph_util.loadGraphFromEdgeListTxt(edge_f)
    graph = graph.to_undirected()
    t1 = time()
    A = nx.to_scipy_sparse_matrix(graph)
    normalize(A, norm='l1', axis=1, copy=False)
    I_n = sp.eye(len(graph.nodes))
    I_min_A = I_n - A
    u, s, vt = lg.svds(I_min_A, k=self._d + 1, which='SM')
    t2 = time()
    self._X = vt.T
    self._X = self._X[:, 1:]
    return self._X.real, (t2 - t1)
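To illustrate that the LLE steps before the SVD need nothing from networkx, here is a minimal sketch that builds the I - A operator from a sparse adjacency matrix alone. lle_operator_from_sparse is a hypothetical name, not a GEM function. Symmetrising with an element-wise maximum is an assumed stand-in for graph.to_undirected(), and the row scaling is a plain scipy replacement for the sklearn normalize call with norm='l1'.

```python
import numpy as np
import scipy.sparse as sp


def lle_operator_from_sparse(A):
    """Hypothetical helper: build I - D^{-1} A from a sparse adjacency matrix."""
    A = sp.csr_matrix(A, dtype=float)
    # Assumption: symmetrise in place of graph.to_undirected()
    A = A.maximum(A.T)
    # L1 row normalisation: divide each row by the sum of its absolute values
    row_sums = np.asarray(abs(A).sum(axis=1)).ravel()
    # Rows without edges keep a scale of 0 and therefore stay all-zero
    inv = np.divide(1.0, row_sums, out=np.zeros_like(row_sums),
                    where=row_sums > 0)
    A_norm = sp.diags(inv) @ A
    return sp.eye(A.shape[0], format='csr') - A_norm
```

The result could then be passed to lg.svds with k=self._d + 1 and which='SM', as in the snippet above. That final step is left out of the sketch because its cost and convergence depend heavily on the graph.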

For sdne.learn_embedding, a sparse matrix is also used:

def learn_embedding(self, graph=None, edge_f=None,
                    is_weighted=False, no_python=False):
    if not graph and not edge_f:
        raise Exception('graph/edge_f needed')
    if not graph:
        graph = graph_util.loadGraphFromEdgeListTxt(edge_f)
    S = nx.to_scipy_sparse_matrix(graph)

I'd love to use the unified API of GEM, but with support for providing an edgelist file or scipy.sparse matrix directly as input, without having it converted to a networkx graph. It seems that several of the methods actually use more efficient data structures than the networkx graph and don't need networkx features at all.

Do you have any suggestions on whether this would be possible? Even if just for one method? Currently, we've used node2vec without GEM to side-step this problem (notebook), but would love the ability to use the implementations in this package!
