Comments (7)
Hi @tiedaoxiaotubie ,
Assuming that the nodes of your graphs are not annotated with attributes, you can transform them into grakel graphs as follows:
from grakel.utils import graph_from_networkx
G = graph_from_networkx(G_nx)
where G_nx
is a list of networkx graphs.
Then, you can compute the n x n kernel matrix (where n is the number of graphs). You can think of the normalized kernel values as the similarities between graphs. Thus, the (i,j)-th element of the kernel matrix corresponds to the similarity between the i-th graph and the j-th graph. To produce the kernel matrix you can use one of the kernels that is implemented in grakel. For instance, you can compute the shortest path kernel as follows:
from grakel.kernels import ShortestPath
gk = ShortestPath(normalize=True)
K = gk.fit_transform(G)
Then, the (i,j)-th element of K
contains the similarity between graphs i and j.
from grakel.
Thank you very much for your reply! the nodes of my graphs are neither annotated with attributes nor with labels, I tried with your code, but seems not work for me. I have to admit that my graph is pretty large, in which the edge number is more than 6,000, maybe the GraKel cannot handle such a big graph in one time?
I tried to use ShortestPath first, it raised an error immediately:
(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
File "CFGenerator.py", line 68, in <module>
main()
File "CFGenerator.py", line 62, in main
C._cfg_similarity()
File "CFGenerator.py", line 44, in _cfg_similarity
K = gk.fit_transform(G)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 197, in fit_transform
km = self._calculate_kernel_matrix()
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 231, in _calculate_kernel_matrix
K[i, i] = self.pairwise_operation(x, x)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/random_walk.py", line 206, in pairwise_operation
XY = np.kron(X, Y)
File "<__array_function__ internals>", line 6, in kron
File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 1154, in kron
result = outer(a, b).reshape(as_+bs)
File "<__array_function__ internals>", line 6, in outer
File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/core/numeric.py", line 942, in outer
return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
numpy.core._exceptions.MemoryError: Unable to allocate 8.83 PiB for an array with shape (35259844, 35259844) and data type float64
I searched the previous issue that similar to my question, I noticed there were people tried to use RandomWalk to calculate the similarity:#8, so I also gave it a try, my code are as follows:
def _cfg_similarity(self):
# Transforms list of NetworkX graphs into a list of GraKeL graphs
G = graph_from_networkx(self._cfg_all)
# Uses the shortest path kernel to generate the kernel matrices
gk = RandomWalk(method_type="baseline", lamda=0.01)
K = gk.fit_transform(G)
print(K)
at this time, my error message was:
(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
File "CFGenerator.py", line 68, in <module>
main()
File "CFGenerator.py", line 62, in main
C._cfg_similarity()
File "CFGenerator.py", line 44, in _cfg_similarity
K = gk.fit_transform(G)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 197, in fit_transform
km = self._calculate_kernel_matrix()
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 231, in _calculate_kernel_matrix
K[i, i] = self.pairwise_operation(x, x)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/random_walk.py", line 206, in pairwise_operation
XY = np.kron(X, Y)
File "<__array_function__ internals>", line 6, in kron
File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 1154, in kron
result = outer(a, b).reshape(as_+bs)
File "<__array_function__ internals>", line 6, in outer
File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/core/numeric.py", line 942, in outer
return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
numpy.core._exceptions.MemoryError: Unable to allocate 8.83 PiB for an array with shape (35259844, 35259844) and data type float64
from grakel.
It seems that not only the graphs are large, but also the dataset itself (35,259,844 graphs in total?). Comparing every pair of graphs is not feasible since it requires generating a 35,259,844 x 35,259,844 matrix which does not fit in memory. Do you need to compare each one of the 35,259,844 graphs against every other of the dataset?
from grakel.
OK, I changed the number of graphs to be calculated, only select 4 of the original graphs, but there are still more than 6,000 edges and 5,000 nodes for each of the graphs, at this time, the RandomWalk worked, while the ShortestPath still failed. The error message of ShortestPath are as follows:
(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
File "CFGenerator.py", line 68, in <module>
main()
File "CFGenerator.py", line 62, in main
C._cfg_similarity()
File "CFGenerator.py", line 44, in _cfg_similarity
K = gk.fit_transform(G)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/shortest_path.py", line 393, in fit_transform
self.fit(X)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 123, in fit
self.X = self.parse_input(X)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/shortest_path.py", line 460, in parse_input
labels=self._lt)
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/graph.py", line 688, in build_shortest_path_matrix
return shortest_path_mat, self.get_labels()
File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/graph.py", line 737, in get_labels
raise ValueError('Graph does not have any labels for vertices.')
ValueError: Graph does not have any labels for vertices.
It seems trying to get labels from the graph, but the graphs of mine are neither annotated with attributes nor with labels, which makes me confused.
The output of RandomWalk are as follows:
(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py:201: RuntimeWarning: invalid value encountered in sqrt
return km / np.sqrt(np.outer(self._X_diag, self._X_diag))
[[-1. nan nan nan]
[ nan 1. 1.07683489 0.89696687]
[ nan 1.07683489 1. 1.09279374]
[ nan 0.89696687 1.09279374 1. ]]
For the output of RandomWalk, I have 2 questions:
- why there are -1 and nan?
- I noticed the GPU was not used when during the script's execution, is it normal? I thought it should use GPU at least :)
Hope I could get your answer, thank you very much :)
from grakel.
With regards to the two questions:
- I think that this is related to the
lamda
parameter of the kernel. Probably, you need to set this to some smaller value than the default. To obtain values between 0 and 1, set alsonormalize=True
. - The code does not use the GPU. The kernels run on the CPU.
With regards to the shortest path kernel, it expects node-labeled graphs. You can set the with_labels
attribute to False
as follows:
gk = ShortestPath(with_labels=False, normalize=True)
Perhaps the most efficient kernel is the WL subtree. But this kernel expects the graphs to contain node labels. To run the kernel, you can assign the same label to all nodes of all graphs (for instance, the label 'a'). To do this, have a look at this example: https://ysig.github.io/GraKeL/0.1a8/auto_examples/nx_to_grakel.html#sphx-glr-auto-examples-nx-to-grakel-py
from grakel.
Perfect answer, awesome! I got better result after using a smaller lamda in RandomWalk. the error of shortest path kernel also was handled after adding with_labels=False
.
from grakel.
from grakel.
Related Issues (20)
- Print <class 'grakel.graph.Graph'>
- The Weisfeiler-Lehman Edge Kernel
- fetch_dataset not working HOT 1
- Graph hopper kernel unable to transfom the data
- Edge labels not working for custom dataset HOT 1
- Fingerprint dataset not accessible using fetch_dataset function HOT 13
- NaN error when using Random walk kernel on certain datasets HOT 3
- I don't know how to use my own data for input HOT 5
- MultiDiGraph: graph_from_networkx() ValueError HOT 1
- modified random walk kernel giving all 1 scores for normalize=true HOT 1
- Difference between ShortestPath and ShortestPathAttr HOT 2
- Generating graphs from dense/numpy matrices HOT 1
- Graph.clone() or Graph.copy() HOT 3
- Explicit graph kernels HOT 2
- Graphlet kernel HOT 1
- NeighborhoodSubgraphPairwiseDistance kernel returns diagonal elements less than 1 HOT 5
- One vs many comparison for WeisfeilerLehman; `.transform` gives `TypeError: each element of X must have either a graph with...` HOT 1
- RWK matrix returns by fit_transform by RandomWalkLabeled() contains only 1.0 HOT 7
- Error when using EdgeHistogram HOT 1
- Can't install on Windows 10, keep getting same error HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from grakel.