Giter VIP home page Giter VIP logo

Comments (7)

giannisnik avatar giannisnik commented on August 16, 2024

Hi @tiedaoxiaotubie ,

Assuming that the nodes of your graphs are not annotated with attributes, you can transform them into grakel graphs as follows:

from grakel.utils import graph_from_networkx
G = graph_from_networkx(G_nx)

where G_nx is a list of networkx graphs.

Then, you can compute the n x n kernel matrix (where n is the number of graphs). You can think of the normalized kernel values as the similarities between graphs. Thus, the (i,j)-th element of the kernel matrix corresponds to the similarity between the i-th graph and the j-th graph. To produce the kernel matrix you can use one of the kernels that is implemented in grakel. For instance, you can compute the shortest path kernel as follows:

from grakel.kernels import ShortestPath
gk = ShortestPath(normalize=True)
K = gk.fit_transform(G)

Then, the (i,j)-th element of K contains the similarity between graphs i and j.

from grakel.

tiedaoxiaotubie avatar tiedaoxiaotubie commented on August 16, 2024

Thank you very much for your reply! the nodes of my graphs are neither annotated with attributes nor with labels, I tried with your code, but seems not work for me. I have to admit that my graph is pretty large, in which the edge number is more than 6,000, maybe the GraKel cannot handle such a big graph in one time?

I tried to use ShortestPath first, it raised an error immediately:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
  File "CFGenerator.py", line 68, in <module>
    main()
  File "CFGenerator.py", line 62, in main
    C._cfg_similarity()
  File "CFGenerator.py", line 44, in _cfg_similarity
    K = gk.fit_transform(G)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 197, in fit_transform
    km = self._calculate_kernel_matrix()
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 231, in _calculate_kernel_matrix
    K[i, i] = self.pairwise_operation(x, x)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/random_walk.py", line 206, in pairwise_operation
    XY = np.kron(X, Y)
  File "<__array_function__ internals>", line 6, in kron
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 1154, in kron
    result = outer(a, b).reshape(as_+bs)
  File "<__array_function__ internals>", line 6, in outer
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/core/numeric.py", line 942, in outer
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
numpy.core._exceptions.MemoryError: Unable to allocate 8.83 PiB for an array with shape (35259844, 35259844) and data type float64

I searched the previous issue that similar to my question, I noticed there were people tried to use RandomWalk to calculate the similarity:#8, so I also gave it a try, my code are as follows:

    def _cfg_similarity(self):
        # Transforms list of NetworkX graphs into a list of GraKeL graphs
        G = graph_from_networkx(self._cfg_all)

        # Uses the shortest path kernel to generate the kernel matrices
        gk = RandomWalk(method_type="baseline", lamda=0.01)
        K = gk.fit_transform(G)
        print(K)

at this time, my error message was:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
  File "CFGenerator.py", line 68, in <module>
    main()
  File "CFGenerator.py", line 62, in main
    C._cfg_similarity()
  File "CFGenerator.py", line 44, in _cfg_similarity
    K = gk.fit_transform(G)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 197, in fit_transform
    km = self._calculate_kernel_matrix()
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 231, in _calculate_kernel_matrix
    K[i, i] = self.pairwise_operation(x, x)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/random_walk.py", line 206, in pairwise_operation
    XY = np.kron(X, Y)
  File "<__array_function__ internals>", line 6, in kron
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 1154, in kron
    result = outer(a, b).reshape(as_+bs)
  File "<__array_function__ internals>", line 6, in outer
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/core/numeric.py", line 942, in outer
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
numpy.core._exceptions.MemoryError: Unable to allocate 8.83 PiB for an array with shape (35259844, 35259844) and data type float64

from grakel.

giannisnik avatar giannisnik commented on August 16, 2024

It seems that not only the graphs are large, but also the dataset itself (35,259,844 graphs in total?). Comparing every pair of graphs is not feasible since it requires generating a 35,259,844 x 35,259,844 matrix which does not fit in memory. Do you need to compare each one of the 35,259,844 graphs against every other of the dataset?

from grakel.

tiedaoxiaotubie avatar tiedaoxiaotubie commented on August 16, 2024

OK, I changed the number of graphs to be calculated, only select 4 of the original graphs, but there are still more than 6,000 edges and 5,000 nodes for each of the graphs, at this time, the RandomWalk worked, while the ShortestPath still failed. The error message of ShortestPath are as follows:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
  File "CFGenerator.py", line 68, in <module>
    main()
  File "CFGenerator.py", line 62, in main
    C._cfg_similarity()
  File "CFGenerator.py", line 44, in _cfg_similarity
    K = gk.fit_transform(G)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/shortest_path.py", line 393, in fit_transform
    self.fit(X)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 123, in fit
    self.X = self.parse_input(X)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/shortest_path.py", line 460, in parse_input
    labels=self._lt)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/graph.py", line 688, in build_shortest_path_matrix
    return shortest_path_mat, self.get_labels()
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/graph.py", line 737, in get_labels
    raise ValueError('Graph does not have any labels for vertices.')
ValueError: Graph does not have any labels for vertices.

It seems trying to get labels from the graph, but the graphs of mine are neither annotated with attributes nor with labels, which makes me confused.

The output of RandomWalk are as follows:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py:201: RuntimeWarning: invalid value encountered in sqrt
  return km / np.sqrt(np.outer(self._X_diag, self._X_diag))
[[-1.                 nan         nan         nan]
 [        nan  1.          1.07683489  0.89696687]
 [        nan  1.07683489  1.          1.09279374]
 [        nan  0.89696687  1.09279374  1.        ]]

For the output of RandomWalk, I have 2 questions:

  1. why there are -1 and nan?
  2. I noticed the GPU was not used when during the script's execution, is it normal? I thought it should use GPU at least :)

Hope I could get your answer, thank you very much :)

from grakel.

giannisnik avatar giannisnik commented on August 16, 2024

With regards to the two questions:

  1. I think that this is related to the lamda parameter of the kernel. Probably, you need to set this to some smaller value than the default. To obtain values between 0 and 1, set also normalize=True.
  2. The code does not use the GPU. The kernels run on the CPU.

With regards to the shortest path kernel, it expects node-labeled graphs. You can set the with_labels attribute to False as follows:
gk = ShortestPath(with_labels=False, normalize=True)

Perhaps the most efficient kernel is the WL subtree. But this kernel expects the graphs to contain node labels. To run the kernel, you can assign the same label to all nodes of all graphs (for instance, the label 'a'). To do this, have a look at this example: https://ysig.github.io/GraKeL/0.1a8/auto_examples/nx_to_grakel.html#sphx-glr-auto-examples-nx-to-grakel-py

from grakel.

tiedaoxiaotubie avatar tiedaoxiaotubie commented on August 16, 2024

Perfect answer, awesome! I got better result after using a smaller lamda in RandomWalk. the error of shortest path kernel also was handled after adding with_labels=False.

from grakel.

tiedaoxiaotubie avatar tiedaoxiaotubie commented on August 16, 2024

from grakel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.