Giter VIP home page Giter VIP logo

Comments (5)

ysig avatar ysig commented on September 18, 2024 1

Hi @altsoph, thanks for reporting this bug.

So it seems that there is a clear reason why this bug is happening:

    def fit_transform(self, X):
        """Fit and transform, on the same dataset.

        Parameters
        ----------
        X : iterable
            Each element must be an iterable with at most three features and at
            least one. The first that is obligatory is a valid graph structure
            (adjacency matrix or edge_dictionary) while the second is
            node_labels and the third edge_labels (that fitting the given graph
            format). If None the kernel matrix is calculated upon fit data.
            The test samples.

        Returns
        -------
        K : numpy array, shape = [n_input_graphs, n_input_graphs]
            corresponding to the kernel matrix, a calculation between
            all pairs of graphs between target an features

        """
        self._method_calling = 2
        self.fit(X)

        S, N = np.zeros(shape=(self._ngx, self._ngx)), dict()
        for (key, M) in iteritems(self.X):
            K = M.dot(M.T).toarray()
            K_diag = K.diagonal()
            N[key] = K_diag
-->       S += np.nan_to_num(K / np.sqrt(np.outer(K_diag, K_diag)))
            # here things that are num are not propagated

        self._X_level_norm_factor = N

        if self.normalize:
-->       return S / len(self.X) # this is meaningless if for one element adds zero and for others it doesn't
        else:
            return S

If you run:

nspd = NeighborhoodSubgraphPairwiseDistance(normalize=True)
nspd.fit([g1, g2])
S, N = np.zeros(shape=(nspd._ngx, nspd._ngx)), dict()
for (key, M) in iteritems(nspd.X):
    K = M.dot(M.T).toarray()
    K_diag = K.diagonal()
    N[key] = K_diag
    S += np.nan_to_num(np.sqrt(np.outer(K_diag, K_diag))) > 0

print(S)

you get:

/usr/local/lib/python3.10/dist-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:314: RuntimeWarning: invalid value encountered in divide
  S += np.nan_to_num(K / np.sqrt(np.outer(K_diag, K_diag)))
array([[16., 12.],
       [12., 12.]])

which is what is skewing the results as len(X) == 16.

More over if you print the K_diag the nan is clearly visible on the last entries:
print(np.sqrt(np.outer(K_diag, K_diag)))

0.1.8
[[1.         0.22577328]
 [0.22577328 0.75      ]]
[[36. 36.]
 [36. 36.]]
[[ 8.          9.79795897]
 [ 9.79795897 12.        ]]
[[ 8.         12.64911064]
 [12.64911064 20.        ]]
[[36.         26.83281573]
 [26.83281573 20.        ]]
[[144. 144.]
 [144. 144.]]
[[18.         26.83281573]
 [26.83281573 40.        ]]
[[ 18.          43.26661531]
 [ 43.26661531 104.        ]]
[[144.         122.37646833]
 [122.37646833 104.        ]]
[[64. 16.]
 [16.  4.]]
[[12.          6.92820323]
 [ 6.92820323  4.        ]]
[[12.          6.92820323]
 [ 6.92820323  4.        ]]
[[64. 16.]
 [16.  4.]]
[[100.   0.]
 [  0.   0.]]
[[18.  0.]
 [ 0.  0.]]
[[18.  0.]
 [ 0.  0.]]
[[100.   0.]
 [  0.   0.]]

Grakel was built at 2017 (before even deep-learning and was working even with python-2). At the time we tested it was working.

For a vector to have nan with itself is a bit weird result, so I kindly ask @giannisnik to do some further examination on what is a principled way to resolve this normalization issue for the nan cases and we will both reply here and push it as a bugfix to the version (0.1.10) - (need to fix having the correct version number too because it's 0.1.9).

from grakel.

giannisnik avatar giannisnik commented on September 18, 2024 1

@altsoph Yes, that's exactly when it appears. As @ysig said, we will fix this in the next version.

from grakel.

altsoph avatar altsoph commented on September 18, 2024

Thanks a lot!

from grakel.

giannisnik avatar giannisnik commented on September 18, 2024

Hi @altsoph ,

The issue is due to the default value of hyperparameter d of the kernel (default value is equal to 4). If you set d=2, no problem occurs. The reason behind the problem is that the kernel iteratively looks for pairs of nodes that are at distance $\delta$ for $\delta \in { 1, \ldots, d}$. The second graph you created (i.e., g2) consists of two connected components. The first is just a pair of nodes connected by an edge. The second component contains 4 nodes and the maximum shortest path distance between any two of those nodes is 2. Thus, for d=3 and d=4, the kernel finds no pair of nodes from g2 that satisfy the distance constraint and this is why kernel value for d=3 and d=4 is equal to 0 which leads to nan values.

@ysig I think this can be fixed just by replacing nanon the diagonal of K / np.sqrt(np.outer(K_diag, K_diag) with an 1 instead of a 0.

from grakel.

altsoph avatar altsoph commented on September 18, 2024

Hi @giannisnik,

Thanks for clarifications!

However, even with d=2 NeighborhoodSubgraphPairwiseDistance is having trouble with some disconnected graphs.
Here is another example with an isolated node:

g1 = Graph([[0., 0., 0.,],
            [0., 0., 1.,],
            [0., 1., 0.,]],
           node_labels=[1,1,1],
           edge_labels={(1, 2): 'B',
                        (2, 1): 'B',})

g2 = Graph([[0., 1., 1.,],
            [1., 0., 0.,],
            [1., 0., 0.,]],
           node_labels=[1,1,1],
           edge_labels={(0, 1): 'B',
                        (0, 2): 'B',
                        (1, 0): 'B',
                        (2, 0): 'B',})

print( NeighborhoodSubgraphPairwiseDistance(
          normalize=True, d=2
      ).fit_transform([g1, g2]) )

The output is:

[[0.66666667 0.23333333]
 [0.23333333 1.        ]]

Perhaps, the problem appears when there is a component with the radius less than d.

from grakel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.