Comments (5)
Hi @altsoph, thanks for reporting this bug.
So it seems that there is a clear reason why this bug is happening:
def fit_transform(self, X):
"""Fit and transform, on the same dataset.
Parameters
----------
X : iterable
Each element must be an iterable with at most three features and at
least one. The first that is obligatory is a valid graph structure
(adjacency matrix or edge_dictionary) while the second is
node_labels and the third edge_labels (that fitting the given graph
format). If None the kernel matrix is calculated upon fit data.
The test samples.
Returns
-------
K : numpy array, shape = [n_input_graphs, n_input_graphs]
corresponding to the kernel matrix, a calculation between
all pairs of graphs between target an features
"""
self._method_calling = 2
self.fit(X)
S, N = np.zeros(shape=(self._ngx, self._ngx)), dict()
for (key, M) in iteritems(self.X):
K = M.dot(M.T).toarray()
K_diag = K.diagonal()
N[key] = K_diag
--> S += np.nan_to_num(K / np.sqrt(np.outer(K_diag, K_diag)))
# here things that are num are not propagated
self._X_level_norm_factor = N
if self.normalize:
--> return S / len(self.X) # this is meaningless if for one element adds zero and for others it doesn't
else:
return S
If you run:
nspd = NeighborhoodSubgraphPairwiseDistance(normalize=True)
nspd.fit([g1, g2])
S, N = np.zeros(shape=(nspd._ngx, nspd._ngx)), dict()
for (key, M) in iteritems(nspd.X):
K = M.dot(M.T).toarray()
K_diag = K.diagonal()
N[key] = K_diag
S += np.nan_to_num(np.sqrt(np.outer(K_diag, K_diag))) > 0
print(S)
you get:
/usr/local/lib/python3.10/dist-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:314: RuntimeWarning: invalid value encountered in divide
S += np.nan_to_num(K / np.sqrt(np.outer(K_diag, K_diag)))
array([[16., 12.],
[12., 12.]])
which is what is skewing the results as len(X) == 16.
More over if you print the K_diag the nan is clearly visible on the last entries:
print(np.sqrt(np.outer(K_diag, K_diag)))
0.1.8
[[1. 0.22577328]
[0.22577328 0.75 ]]
[[36. 36.]
[36. 36.]]
[[ 8. 9.79795897]
[ 9.79795897 12. ]]
[[ 8. 12.64911064]
[12.64911064 20. ]]
[[36. 26.83281573]
[26.83281573 20. ]]
[[144. 144.]
[144. 144.]]
[[18. 26.83281573]
[26.83281573 40. ]]
[[ 18. 43.26661531]
[ 43.26661531 104. ]]
[[144. 122.37646833]
[122.37646833 104. ]]
[[64. 16.]
[16. 4.]]
[[12. 6.92820323]
[ 6.92820323 4. ]]
[[12. 6.92820323]
[ 6.92820323 4. ]]
[[64. 16.]
[16. 4.]]
[[100. 0.]
[ 0. 0.]]
[[18. 0.]
[ 0. 0.]]
[[18. 0.]
[ 0. 0.]]
[[100. 0.]
[ 0. 0.]]
Grakel was built at 2017 (before even deep-learning and was working even with python-2). At the time we tested it was working.
For a vector to have nan with itself is a bit weird result, so I kindly ask @giannisnik to do some further examination on what is a principled way to resolve this normalization issue for the nan cases and we will both reply here and push it as a bugfix to the version (0.1.10) - (need to fix having the correct version number too because it's 0.1.9).
from grakel.
@altsoph Yes, that's exactly when it appears. As @ysig said, we will fix this in the next version.
from grakel.
Thanks a lot!
from grakel.
Hi @altsoph ,
The issue is due to the default value of hyperparameter d
of the kernel (default value is equal to 4). If you set d=2
, no problem occurs. The reason behind the problem is that the kernel iteratively looks for pairs of nodes that are at distance g2
) consists of two connected components. The first is just a pair of nodes connected by an edge. The second component contains 4 nodes and the maximum shortest path distance between any two of those nodes is 2. Thus, for d=3
and d=4
, the kernel finds no pair of nodes from g2
that satisfy the distance constraint and this is why kernel value for d=3
and d=4
is equal to 0 which leads to nan
values.
@ysig I think this can be fixed just by replacing nan
on the diagonal of K / np.sqrt(np.outer(K_diag, K_diag)
with an 1 instead of a 0.
from grakel.
Hi @giannisnik,
Thanks for clarifications!
However, even with d=2 NeighborhoodSubgraphPairwiseDistance is having trouble with some disconnected graphs.
Here is another example with an isolated node:
g1 = Graph([[0., 0., 0.,],
[0., 0., 1.,],
[0., 1., 0.,]],
node_labels=[1,1,1],
edge_labels={(1, 2): 'B',
(2, 1): 'B',})
g2 = Graph([[0., 1., 1.,],
[1., 0., 0.,],
[1., 0., 0.,]],
node_labels=[1,1,1],
edge_labels={(0, 1): 'B',
(0, 2): 'B',
(1, 0): 'B',
(2, 0): 'B',})
print( NeighborhoodSubgraphPairwiseDistance(
normalize=True, d=2
).fit_transform([g1, g2]) )
The output is:
[[0.66666667 0.23333333]
[0.23333333 1. ]]
Perhaps, the problem appears when there is a component with the radius less than d.
from grakel.
Related Issues (20)
- Print <class 'grakel.graph.Graph'>
- The Weisfeiler-Lehman Edge Kernel
- fetch_dataset not working HOT 1
- Graph hopper kernel unable to transfom the data
- Edge labels not working for custom dataset HOT 1
- Fingerprint dataset not accessible using fetch_dataset function HOT 13
- NaN error when using Random walk kernel on certain datasets HOT 3
- I don't know how to use my own data for input HOT 5
- MultiDiGraph: graph_from_networkx() ValueError HOT 1
- modified random walk kernel giving all 1 scores for normalize=true HOT 1
- Difference between ShortestPath and ShortestPathAttr HOT 2
- Generating graphs from dense/numpy matrices HOT 1
- Graph.clone() or Graph.copy() HOT 3
- Explicit graph kernels HOT 2
- Graphlet kernel HOT 1
- One vs many comparison for WeisfeilerLehman; `.transform` gives `TypeError: each element of X must have either a graph with...` HOT 1
- RWK matrix returns by fit_transform by RandomWalkLabeled() contains only 1.0 HOT 7
- Error when using EdgeHistogram HOT 1
- Can't install on Windows 10, keep getting same error HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from grakel.