Comments (10)
I agree with @LeoLaugier that his test should pass, afaiu if a node is of k-value k, then it's connected to at least k nodes of k-value âĽk
I looked at the code and in particular at minheap implementation, and found 2 bugs and a question:
- Why reimplement min-heap instead of using std::priority_queue
- [BUG] decrease_key misses a
p = parent(pos)
in the loop - [BUG] pop_min misses a
self.pos[self.val[0]] = 0
.
index 7bb2423d..f9ae7ca2 100644
--- a/sknetwork/utils/minheap.pyx
+++ b/sknetwork/utils/minheap.pyx
@@ -75,6 +75,7 @@ cdef class MinHeap:
while (pos != 0) and (scores[self.val[p]] > scores[self.val[pos]]):
self.swap(pos, p)
pos = p
+ p = parent(pos)
cdef int pop_min(self, int[:] scores):
"""Remove and return the minimum element (or root) from the heap."""
@@ -85,6 +86,7 @@ cdef class MinHeap:
# Store the minimum value, and remove it from heap
cdef int root = self.val[0]
self.val[0] = self.val[self.size-1]
+ self.pos[self.val[0]] = 0
self.size -= 1
self.min_heapify(0, scores)
However, even with those, I get the issue:
>>> import sknetwork
>>> import scipy
>>> x = scipy.sparse.load_npz("/tmp/biadjacency_full.npz")
>>> from sknetwork.utils import *
>>> from sknetwork.topology import *
>>> adjacency = bipartite2undirected(x)
>>> kcore = CoreDecomposition()
>>> kcore.fit(adjacency)
CoreDecomposition()
>>> for indice in range(adjacency.shape[0]):
... k=kcore.labels_[indice]
... x = 0
... for adj in range(adjacency.indptr[indice], adjacency.indptr[indice+1]):
... if kcore.labels_[adj] >= k:
... x = x+1
... print(indice, k, x)
... assert(x>=k)
...
0 5 4
Traceback (most recent call last):
File "<stdin>", line 8, in <module>
AssertionError
>>> adjacency.indptr
array([ 0, 10, 13, ..., 3593494, 3593495, 3593496],
dtype=int32)
>>> adjacency.indices[0:10]
array([24389, 24390, 24391, 24392, 24393, 24394, 24395, 24396, 24397,
24398], dtype=int32)
>>> kcore.labels_[24389:24398]
array([ 2, 38, 30, 5, 4, 38, 1, 1, 1], dtype=int32)
(4 values âĽ5)
Still digging...
Edit: The error was in my test.
for indice in range(adjacency.shape[0]-1):
k=kcore.labels_[indice]
x = 0
for adj in range(adjacency.indptr[indice], adjacency.indptr[indice+1]):
if kcore.labels_[adjacency.indices[adj]] >= k:
x = x+1
print(indice, k, x)
assert(x>=k)
passes with the diff above
from scikit-network.
Hi Leo,
I have assigned @tbonald because I suspect an error creeped in when @tbonald merged the Bi*
functions with the regular ones a few weeks ago. I'll have a deeper look too, if the error doesn't seem obvious to Thomas.
Cheers,
from scikit-network.
(note: with a priority_queue
, it would look like this: )
diff --git a/sknetwork/topology/kcore.pyx b/sknetwork/topology/kcore.pyx
index e180f219..adc7feaa 100755
--- a/sknetwork/topology/kcore.pyx
+++ b/sknetwork/topology/kcore.pyx
@@ -8,6 +8,8 @@ Created on Jun 3, 2020
@author: Yohann Robert <[email protected]>
"""
cimport cython
+from libcpp.queue cimport priority_queue
+from libcpp.pair cimport pair
import numpy as np
cimport numpy as np
@@ -15,7 +17,6 @@ cimport numpy as np
from scipy import sparse
from sknetwork.utils.base import Algorithm
-from sknetwork.utils.minheap cimport MinHeap
@cython.boundscheck(False)
@@ -40,23 +41,27 @@ cdef fit_core(int[:] indptr, int[:] indices):
cdef int min_node # current node of minimum degree
cdef int i, j, k
cdef int[:] degrees = np.asarray(indptr)[1:] - np.asarray(indptr)[:n]
- cdef np.ndarray[int, ndim=1] labels = np.empty((n,), dtype=np.int32)
- cdef MinHeap mh = MinHeap.__new__(MinHeap, n) # minimum heap with an update system
+ cdef np.ndarray[int, ndim=1] labels = np.full((n,), -1, dtype=np.int32)
+ cdef priority_queue[pair[int, int]] pq
- # inserts all nodes in the heap
+ # inserts all nodes in the queue
for i in range(n):
- mh.insert_key(i, degrees)
+ pq.push((-degrees[i], i))
i = n - 1 # index of the rear of the list/array
- while not mh.empty(): # until the heap is emptied
- min_node = mh.pop_min(degrees)
+ while not pq.empty():
+ min_node = pq.top().second
+ pq.pop()
+ if labels[min_node] > -1:
+ continue
core_value = max(core_value, degrees[min_node])
# decreases the degree of each neighbors of min_node to simulate its deletion
for k in range(indptr[min_node], indptr[min_node+1]):
j = indices[k]
degrees[j] -= 1
- mh.decrease_key(j, degrees) # updates the heap to take into account the new degrees
+ if labels[j] < 0:
+ pq.push((-degrees[j], j))
labels[min_node] = core_value
i -= 1
from scikit-network.
Hi again @LeoLaugier ! Sorry for the delay, I finally could look into it. I'm not sure there is a bug in the CoreDecomposition
module. Indeed, if I replace your last two lines of code with the following:
faulty_nodes = []
for indice in range(adjacency.shape[0]):
if kcore_nodes[indice] and adjacency.indptr[indice + 1] - adjacency.indptr[indice] < k:
faulty_nodes.append(indice)
then faulty_nodes
is indeed empty at the end on your data. This is also true when checking the degree of the neighbours in the core (in other words, the core property is respected on the whole core subgraph):
faulty_nodes = []
for indice in range(adjacency.shape[0]):
if kcore_nodes[indice] and adjacency.indptr[indice + 1] - adjacency.indptr[indice] < k:
if len([degree(x) for x in adjacency.indices[adjacency.indptr[indice + 1] - adjacency.indptr[indice]] if kcore_nodes[x] ]) < k:
faulty_nodes.append(indice)
len(faulty_nodes), faulty_nodes[:10]
from scikit-network.
Thanks @mclegrand, to answer your question, the priority queue cannot be cimported in Cython (see here for available std classes). There may be ways to make it work that I am not aware of though.
from scikit-network.
It is here, though: https://github.com/cython/cython/blob/master/Cython/Includes/libcpp/queue.pxd (it's a type of queue
)
from scikit-network.
Then the min heap is probably superfluous indeed.
from scikit-network.
Thanks @mclegrand! Could you please commit your changes so we can close the issue? I think refactoring to use the queue instead of the minheap is a bit out of the scope of the issue itself.
from scikit-network.
Thank you @mclegrand @QLutz @TiphaineV @tbonald : the k-core decomposition now works as expected on my dataset đ
from scikit-network.
Just to close this topic, I tried to time minheap vs priority_queue with
import sknetwork
from time import time
import scipy
x = scipy.sparse.load_npz("/home/mc/Downloads/biadjacency_full.npz")
from sknetwork.utils import *
from sknetwork.topology import *
adjacency = bipartite2undirected(x)
start_time = time()
kcore = CoreDecomposition()
kcore.fit(adjacency)
print("--- %s seconds ---" % (time() - start_time))
and got
pqueue
--- 0.5988922119140625 seconds ---
--- 0.6231446266174316 seconds ---
--- 0.5909104347229004 seconds ---
--- 0.6011207103729248 seconds ---
--- 0.6517014503479004 seconds ---
--- 0.6074094772338867 seconds ---
--- 0.6160576343536377 seconds ---
--- 0.639246940612793 seconds ---
--- 0.5900239944458008 seconds ---
minheap
--- 0.49435877799987793 seconds ---
--- 0.5180974006652832 seconds ---
--- 0.5038821697235107 seconds ---
--- 0.5666232109069824 seconds ---
--- 0.5060639381408691 seconds ---
--- 0.4990103244781494 seconds ---
--- 0.5349819660186768 seconds ---
--- 0.5090928077697754 seconds ---
--- 0.5094292163848877 seconds ---
--- 0.5129563808441162 seconds ---
15% better for minHeap so no need to get rid of it
from scikit-network.
Related Issues (20)
- data.load_edge_list is missing in 0.26.0 version HOT 1
- Single commodity flow feature HOT 1
- umpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObjec HOT 5
- python -m pip install scikit-network/ fails because it is looking for /triangles.cpp: HOT 1
- Installing the latest version HOT 2
- Verbose argument not expected in regression.Dirichlet fit_transform method though in description HOT 4
- Original node name mappings HOT 1
- PageRank-based recommendations HOT 1
- module 'sknetwork.data' has no attribute 'from_edge_list' HOT 5
- Error when cutting Paris dendrogram HOT 6
- KMeans does not work with numpy.ndarray HOT 1
- Feature Request GNN for KNN HOT 7
- Error installing on Linux for Python 3.11 HOT 2
- cython: linetrace=True causes TypeError: 'NoneType' object is not callable error HOT 8
- from sknetwork.utils import edgelist2adjacency, edgelist2biadjacency HOT 1
- do you have some where description for bimodularity HOT 5
- From Louvain to Leiden: guaranteeing well-connected communities. HOT 2
- [Feature Request] Add dynamic tree cutting for dendrograms hierarchy cuts HOT 1
- Error when installing "pip install scikit-network" HOT 2
- Louvain algorithm issue with louvain_core.pyx? ValueError: Buffer dtype mismatch, expected 'int' but got 'long' HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
đ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. đđđ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google â¤ď¸ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from scikit-network.