Giter VIP home page Giter VIP logo

cdlib's Introduction

CDlib - Community Detection Library

codecov Build Documentation Status CodeQL pyversions PyPI version Anaconda-Server Badge Code style: black Downloads Downloads DOI SBD++

plot

CDlib is a meta-library for community detection in complex networks: it implements algorithms, clustering fitness functions as well as visualization facilities.

CDlib is designed around the networkx python library: however, when needed, it takes care to automatically convert (from and to) igraph object so to provide an abstraction on specific algorithm implementations to the final user.

CDlib provides a standardized input/output facilities for several Community Discovery algorithms: whenever possible, to guarantee literature coherent results, implementations of CD algorithms are inherited from their original projects (acknowledged on the documentation).

If you use CDlib as support to your research consider citing:

G. Rossetti, L. Milli, R. Cazabet. CDlib: a Python Library to Extract, Compare and Evaluate Communities from Complex Networks. Applied Network Science Journal. 2019. DOI:10.1007/s41109-019-0165-9

Tutorial and Online Environments

Check out the official tutorial to get started!

If you would like to test CDlib functionalities without installing anything on your machine consider using the preconfigured Jupyter Hub instances offered by SoBigData++.

Installation

CDlib requires python>=3.8.

To install the latest version of our library just download (or clone) the current project, open a terminal and run the following commands:

pip install -r requirements.txt
pip install -r requirements_optional.txt # (Optional) this might not work in Windows systems due to C-based dependencies.
pip install .

Alternatively use pip

pip install cdlib

or conda

conda create -n cdlib python=3.9
conda config --add channels giuliorossetti
conda config --add channels conda-forge
conda install cdlib

Optional Dependencies (pip package)

To simplify the installation process, the default installation does not include optional dependencies (e.g., graph-tool). If you need them, you can install them manually or run the following command:

pip install 'cdlib[C]'

This option, safe for *nix users, will install all those optional dependencies that require C code compilation.

pip install 'cdlib[pypi]'

This option will install all those optional dependencies that are not available on conda/conda-forge.

pip install 'cdlib[all]'

This option will install all optional dependencies accessible with the flag C and pypi.

(Advanced)

Due to some strict requirements, the installation of a subset of optional dependencies is left outside the previous procedures.

graph-tool

CDlib integrates the support for SBM models offered by graph-tool. To install it refer to the official documentation and install the conda-forge version of the package (or the deb version if in a *nix system).

ASLPAw

Since its 2.1.0 release ASLPAw relies on gmpy2 whose installation through pip is not easy to automatize due to some C dependencies. To address such issue test the following recipe:

conda install gmpy2 
pip install shuffle_graph>=2.1.0 similarity-index-of-label-graph>=2.0.1 ASLPAw>=2.1.0

In case this does not solve the issue, please refer to the official gmpy2 installation instructions.

Optional Dependencies (Conda package)

CDlib relies on a few packages not available through conda: to install them please use pip.

pip install pycombo
pip install GraphRicciCurvature

conda install gmpy2 
pip install shuffle_graph>=2.1.0 similarity-index-of-label-graph>=2.0.1 ASLPAw>=2.1.0

In case ASLPAw installation fails, please refer to the official gmpy2 installation instructions.

Collaborate with us!

CDlib is an active project, any contribution is welcome!

If you like to include your model in CDlib feel free to fork the project, open an issue and contact us.

How to contribute to this project?

Contributing is good, doing it correctly is better! Check out our rules, issue a proper pull request /bug report / feature request.

We are a welcoming community... just follow the Code of Conduct.

cdlib's People

Contributors

aponom84 avatar beckedorf avatar danieledler avatar deklanw avatar dsalvaz avatar dummyindex avatar edgeber avatar eternity1984 avatar fossabot avatar gilcampos avatar giuliorossetti avatar hungyi5 avatar letiziam avatar oliver-lloyd avatar pitmonticone avatar praveer1981 avatar pyup-bot avatar seoanezonjic avatar shakebeforeuse avatar yquetzal avatar zhqu1148980644 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

cdlib's Issues

Compatibility issue with python3.8?

Hello!

I just tried using CDlib, from a pip installation on kubuntu (python3.8 as mentioned in the title).
When importing algorithms from cdlib, I get the following error:

ImportError: cannot import name 'clock' from 'time' (unknown location)

It seems that clock is deprecated in python3.8.
Is python3.8 incompatible with CDlib? Or am I doing something wrong?

Thanks a lot!
Anthony

Memory Error when using Lemon

Describe the bug
Hi, I am trying to do local community detection on a moderately large graph (850,000 nodes) and I am receiving memory errors asking for enormous amounts of memory.

Thanks for an amazing resource; I really appreciate all the work you are putting into this.

See below:


MemoryError Traceback (most recent call last)
in
----> 1 lemon_res=algorithms.lemon(Gm,seed_papers,min_com_size=3,max_com_size=300000)

~/miniconda3/envs/data/lib/python3.8/site-packages/cdlib/algorithms/overlapping_partition.py in lemon(g_original, seeds, min_com_size, max_com_size, expand_step, subspace_dim, walk_steps, biased)
456
457 graph = convert_graph_formats(g_original, nx.Graph)
--> 458 graph_m = nx.convert_matrix.to_numpy_array(graph)
459
460 node_to_pos = {n: p for p, n in enumerate(graph.nodes())}

~/miniconda3/envs/data/lib/python3.8/site-packages/networkx/convert_matrix.py in to_numpy_array(G, nodelist, dtype, order, multigraph_weight, weight, nonedge)
1135 else:
1136 # Graph or DiGraph, this is much faster than above
-> 1137 A = np.full((nlen, nlen), np.nan, order=order)
1138 for u, nbrdict in G.adjacency():
1139 for v, d in nbrdict.items():

~/miniconda3/envs/data/lib/python3.8/site-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order)
323 if dtype is None:
324 dtype = array(fill_value).dtype
--> 325 a = empty(shape, dtype, order)
326 multiarray.copyto(a, fill_value, casting='unsafe')
327 return a

MemoryError: Unable to allocate 5.26 TiB for an array with shape (850509, 850509) and data type float64

To Reproduce
Steps to reproduce the behavior:

  • CDlib version ( 0.1.9)
  • Operating System Ubuntu 18.04 on Windows
  • Python version 3.7
  • Version(s) of CDlib required libraries
    networkx 2.4
    Expected behavior
    Return communities for this subgraph.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

[Clarification question] Is bipartite version of Infomap already implemented?

I was checking out the implemented bipartite algorithms in documentation (https://cdlib.readthedocs.io/en/latest/reference/cd_algorithms/node_clustering.html?highlight=bipartite#bipartite-graph-communities) and also description of Infomap (https://cdlib.readthedocs.io/en/latest/reference/cd_algorithms/algs/cdlib.algorithms.infomap.html#cdlib.algorithms.infomap) and couldn't figure out if the bipartite version of Infomap (https://github.com/chrisbloecker/infomap-bipartite/tree/bipartite-only here is an example notebook showcasing its usage: https://github.com/chrisbloecker/infomap-bipartite/blob/master/examples/python/infomap-examples.ipynb) is already implemented, because it is not listed under bipartite algorithms.

Thanks (sorry if this is a naive question and I have missed it out somewhere, searching in documentation didn't yield results on this and infomap under algorithms seems to return only "NodeClustering" object and not a "BiNodeClustering" one)

label propagation algorithm

hello
I want to use a label propagation algorithm. This algorithm is random that in each run the result is different. I run the algorithm multi-time but the result and communities are the same.

import networkx as nx
from cdlib import algorithms

g = nx.karate_club_graph()
coms = algorithms.label_propagation(g)
community = coms.to_node_community_map()
print(community)


also, I want to evaluate nmi metric for this result. nmi metric just uses nodeCluster object. I want to compare the result of the algorithm with a True label that is a list but it returns an error. how can I use nmi?

from cdlib import evaluation, algorithms

g = nx.karate_club_graph()
coms = algorithms.label_propagation(g)
community = coms.to_node_community_map()
NMI = evaluation.normalized_mutual_information(TrueLabel,coms)

Extracting communities and node names as vectors

Hello,
I used Louvain algorithm on a network (1299 nodes) and cdLib works perfect. I did clustering evalution. I am planning to do the same on a bigger network (about 15000 nodes).

Is it possible to extract clustering results as a 2 column file with one column as node name (names as given in the graphml file) and other column as the cluster that node belongs?

I am relatively new to Python so I could not achieve it. Does cdLib have a function like this?

Thank you in advance

ModuleNotFoundError: Optional dependency not satisfied: install igraph to use the selected feature.

Describe the bug
When I used the viz module to visualize the evaluation functions between the detected communities and the ground-truth communities, I was told to install igraph which I had installed before.

To Reproduce
Steps to reproduce the behavior:

  • CDlib version 0.1.8
  • Operating System win10
  • Python version 3.6.3
  • Version(s) of CDlib required libraries python-igraph 0.7.1.post6

Expected behavior
Plot the communities evaluation functions.

Screenshots

image

the function 'link_modularity'in evaluation metrics should be checked

Is your feature request related to a problem? Please describe.
the function 'link_modularity'in evaluation metrics should be checked , I tried it in some networks like karate club, football and dblp et through slpa,louvain\bigclam and other algorithm.it got a lower score than other paper reported.it always below 0.2.but the paper 'Overlapping community detection in networks: The state-of-the-art and comparative study' show it higher than I have got.I guess maybe the function should be checked.

Additional context
my sample codes:
from cdlib import algorithms,evaluation as ev
import networkx as nx
import pandas as pd
import numpy as np
g=nx.karate_club_graph()
print(g.name)
r_l=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
m_score={}
for i in r_l:
i_j={}
for j in range(10):
c_cdlib=algorithms.slpa(g,t=30,r=i)
m_cdlib=ev.link_modularity(g,c_cdlib).score
p=ev.conductance(g,c_cdlib).score
print(p)
print(m_cdlib)
i_j[j]=m_cdlib
m_score[i]=i_j
m_score=pd.DataFrame(m_score)
m_score.to_excel('work/slpa.xlsx')

ModuleNotFoundError: Optional dependency not satisfied: install package wurlitzer to use infomap

Describe the bug
I am using pycharm IDE and am attempting to use your library's infomap community detection on a directed weighted graph created via networkx but I am getting the error module not found. I installed wurlitzer latest edition and infomap's latest version but it does not work. I installed all packages within PyCharm IDE

Error
Traceback (most recent call last):
File "C:/A/A/PycharmProjects/A/a.py", line 5, in
coms = algorithms.infomap(G)
File "C:\A\A\PycharmProjects\A\venv\lib\site-packages\cdlib\algorithms\crisp_partition.py", line 686, in infomap
raise ModuleNotFoundError("Optional dependency not satisfied: install package wurlitzer to use infomap.")
ModuleNotFoundError: Optional dependency not satisfied: install package wurlitzer to use infomap.

Code used

from cdlib import algorithms
import networkx as nx
from graphClass import G

coms = algorithms.infomap(G)
print(coms)

To Reproduce
Steps to reproduce the behavior:

  • CDlib version 0.1.10
  • Operating System Windows 10
  • Python version 3.7
  • Required libraries installed; infomap, wurlitzer

Question and Possible Discussion regarding Role Discovery as an addition to CDLib

I would like to ask something regarding Role Detection as a derivation of Community Detection:
Is it possible to consider role detection (role discovery, role similarity, role-based similarity), since it seems that research regarding roles are still a bit too new and that most research tends to treat it as a sub-problem of Community Detection?

For reference regarding lists of Role Detection algorithms:

Duration difference between MCL in cdLib, reference library and markov-clustering library

Hello,
I am so sorry if my question is inappropriate. I want to use MCL to cluster my network (about 17000 nodes), but kernel crashes everytime (I have 16 GB RAM).
Another library that does MCL clustering (markov-clustering library). I believe the difference comes from the adjacency matrix formats. That library works but I cannot import the results as cdLib NodeClustering object to do some comparisons such as ARI.

Is there any way to complete this task? Is it possible to import another clustering result directly to cdLib?

Thank you in advance

How do I increase the maximum number of communities in a graph visualization?

Hi, May I please ask you two questions in graph visualization?

  1. The maximum number of communities is 16 (the same as the number of COLOR elements); is there a reason why it's limited to 16?
  2. Do you plan to apply a lighter color to a label if the node is a dark color?

Is your feature request related to a problem? Please describe.

  • I'm stuck with a limited number of communities
  • If the node is dark, the label is hard to see

Describe the solution you'd like

  • I don't want the number of communities to be limited.
  • I'd like to determine the color of the label based on the relative luminance of the node's color.
  • I'd like to apply a color map if possible.

Describe alternatives you've considered
I've tried to create an implementation.

Additional context

font color

G = nx.karate_club_graph()
coms = algorithms.louvain(G)
pos = nx.spring_layout(G, k=0.3)

# viz.plot_network_clusters(G, coms, pos, plot_overlaps=True, plot_labels=True, cmap=None)
# viz.plot_network_clusters(G, coms, pos, plot_overlaps=True, plot_labels=True, cmap="jet")
viz.plot_network_clusters(G, coms, pos, plot_overlaps=True, plot_labels=True, cmap=plt.cm.Wistia)

image

infomap

Hello,

I am new to network science. when i run this code

from cdlib import algorithms
import networkx as nx
G = nx.karate_club_graph()
coms = algorithms.infomap(G)

i am getting the following error

TypeError: 'Network' object is not callable

Please check.

Thanks

plot_network_clusters() got an unexpected keyword argument 'top_k'

Describe the bug
top_k attr not recognised

To Reproduce
Steps to reproduce the behavior:

  • CDlib version 2021-02-23

  • Operating System big sur

  • Python version conda 4.9.2 python 3.8

  • Version(s) of CDlib required libraries

Expected behavior
Should plot the k most influential clusters, according to the docs

Screenshots
TypeError Traceback (most recent call last)
in
2
3 pos = nx.spring_layout(g)
----> 4 viz.plot_network_clusters(g, leiden_coms, pos, figsize=(20, 20), plot_labels=True, top_k=5)

TypeError: plot_network_clusters() got an unexpected keyword argument 'top_k'

Example for leiden community detection with edge weights

I have tried using the leiden algorithm with edge weight and have not been able to figure it out. Can you provide an example or tell me what I am doing wrong?

To use weight with Louvain, one simply does this:
louvain_coms = algorithms.louvain(g, weight='weight', resolution=0.1, randomize=False)
where 'weight' is an edge property (I am using network to build the initial graph g). This works.

When I try this with leiden, I get this error:
leiden_coms = algorithms.leiden(g, weights='weight')

I get the error below. Can you help me see what I am doing wrong?


KeyError Traceback (most recent call last)
in
----> 1 leiden_coms = algorithms.leiden(g, weights='weight')
2 comms = leiden_coms.communities
3 overlap = leiden_coms.overlap
4 print("overlap",overlap)
5 commNum = 0

~/.local/lib/python3.6/site-packages/cdlib/algorithms/crisp_partition.py in leiden(g, initial_membership, weights)
368
369 part = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition,
--> 370 initial_membership=initial_membership, weights=weights
371 )
372 coms = [g.vs[x]['name'] for x in part]

~/.local/lib/python3.6/site-packages/leidenalg/functions.py in find_partition(graph, partition_type, initial_membership, weights, n_iterations, seed, **kwargs)
82 partition = partition_type(graph,
83 initial_membership=initial_membership,
---> 84 **kwargs)
85 optimiser = Optimiser()
86

~/.local/lib/python3.6/site-packages/leidenalg/VertexPartition.py in init(self, graph, initial_membership, weights)
448 if weights is not None:
449 if isinstance(weights, str):
--> 450 weights = graph.es[weights]
451 else:
452 # Make sure it is a list

KeyError: 'Attribute does not exist'

Provide guidance for algorithm selection?

It seems like a small table or flowchart could be useful for this library. Something like:

"< 5000 nodes, sparse. Try these: ... "
"< 5000 nodes, dense. Try these: ... "
">= 5000 nodes, sparse. Try these: ... "
">= 5000 nodes, dense. Try these: ... "

I'm not sure what the exact criteria would be, but this is just a rough idea. In particular, some of these algorithms aren't very feasible for large graphs. At least some big O runtime notes would be helpful.

Also, it seems like in most cases someone would either be using algorithms with a preset number of communities, or not. It seems like separating these algorithms might be useful.

Afaik some of these algorithms are considered to be strict improvements on others,
CONGO > CONGA
Angel > Demon
In these cases would it be helpful to put the supplanted algorithms somewhere separate?

Package dependencies pegged to minor versions

Thanks for the useful package.

Like presumably most users of this package, I use Anaconda. When I install the package through pip (pip install cdlib), it tries to replace core Anaconda packages such as scipy and pandas with slightly different versions. To avoid that, I installed with --no-deps and manually got the dependencies I needed. The package works, but I still get warnings from pip:

$ pip check
cdlib 0.1.7 requires sphinx, which is not installed.
cdlib 0.1.7 requires sphinx-rtd-theme, which is not installed.
cdlib 0.1.7 has requirement matplotlib==3.1.0, but you have matplotlib 3.1.1.
cdlib 0.1.7 has requirement pandas==0.24.2, but you have pandas 0.25.1.
cdlib 0.1.7 has requirement scipy==1.3.0, but you have scipy 1.3.1.
cdlib 0.1.7 has requirement tqdm==4.32.2, but you have tqdm 4.32.1.

The dependencies should probably not be specified so exactly.

Can't install package

I would like to try this package but I couldn't install it.

I tried to install CDlib lastest version
Operating System: macOS Catalina 10.15.7
Python version: 3.8.6

when I installed it, I got this error:
ERROR: Could not find a version that satisfies the requirement completely-shuffle>=0.9.2 (from shuffle-graph>=1.1.2->ASLPAw->cdlib==0.1.9) (from versions: none)
ERROR: No matching distribution found for completely-shuffle>=0.9.2 (from shuffle-graph>=1.1.2->ASLPAw->cdlib==0.1.9)

I have tried to install the required packages manually but didn't work either. Is this because of my computer? Do you have any advice?

Thank you
Best regards.

Rename your example tutorial notebook

Describe the bug
I was checking out the CDlib.ipynb tutorial notebook on colab (https://colab.research.google.com/github/KDDComplexNetworkAnalysis/CNA_Tutorials/blob/master/CDlib.ipynb). At one point, I was happy with the things I learned so I decided to download it as a python script to use on my machine. Since the python file exported by Colab is named CDlib.py too, it was giving circular import error.

To Reproduce
Export colab notebook to python script (not jupyter notebook, a simple python script). If you then are running python in the same directory, it won't import CDlib since the name is similar to the python script in the same directory.

To Solve
If you rename the jupyter notebook from "CDlib.ipynb" to something like "CDlib_tutorial.ipynb" will resolve this minor issue.

Thanks for CDlib!

Additional community evaluation scores

Add relevant community evaluation scores.

List of requests collected via email (to check for applicability, some seems not to apply to network clustering):

  • Silhouette score,
  • Davies-Bouldin (DB) index,
  • Calinski-Harabasz index(CH)
  • Dunn Index
  • Directed-weighted modularity
  • ...

Please integrate and fill missing references.

an error named "To avoid name collision with the igraph project, this visualization library has been renamed to 'jgraph'. Please upgrade when convenient." will arise when use the cdlib

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  • CDlib version
  • Operating System
  • Python version
  • Version(s) of CDlib required libraries

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

Unable to import name 'clock' from 'time' in cdlib

Hello,
I'm trying to use cdlib and have following error, while running "from cdlib import algorithms"

ImportError: cannot import name 'clock' from 'time' (unknown location)

I searched and read other same issues and most of them were suggesting to change clock by either time.perf_counter or time.process_time .

Would you please help me how to solve this problem (how/where to change it)?
I just want to use some overlapping algorithms like angel or bigclam according to your documentation.

  • CDlib version >> 0.1.8
  • Operating System >> Win 10 - 64-bit
  • Python version >> 3.8.2
  • Version(s) of CDlib required libraries

python-igraph 0.8.0
angel-cd 1.0.2
infomap 1.0.11
(I only could not install leidenalg--had some errors)


ImportError Traceback (most recent call last)
in
----> 1 from cdlib import algorithms

~\AppData\Local\Programs\Python\Python38\lib\site-packages\cdlib\algorithms_init_.py in
1 from .edge_clustering import *
2 from .crisp_partition import *
----> 3 from .overlapping_partition import *
4 from .attribute_clustering import *
5 from .biparitte_clustering import *

~\AppData\Local\Programs\Python\Python38\lib\site-packages\cdlib\algorithms\overlapping_partition.py in
19 from cdlib.algorithms.internal.LAIS2_nx import LAIS2
20 from cdlib.algorithms.internal.lfm import LFM_nx
---> 21 from cdlib.algorithms.internal import LEMON
22 from cdlib.algorithms.internal.SLPA_nx import slpa_nx
23 from cdlib.algorithms.internal.multicom import MultiCom

~\AppData\Local\Programs\Python\Python38\lib\site-packages\cdlib\algorithms\internal\LEMON.py in
23 import numpy as np
24 import math
---> 25 import pulp
26 from scipy import linalg as splin
27 import gc

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pulp_init_.py in
32 from .constants import VERSION
33
---> 34 from .pulp import *
35 from .amply import *
36 doc = pulp.doc

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pulp\pulp.py in
100
101 from .constants import *
--> 102 from .solvers import *
103 from collections import Iterable
104

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pulp\solvers.py in
33 import os
34 import sys
---> 35 from time import clock
36 from uuid import uuid4
37 try:

ImportError: cannot import name 'clock' from 'time' (unknown location)

Thanks in advance,

Request to loosen package dependencies as much as possible

Hi there,

Thank you for the useful package!

I know you have removed the requirement for minor package versions in an earlier issue (#72 (comment)), but I was wondering if the requirements can be further loosened in general?

We're trying to install the package on the department server, and when we do so, it downgrades a number of the other packages. Because the requirements are a bit restrictive right now, it is preventing us from using it widely.

Thank you for considering this.

Best,
Ted

Fitness function from NodeClustering object versus from evaluation module

When I do this

from cdlib import evaluation as ev

...

cs = al.infomap(g)
print(f"Average internal degree {ev.average_internal_degree(g, cs)}")
print(f"Average internal degree {cs.average_internal_degree()}")

I get identical results. When I change the algorithm like this:

from cdlib import evaluation as ev

...

cs = al.leiden(g)
print(f"Average internal degree {ev.average_internal_degree(g, cs)}")
print(f"Average internal degree {cs.average_internal_degree()}")

The latter is FitnessResult(min=0, max=0, score=0.0, std=0.0 while the former looks fine.

And, for other algorithms the latter seems to be giving an error! max_odf with the former method works fine, but with the latter it throws a ValueError.

Am I confused about something or is this a bug?

overlapping NMI

-overlapping NMI checks that the node coverage of both clusterings is the same, while it is not relevant (if I'm right, at least it seems to work if we bypass it)
-There is no check done on the parameter "communities" of the NodeClustering class, and the documentation says only it require "list of communities" but the form of a community is not clear. Using frozenset to represent each community, I got a bug calling overlapping NMI. We should be more careful with the format of this parameter (check+doc)

(I can do it later, but too busy this week)

plot_scoring to be checked and reimplemented

Describe the bug
The plot_scoring function seems to be broken.

I removed its implementation from the master branch to prevent it to be used.
Please @Yquetzal check it if you have time: if I remember correctly you added/modified a few months ago.

Bugs

  1. I think that you have an error in your formula :
    internal_edge_density = ec/(ns*(ns-1))/2.

  2. Also, in the formula that calculates the z_modularity

fitness.py (line : 709) :

for node in c:
dc += c.degree(node)

I think that the correct dc is calculated by graph.degree(node) instead of c.degree(node)

there is a bug about graph attribute

Describe the bug
when I use the overlapping community detection algorithm 'LFM',there will be a error named'Graph' object has no attribute 'node'.I tried upgrade the cdlib ,but it didn't work.I think the networks didn't support the graph attribute node and it's has been changed to attribute nodes.likely g.nodes() not g.node().Maybe you can check it .

Expected behavior
check the graph attribute nodes in networkx

I have a issues

when i runing

from cdlib.algorithms import louvain
import networkx as nx
g = nx.karate_club_graph()
communities = louvain(g)
mod = communities.adjusted_mutual_information([[1,2],[3,4]])

nodes_second = {node: None for community in second_partition.communities for node in community}
AttributeError: 'list' object has no attribute 'communities'

Implementation of AGDL

Hello, I find something strange about the implementation of AGDL algorithm.

The input in the referenced implementation is a data matrix. https://github.com/myungjoon/GDL

def AGDL(data, targetClusterNum, Ks, Kc):

Following the paper, they first computed pairwise distance of the input data and convert the pairwise distance to pairwise similarity then construct a k nearest neighbor graph.

The implementation in cdlib/cdlib/algorithms/internal/AGDL.py requires a graph as input for consistency.

def Agdl(g, target_cluster_num, ks, kc, a):
    data = nx.to_numpy_matrix(g)
    distance, indices = __knn(ks, data)
    ...

Nevertheless, I thought nx.to_numpy_matrix(g) returns a pairwise similarity matrix of input data (In graph, higher edge weight means closer connection.) with size (n_sample, n_sample) rather than data with size (n_sample, n_features). So it is unnecessary and not suitable to apply NearestNeighbors().fit(x) (in __knn()).

The other thing (not a problem) is __w_matrix does not require data

def __w_matrix(data, distance, indices, ks, a=10):
    ...
    weight_matrix[i][j] = np.exp(-1 * (np.linalg.norm(data[i] - data[j]) ** 2) / sigma2)

is equivalent to

weight_matrix[i][j] = np.exp(-1 * (np.linalg.norm(distance[i,j]) ** 2) / sigma2)

Willing to discuss further. Thanks.

Python 3.8 error within required pulp library because of cdlib's requirement to use an outdated version of pulp

Describe the bug
A clear and concise description of what the bug is.
It seems cdlib requires only pulp version 1.6. However in Python 3.8, a function used in pulp is depreciated. A function calls "from time import clock" which is now depreciated in Python 3.8. Pulp has fixed the problem in recent versions (in particular 2.2, the most recent), but running "from cdlib import algorithms" causes an error when pulp tries to use the depreciated "clock" function. my solution was to manually download a new version of pulp (2.2), which generated a warning from pip that I created a conflict with cdlib, which requires 1.6. However, I was then able to run "from cdlib import algorithms" without issue.

To Reproduce
Steps to reproduce the behavior:

  • CDlib version
    0.1.8
  • Operating System
    MacOs Catalina
  • Python version
    3.8
  • Version(s) of CDlib required libraries
    pulp 1.6 vs pulp 2.2

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.
ERROR: cdlib 0.1.8 has requirement pulp==1.6.*, but you'll have pulp 2.2 which is incompatible.

Additional context
Add any other context about the problem here.
Please let me know if I can help troubleshoot anymore. I am really excited about cdlib and can't wait to start using it in projects. Thanks for all your hard work on this package!

CPMVertexPartition.Bipartite implementation

Is your feature request related to a problem? Please describe.
First, thanks for cdlib, it is amazing. I am using it for bipartite network analysis and community detection and then evaluation of results of different algorithms. There is currently only bimlpa implemented for it. I have a suggestion.

Describe the solution you'd like
Since you have already implemented CPMVertexPartition, I am guessing implementation of its bipartite (https://leidenalg.readthedocs.io/en/stable/multiplex.html?highlight=bipartite#bipartite) version should not be much complicated (I certainly hope so, but my background is in social sciences, so I cannot say for sure).

Describe alternatives you've considered
I have provided below a sample of code I have written to use the CPMVertexPartition.Bipartite, but I use it directly in Leidenalg package by https://github.com/vtraag . Do you think there is hope to have CPMVertexPartition.Bipartite in cdlib?

Additional context
This is an example of the code I run directly in leidenalg library:

def find_bipartite_partitions(graph, gamma, seed=0):
    """find_bipartite_partitions(graph, seed, gamma)
    
    Takes an igraph graph and returns the bipartite partitions detected with the provided gamma which is the internal density of the communities passed to CPM method in leiden algorithm package by Vincent Traag (https://leidenalg.readthedocs.io/en/latest/multiplex.html#bipartite). Results are returned as a pandas dataframe which includes vertices names, cluster membership and other node attributes in graph. It also return number of clusters detecetd and members in each of them to decide and change gamma if needed.
    
    Parameters
    ----------
    @param graph: igraph graph that needs to be bipartite (e.g., org/paper or author/paper) and named as we use vertex names and attributes in the detaframe returned.
    @param seed: the random seed to be used in CPM method to keep results/partitions replicable.
    @param gamma: CPM method (see leidenalg for more details: https://leidenalg.readthedocs.io/en/latest/multiplex.html#bipartite) takes a gamma which is the internal density of the communities detected and uses it as a resolution parameter.
    
    Examples
    --------
    >>> number_partitions, partitions_freq_members, results_df, p_01 = net_tables.find_bipartite_partitions(graph=gg, seed=0, gamma=1.625e-07)
    
    """

    import igraph as ig
    import leidenalg as la
    import pandas as pd

    optimiser = la.Optimiser()
    # set seed to get the same communities
    la.Optimiser.set_rng_seed(self=optimiser, value=seed)

    p_01, p_0, p_1 = la.CPMVertexPartition.Bipartite(graph, resolution_parameter_01=gamma)

    diff = optimiser.optimise_partition_multiplex([p_01, p_0, p_1], layer_weights=[1, -1, -1])

    # to see number of communities detected
    number_partitions = len(set(p_01.membership))
    # pd series of clusters with number of members in each
    partitions_freq_members = pd.Series(p_01.membership).value_counts()

    # add node attributes and cluster membership and build pandas table of results
    graph.vs['cluster'] = p_01.membership
    # for all other attributes
    all_attributes = dict()
    for node_attr in graph.vs.attributes():
        attr_list = [v[node_attr] for v in graph.vs]
        all_attributes[node_attr] = attr_list
    results_df = pd.DataFrame(all_attributes)

    return (number_partitions, partitions_freq_members, results_df, p_01)

Error converting graphs - cannot add edges, Invalid vertex id

Describe the bug
I have received the "InternalError: Error at type_indexededgelist.c:272: cannot add edges, Invalid vertex id" error trying to use the method algorithms.angel. I have a nx undirected weighted graph, created from a pandas data frame, and think that the error is in __from_nx_to_igraph at utils.py

To Reproduce
my graph:
g = nx.from_pandas_edgelist(df, 'source', 'target', 'weight')
When I try to see my edges:
list(g.edges())
I receive:
[(21198069, 28883519), (21198069, 28667573), (21198069, 26440037), ... ]
To see the weight of an edge:
g.edges[21198069, 28883519]
Output:
{'weight': 9}
When I try to get the communities with the command: coms = algorithms.angel(g, min_community_size=4, threshold=0.25)
I get the error:


InternalError                             Traceback (most recent call last)
<ipython-input-132-44b171f02346> in <module>
      1 #coms = algorithms.big_clam(g, 5)
----> 2 coms = algorithms.angel(g, min_community_size=4, threshold=0.25)

/opt/conda/lib/python3.6/site-packages/cdlib/algorithms/overlapping_partition.py in angel(g, threshold, min_community_size)
    120         raise ModuleNotFoundError("Optional dependency not satisfied: install igraph to use the selected feature.")
    121 
--> 122     g = convert_graph_formats(g, ig.Graph)
    123     with suppress_stdout():
    124         a = Angel(graph=g, min_comsize=min_community_size, threshold=threshold, save=False)

/opt/conda/lib/python3.6/site-packages/cdlib/utils.py in convert_graph_formats(graph, desired_format, directed)
    122         return __from_igraph_to_nx(graph, directed)
    123     elif ig is not None and desired_format is ig.Graph:
--> 124         return __from_nx_to_igraph(graph, directed)
    125     else:
    126         raise TypeError("The graph object should be either a networkx or an igraph one.")

/opt/conda/lib/python3.6/site-packages/cdlib/utils.py in __from_nx_to_igraph(g, directed)
     79     gi = ig.Graph(directed=directed)
     80     gi.add_vertices(list(g.nodes()))
---> 81     gi.add_edges(list(g.edges()))
     82     return gi
     83 

/opt/conda/lib/python3.6/site-packages/igraph/__init__.py in add_edges(self, es)
    253           endpoints. Vertices are enumerated from zero.
    254         """
--> 255         return GraphBase.add_edges(self, es)
    256 
    257     def add_vertex(self, name=None, kwds):

InternalError: Error at type_indexededgelist.c:272: cannot add edges, Invalid vertex id

  • CDlib version cdlib-0.1.7
  • Operating System mac os 10.12.5
  • Python version 3.6
  • Version(s) of CDlib required libraries

Additional context
I'm using jupyter notebook

ARI score error

Hello Sir,
Please check this.

from cdlib.algorithms import louvain
g = nx.karate_club_graph()
communities = louvain(g)
mod = communities.adjusted_rand_index([[1,2], [3,4]])
mod

gives this error

AttributeError Traceback (most recent call last)
in ()
2 g = nx.karate_club_graph()
3 communities = louvain(g)
----> 4 mod = communities.adjusted_rand_index([[1,2], [3,4]])
5 mod

2 frames
/usr/local/lib/python3.6/dist-packages/cdlib/evaluation/comparison.py in __check_partition_coverage(first_partition, second_partition)
18 def __check_partition_coverage(first_partition, second_partition):
19 nodes_first = {node: None for community in first_partition.communities for node in community}
---> 20 nodes_second = {node: None for community in second_partition.communities for node in community}
21
22 if len(set(nodes_first.keys()) ^ set(nodes_second.keys())) != 0:

AttributeError: 'list' object has no attribute 'communities'

Thanks & Regards,
Seema Rani

MemoryError at algorithms.big_clam

Describe the bug
I was unable to run the algorithms.big_clam algorithm in a graph with 2436844 edges, receiving a MemoryError.

To Reproduce
I'm instantiating my graph from a data frame with source, target and weight, and than trying to get the communities.

g = nx.from_pandas_edgelist(ds, 'source', 'target', 'weight')
coms = algorithms.big_clam(g, 500)

  • CDlib version cdlib-0.1.7
  • Operating System mac os 10.12.5
  • Python version 3.6
  • Version(s) of CDlib required libraries

Additional context
I'm using jupyter notebook

Exception trace

MemoryError Traceback (most recent call last)
in
----> 1 coms = algorithms.big_clam(g, 500)

/opt/conda/lib/python3.6/site-packages/cdlib/algorithms/overlapping_partition.py in big_clam(g, number_communities)
576 g = convert_graph_formats(g, nx.Graph)
577
--> 578 communities = BIGCLAM.big_Clam(g, number_communities)
579
580 return NodeClustering(communities, g, "BigClam", method_parameters={"number_communities": number_communities}, overlap=True)

/opt/conda/lib/python3.6/site-packages/cdlib/algorithms/internal/BIGCLAM.py in big_Clam(graph, number_communities)
76
77 def big_Clam(graph, number_communities):
---> 78 adj = nx.to_numpy_matrix(graph)
79 F = train(adj, number_communities)
80 F_argmax = np.argmax(F, 1)

/opt/conda/lib/python3.6/site-packages/networkx/convert_matrix.py in to_numpy_matrix(G, nodelist, dtype, order, multigraph_weight, weight, nonedge)
446 return M
447
--> 448
449 def from_numpy_matrix(A, parallel_edges=False, create_using=None):
450 """Returns a graph from numpy matrix.

/opt/conda/lib/python3.6/site-packages/networkx/convert_matrix.py in to_numpy_array(G, nodelist, dtype, order, multigraph_weight, weight, nonedge)
1125 # This occurs when there are fewer desired nodes than
1126 # there are nodes in the graph: len(nodelist) < len(G)
-> 1127 pass
1128
1129 A[np.isnan(A)] = nonedge

/opt/conda/lib/python3.6/site-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order)
300 shape : int or sequence of ints
301 Shape of the new array, e.g., (2, 3) or 2.
--> 302 fill_value : scalar
303 Fill value.
304 dtype : data-type, optional

MemoryError:

lfm recalculate function seems return wrong vertex

    def recalculate(self):
        for vid in self.nodes:
            fitness = self.cal_remove_fitness(vid)
            if fitness < 0.0:
                return vid
        return None

If return the vid will cause the new_fitness lower than old_fitness.
The right condition should be:

if fitness > 0.0:

Adding cdlib to Anaconda

Hello folks!
This package looks amazing and I can't wait to try it. Unfortunately, my work environment prevents me from installing packages other than those on the Anaconda repos.

I don't know how complicated it would be to add your package there and whether you could benefit from it, but I would be thrilled if you did.

Using installing pip is unfortunately not possible (in the foreseeable future).

Thank you very much!

Visualization of Edge clustering

Hi,

Can cdlib.classes.edge_clustering.EdgeClustering objects be plotted yet?
I could not find the answer in the user documentation.

Plotting a graph from a cdlib.classes.node_clustering.NodeClustering object is easy:

graph = nx.karate_club_graph()
communities = algorithms.louvain(graph)
pos = nx.spring_layout(graph)
viz.plot_network_clusters(graph, communities, pos)

But the same does not seem to work with an ..EdgeClustering object:

communities = algorithms.hierarchical_link_community(graph)
pos = nx.spring_layout(graph)
viz.plot_network_clusters(graph, communities, pos)

Thanks a lot for any help you can provide.
s.

New release planned?

Since the last release some features were added and requirements were updated (eg networkx). Do you plan to create a new release?

Thanks

Guidance for selection of algorithms listed in overlapping communities

Hello Everyone,

I'm trying to partition a networkx graph (with around 700 nodes) into ~5 to 7 subgraphs with overlapping nodes (overlap size >=4 nodes). I'd like to try the Overlapping Communities algorithms listed here.
Unfortunately, I don't have prior experience in this field and I'm not sure which algorithm will be apt for this case. I'd like to request for recommendations on some of the algorithms that I could try for partitioning the original graph into subgraphs with overlapping nodes.

User defined functions and other clarifications

I have several questions which pertain to custom functions for community partition and scoring

  1. If I have a graph partition of nodes then is there any way to convert it to a "CDLIB node clustering" object so that I can apply CDLIB functions, such as scoring, to it?

  2. Is there any way I can apply multiple scoring functions as evaluation parameters to grid_search or pool ensemble functions in CDLIB? Currently, it seems I can use only one, and to use others I need to run individually for every scoring function. But using multiple algorithms is possible.

  3. In the "pool" function under the ensemble approach, how to include community detection algorithms that don't have any parameter to assign such as INFOMAP? Currently, it is required by default that the number of methods and number of configurations must be the same. I think it inclusion of algorithms without any parameter requirement should be allowed with "None" as the definition in the configurations list.

Unable to perform grid search for fluid asyn community detection algo

I am trying to do a grid search for async fluid community detection method using the following commands

import networkx as nx
from cdlib import algorithms, ensemble,evaluation
G = nx.karate_club_graph()
k = ensemble.Parameter(name='k',start=5,end=20,step=1)
com_flud,scor_louv = ensemble.grid_search(graph=G,method=algorithms.async_fluid,parameters=[k],quality_score=evaluation.erdos_renyi_modularity,aggregate=max)

But when trying to execute it, I get the following error:

File "/usr/local/lib/python3.6/dist-packages/networkx/algorithms/community/asyn_fluid.py", line 74, in asyn_fluidc raise NetworkXError("k must be an integer.")

NetworkXError: k must be an integer.
I don't understand this error as the start, end and step all are integers then why I get this error?

tiles AttributeError

Hello and congratulations on your excellent work!

I have a problem using tiles algorithm. Actually I try your example
`from cdlib import algorithms
import dynetx as dn
import networkx as nx

dg = dn.DynGraph()
for x in range(10):
g = nx.erdos_renyi_graph(200, 0.05)
dg.add_interactions_from(list(g.edges()), t=x)
coms = algorithms.tiles(dg, 2)`

and get:
coms = algorithms.tiles(dg, 2)
AttributeError: module 'cdlib.algorithms' has no attribute 'tiles'

Why is that so?
Thanks in advance.

markov clustering listed as edge clustering rather than node clustering

Markov clustering (mcl) is listed as an edge clustering algorithm in the documentation
MCL is a node clustering algorithm. While it is true that there is an interpretation step of the sparse limit, this interpretation is most logically as a separation into connected components, i.e. node clusters. I don't quite see where the edge clustering comes from. Througout it's 20-year-ish lifetime, MCL has always been compared with other node-clustering algorithms, never with edge-clustering algorithms. Happy to discuss further here. Thanks, Stijn (mcl author).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.