
Comments (16)

xrotwang avatar xrotwang commented on July 1, 2024 1

I just added a --seed option to the CLI, and

import random
random.seed(args.seed)

at the top of the communities command. Here's what I get for seeds 1 to 4:

$ clics -t 3 -s 1 -f families communities
INFO    loaded graph
INFO    starting infomap
INFO    converted graph...
INFO    finished infomap
INFO    computed cluster names
$ clics -t 3 -g infomap -f families graph-stats
-----------  ----
nodes        1534
edges        2630
components     95
communities   247
-----------  ----

$ clics -t 3 -s 2 -f families communities
INFO    loaded graph
INFO    starting infomap
INFO    converted graph...
INFO    finished infomap
INFO    computed cluster names
$ clics -t 3 -g infomap -f families graph-stats
-----------  ----
nodes        1534
edges        2634
components     96
communities   249
-----------  ----

$ clics -t 3 -s 3 -f families communities
INFO    loaded graph
INFO    starting infomap
INFO    converted graph...
INFO    finished infomap
INFO    computed cluster names
$ clics -t 3 -g infomap -f families graph-stats
-----------  ----
nodes        1534
edges        2645
components     95
communities   249
-----------  ----

$ clics -t 3 -s 4 -f families communities
INFO    loaded graph
INFO    starting infomap
INFO    converted graph...
INFO    finished infomap
INFO    computed cluster names
$ clics -t 3 -g infomap -f families graph-stats
-----------  ----
nodes        1534
edges        2637
components     96
communities   247
-----------  ----
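
For anyone who wants to try this, the seed wiring described at the top of this comment could look roughly like the sketch below. It assumes a plain argparse parser; `build_parser` and `run_communities` are hypothetical names, not the actual pyclics code, and the default of "no seed" is only an illustrative choice.

```python
import argparse
import random


def build_parser():
    """Hypothetical stand-in for the real CLI parser."""
    parser = argparse.ArgumentParser(prog="clics")
    # -s/--seed mirrors the option used in the runs above; the default is an assumption.
    parser.add_argument("-s", "--seed", type=int, default=None,
                        help="seed for Python's random module")
    return parser


def run_communities(args):
    """Hypothetical command body: seed first, then do the randomized work."""
    random.seed(args.seed)
    # Placeholder for the clustering step: any draw from `random`
    # is now reproducible for a given seed.
    return [random.random() for _ in range(3)]


# Usage: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-s", "1"])
print(run_communities(args))
```

Calling `random.seed` inside the command, rather than at import time, keeps the seed under the user's control for each invocation.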

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024 1

So I'd say this is rather a documentation issue, and we may think about adding this seed option into pyclics now.

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

Hah, probably hard to debug, but would be really interesting. I'm in the process of recreating clics2 sqlite, too, right now. What does pip freeze say in your virtualenv?

from clics2.

chrzyki avatar chrzyki commented on July 1, 2024

https://gist.github.com/chrzyki/13e11e8471791e6bd151b153ae294596

Just noticed: I installed pyclics from the clics2 repository - that might have got something to do with it?

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

Hm, funny enough I'm getting

$ clics -t 3 -g infomap -f families graph-stats
-----------  ----
nodes        1534
edges        2634
components     96
communities   248
-----------  ----

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

As a first step towards debugging, I inserted a

print(len(edges), len(ignore_edges))

here

G.remove_edges_from(ignore_edges)

to see whether we are removing different numbers of edges, or adding differently. My numbers are

50051  46467

from clics2.

chrzyki avatar chrzyki commented on July 1, 2024

Check!

50051 46467

from clics2.

LinguList avatar LinguList commented on July 1, 2024

Hi, the important point is that infomap is based on random walks, so different numbers of edges can occur. You need to seed the random number generator to capture this, but I'm not sure whether igraph allows for this... So I'd say: use nx.connected_components to check for the same size, as this is the simplest clustering algorithm.
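
A minimal sketch of that check, written against the standard library so it runs without networkx: it counts connected components with a breadth-first search, which should match what `nx.connected_components` reports for the same edges. The function name `component_sizes` and the toy edge list are made up for illustration. Because no randomness is involved, the result is the same on every run and for every edge order, unlike the infomap communities.

```python
from collections import defaultdict, deque


def component_sizes(edges):
    """Return the sorted sizes of the connected components of an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes)


# Toy colexification-style edges (invented data).
edges = [("hand", "arm"), ("arm", "wing"), ("tree", "wood"), ("sun", "day")]
print(component_sizes(edges))        # [2, 2, 3]
print(component_sizes(edges[::-1]))  # same sizes, regardless of edge order
```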

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

Ah. Ok, I was chasing down the wrong path! So the actual colexification network is created reproducibly, but the clustering isn't - in the absence of a fixed seed. So, (almost) nothing to see here, move on :)
Except, maybe, we want to find a seed that gives us the number of edges reported in the README :)

from clics2.

chrzyki avatar chrzyki commented on July 1, 2024

My bad, sorry, wasn't aware of the random walks for the clustering!

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

Hm. Turns out we already had

random.seed(123456)
numpy.random.seed(123456)

in src/pyclics/__main__.py. So then the question is where does infomap get its randomness from?

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

Stack Overflow seems to think what we do should be sufficient: https://stackoverflow.com/a/25726079

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

Ok, just ran a couple of tests: Running the community_infomap method multiple times immediately after setting the seed does indeed give identical clusters. But running the complete communities command multiple times - with the seed set again immediately before calling community_infomap - does not! So I guess something goes wrong in networkx2igraph - maybe we must iterate over the graph nodes in an explicitly sorted order?
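
The effect described here can be reproduced in miniature. The sketch below uses a toy randomized routine, `random_scores` (a made-up stand-in, not infomap or community_infomap), that draws one number per vertex in the order the vertices arrive. With the same seed, the same vertex order gives identical results, but a different vertex order assigns the same draws to different vertices.

```python
import random


def random_scores(vertices, seed):
    """Toy randomized routine: one draw per vertex, in input order."""
    rng = random.Random(seed)
    return {vertex: rng.random() for vertex in vertices}


order_a = ["hand", "arm", "tree"]
order_b = ["tree", "arm", "hand"]  # same vertices, different order

# Same seed, same order: reproducible.
print(random_scores(order_a, 123456) == random_scores(order_a, 123456))  # True

# Same seed, different order: the draws are identical, but they land on
# different vertices, so the per-vertex result changes.
print(random_scores(order_a, 123456) == random_scores(order_b, 123456))  # False
```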

from clics2.

xrotwang avatar xrotwang commented on July 1, 2024

Ok, confirmed: The order in which vertices are added to igraph.Graph in pyclics.util.networkx2igraph varies across command calls.
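
One way to make such a conversion order-independent is to fix the vertex order before handing anything to igraph. The sketch below is not the actual pyclics.util.networkx2igraph; `indexed_edges` is a hypothetical helper that only prepares sorted vertex names and index-based edge pairs. It assumes the node identifiers are mutually comparable (for example, all strings).

```python
def indexed_edges(nodes, edges):
    """Return (vertex_names, edge_index_pairs) in a deterministic order."""
    # Sorting removes any dependence on the iteration order of `nodes`.
    names = sorted(nodes)
    index = {name: i for i, name in enumerate(names)}
    # Normalise each undirected edge to (low, high), then sort the edge list too.
    pairs = sorted(
        (min(index[u], index[v]), max(index[u], index[v])) for u, v in edges
    )
    return names, pairs


nodes_run_1 = ["hand", "arm", "tree"]
nodes_run_2 = {"tree", "hand", "arm"}  # a set: iteration order not guaranteed
edges_run_1 = [("hand", "arm"), ("arm", "tree")]
edges_run_2 = [("tree", "arm"), ("arm", "hand")]

print(indexed_edges(nodes_run_1, edges_run_1))
print(indexed_edges(nodes_run_2, edges_run_2))  # identical to the line above
```

With the vertex and edge order pinned like this, a seeded clustering call sees the same input on every run, which is the precondition for the seed to have any reproducible effect.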

from clics2.

chrzyki avatar chrzyki commented on July 1, 2024

That's a good find! I was dumbfounded by this ...

from clics2.
