Giter VIP home page Giter VIP logo

scm's Introduction

scm Build Status

The Simplicial Configuration Model is random null model for simplicial complexes, mathematical objects which can be seen as high-order generalizations of simple graphs (they incorporate multi-node interactions). This repository contains a C++ reference implementation of a Markov chain Monte Carlo (MCMC) sampler for this model, see [Phys. Rev. E, 96, 032312 (2017)] for more information.
A summary of compilation / usage instructions can be found below; but see the tutorial if you are looking for detailed instructions and examples.

Table of content

  1. Compilation
  2. Using the sampler
    1. Rejection sampler
    2. MCMC sampler
  3. Publications

Compilation

The sampler has two mandatory dependencies, namely boost::program_options and cmake (for automated building).

To compile from the terminal, use the following commands:

cmake .;
make

The resulting binaries will be built in bin/. CMakes should also allow Windows users to compile the source easily (not tested---let us know!).

If cmake is not installed, the following manual compilation lines should work (for gcc with C++11 support):

g++ -std=c++11 -o3 -c src/scm/scm.cpp -o src/scm/libscm.a  #compile scm library
g++ -std=c++11 -o3 -L src/scm/ -lboost_program_options -l scm -o bin/mcmc_sampler src/mcmc_sampler.cpp  #compile the main binaries (mcmc)
g++ -std=c++11 -o3 -L src/scm/ -lboost_program_options -l scm -o bin/rejection_sampler src/rejection_sampler.cpp  #compile the main binaries (rejection)

Using the sampler

The sampler randomizes an initial facet list, and samples uniformly from the space of all simplicial complexes with the same degree sequence (degree = number of facet incident on a node) and size sequence (size = number of node in a facet). As such, it requires an initial facet list, either artificial or taken from a real system.
We thus provide not one, but two important binaries: rejection_sampler and mcmc_sampler.

Rejection sampler

The rejection method is a very inefficient sampler for the SCM, see our paper. However, since there is---so far---no known constructive procedure to generate SCM instances directly from sequences, it can be used to find one instance, which is then plugged into the much more efficient MCMC sampler as an initial condition.

There are two ways to call bin/rejection_sampler.
The first (default) takes a single positional argument, the path to a facet list, and tries to sample from the associated SCM ensemble:

> bin/rejection_sampler datasets/simple_facet_list.txt  
# Sample:
0 1 
2 3 4 
1 2 3

where the output is a facet list.

The second way to call the rejection sampler is in the sequence mode.
This is accomplished by using the flags --degree_seq_file=path-to-degrees.txt and --size_seq_file=path-to-sizes.txt where path-to-degrees.txt and path-to-sizes.txt are paths to files containing the integer sequences (no particular organization required; will use all integers in the file). There is no known simpliciality test yet, so we make no test on the sequences---convergence is not guaranteed and the sampler might run forever.

Here is a simple example, using small sequences, which we know are simplicial:

> echo "2 2 2 1 1" > d.txt 
> echo "3 3 2" > s.txt
> bin/rejection_sampler -s s.txt -k d.txt
# Sample:
0 1 4
0 1 2
2 3

Note that we have used the the shorthand flags -k and -s for the sequences, see the full list of option for rejection_sampler below:

Usage:
 [Facet list mode] bin/rejection_sampler [--option_1=VAL] ... [--option_n=VAL] path-to-facet-list
 [Seq. mode] bin/rejection_sampler [--option_1=VAL] ... -k path-to-degrees.txt -s path-to-sizes.txt
Options:
  -d [ --seed ] arg             Seed of the pseudo random number generator 
                                (Mersenne-twister 19937). Seed with time if not
                                specified.
  -c [ --cleansed_input ]       In facet list mode, assume that the input is 
                                already cleansed, i.e., that nodes are labeled 
                                with 0 indexed contiguous integers and that no 
                                facet is included in another.
  -k [ --degree_seq_file ] arg  Path to degree sequence file.
  -s [ --size_seq_file ] arg    Path to size sequence file.
  -v [ --verbose ]              Output log messages.
  -h [ --help ]                 Produce help message.

MCMC sampler

Once we have an initial condition (either by using the rejection sampler or a real system), the MCMC is called with the following commad:

bin/mcmc_sampler -f 10000 -b 2000 -t 200 -d 42 seed_facet_list.txt

Here, -f 10000 specifies that 10000 MCMC move will be applied between each samples (the sampling frequency), -b 2000 specifies that the burn-in time equals 2000 (the number of steps to ignore away before sampling begins), -t 200 sets the number of samples to 200, and -d 42 sets the seeds of the RNG to 42 (it will be seeded with the time by default). seed_facet_list.txt is the path to the initial condition file (notice how it is the only positional argument).

Note: All the above commands have sensible default values and can be omitted.

By default the sampler uses the uniform proposal distribution with L_max = 2 max s, see the paper, but the behavior can be changed. We provide two parameterizable proposal distributions, and it is straightforward to implement additional ones.

The provided distributions are

  • exp_prop: Exponential distribution. Draw L with the p.d.f. Pr(l) = exp(lambda * l)/Z where lambda is a parameter, set with --prop_param LAMBDA, and Z is a normalization constant. L is limited to 2,...,L_max.
  • pl_prop: Power law distribution. Draw L with the p.d.f. Pr(l) = l ** (-\lambda)/Z. L is limited to 2,...,L_max.
  • unif_prop: Uniform distribution [Default]. Draw L with the p.d.f. Pr(l) = 1 /(L_max-2). L is limited to 2,...,L_max.

Note: The sampler can handle arbitrary facet lists as input (lines beginning with # will be ignored). However, it is better if facet lists are cleansed from the get go. By clean we mean that nodes are 0 indexed contiguous integers, and there are no included facet. If the data is already cleansed, use the flag -c to skip the pre-processing cleansing steps. See scm/utilities/ for some lightweight python cleansing tools.

The full list of options for mcmc_sampler:

Usage:
  bin/mcmc_sampler [--option_1=VAL] ... [--option_n=VAL] path-to-facet-list
Options:
  -b [ --burn_in ] arg                  Burn-in time. Defaults to M log M, 
                                        where M is the sum of degrees.
  -t [ --sampling_steps ] arg           Number of sampling steps.
  -f [ --sampling_frequency ] arg (=10) Number of step between each sample. 
                                        Defaults to M log M, where M is the sum
                                        of degrees.
  -d [ --seed ] arg                     Seed of the pseudo random number 
                                        generator (Mersenne-twister 19937). 
                                        Seed with time if not specified.
  -l [ --l_max ] arg                    Manually set L_max. The correctness of 
                                        the sampler is not guaranteed if L_max 
                                        < 2 max s. Defaults to 10% of the sum 
                                        of facet sizes. 
  --exp_prop                            Use exponential proposal distribution.
  --pl_prop                             Use power law proposal distribution.
  --unif_prop                           Use uniform proposal distribution 
                                        [default].
  --prop_param arg                      Parameter of the proposal distribution 
                                        (only works for the exponential and 
                                        power law proposal distributions).
  -c [ --cleansed_input ]               Assume that the input is already 
                                        cleansed, i.e., that nodes are labeled 
                                        with 0 indexed contiguous integers and 
                                        that no facet is included in another.
  -v [ --verbose ]                      Output log messages.
  -h [ --help ]                         Produce this help message.

Publications

Please cite:

"Construction of and efficient sampling from the simplicial configuration model"
J.-G. Young, G. Petri, F. Vaccarino, and A. Patania
Phys. Rev. E, 96, 032312 (2017)

Read it on: arXiv | Phys. Rev. E

scm's People

Contributors

jg-you avatar xavier-rp avatar arbenson avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.