Orchid 🌸 – Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework

This repository provides a Julia library and a command-line interface that implements the Ollivier-Ricci Curvature for Hypergraphs in Data (Orchid) Framework.

This project is based on the research paper Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework, published at ICLR 2023.

The full reproducibility package, including the data that can be shared, is available on Zenodo.

If you find this repository helpful, please consider citing our paper!
@inproceedings{coupette2023orchid,
    title     = {Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework},
    author    = {Corinna Coupette and Sebastian Dalleiger and Bastian Rieck},
    booktitle = {The Eleventh International Conference on Learning Representations (ICLR)},
    year      = {2023},
    url       = {https://openreview.net/forum?id=sPCKNl5qDps},
    doi       = {10.48550/arXiv.2210.12048}
}

Installation

Julia Library

To install the Orchid Julia library:

julia> using Pkg
julia> Pkg.add(url="https://github.com/aidos-lab/orchid.git")

Alternatively, we can install Orchid from the command line:

julia -e 'using Pkg; Pkg.add(url="https://github.com/aidos-lab/orchid.git")'

Command-Line Interface

To use the command-line interface, we additionally need bin/orchid.jl and its dependencies.

git clone https://github.com/aidos-lab/orchid
julia -e 'using Pkg; Pkg.add(path="./orchid"); Pkg.add.(["ArgParse", "JSON", "Glob", "CodecZlib"])'

Usage

Julia REPL

Assuming the hypergraph resides in variable X:

julia> using Orchid
julia> hypergraph_curvatures(DisperseUnweightedStar, AggregateMax, X, 0.01)

To view the documentation of the function:

help?> Orchid.hypergraph_curvatures

hypergraph_curvatures

Arguments

  • disperser: Dispersion (options: DisperseUnweightedClique, DisperseWeightedClique, or DisperseUnweightedStar – Orchid paper: μ)

  • aggregator: Aggregation (options: AggregateMean, AggregateMax, or (AggregateMean, AggregateMax) – Orchid paper: AGG)

  • input: Incidence-matrix or edge-list encoding of the hypergraph

  • alpha: Self-dispersion weight (smoothing parameter corresponding to the laziness of the random walk – Orchid paper: α)

  • cost: Cost computation strategy (options: CostOndemand^, CostMatrix)

    ^ Useful for very large hypergraphs.

Command-Line Interface

To use the command-line interface:

chmod +x bin/orchid.jl
bin/orchid.jl --help
bin/orchid.jl --aggregation mean --dispersion WeightedClique -i data/toy.ihg.tsv -o results/toy.orc.json 
bin/orchid.jl --aggregation max --dispersion UnweightedStar --alpha 0.1 -i data/toys.chg.tsv -o results/toys.orc.json

The first execution might take some time.

Bash Scripts

For convenience, we provide two bash scripts that run the curvature computations in the configurations reported in the ICLR paper: one for the shareable datasets used in the paper and one, for illustration, for tiny toy data. Both scripts compute curvatures with alpha in {0.0,0.1,0.2,0.3,0.4,0.5} for all combinations of dispersion and aggregation:

  • reproduce.sh: Computation for {dblp,ndc-ai,ndc-pc}.ihg.tsv and {dblp-v,mus,sha,stex,syn_hcm,syn_hcm-hsbm,syn_hnmp,syn_hsbm}.chg.tsv; results are stored in the results folder as gzip-compressed JSON files.
  • reproduce_toy.sh: Computation for toy.ihg.tsv and toys.chg.tsv; results are stored in the results folder as uncompressed JSON files.

Note that reproduce.sh, when run as-is, will consume considerable computational resources. The easiest way to restrict computation to smaller datasets or some parts of our configuration space is to redefine some of the arrays at the top of the script.
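To make the size of this configuration space concrete, the following Python sketch enumerates one bin/orchid.jl invocation per configuration for a single input file. It is not taken from reproduce.sh: the value UnweightedClique for --dispersion is assumed by analogy with the two values shown above, only the mean and max aggregations are covered, and the output file naming is hypothetical.

```python
from itertools import product
from pathlib import Path

ALPHAS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
# "UnweightedClique" is an assumed CLI spelling; the other two appear in the examples above.
DISPERSIONS = ["UnweightedClique", "WeightedClique", "UnweightedStar"]
# Only the two single aggregations are covered in this sketch.
AGGREGATIONS = ["mean", "max"]


def build_commands(input_path, output_dir="results"):
    """Return one argument list per (dispersion, aggregation, alpha) combination."""
    stem = Path(input_path).name.split(".")[0]  # e.g. "toy" for "data/toy.ihg.tsv"
    commands = []
    for dispersion, aggregation, alpha in product(DISPERSIONS, AGGREGATIONS, ALPHAS):
        # Hypothetical naming scheme; the provided scripts may name outputs differently.
        output = f"{output_dir}/{stem}.{dispersion}.{aggregation}.{alpha}.orc.json"
        commands.append([
            "bin/orchid.jl",
            "--aggregation", aggregation,
            "--dispersion", dispersion,
            "--alpha", str(alpha),
            "-i", str(input_path),
            "-o", output,
        ])
    return commands


commands = build_commands("data/toy.ihg.tsv")
print(len(commands))          # number of configurations for one input file
print(" ".join(commands[0]))  # first command line
```

Each argument list could be passed to subprocess.run to execute the corresponding computation. Shortening any of the three lists shrinks the grid in the same way as redefining the arrays at the top of the bash script.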

Experiments

To evaluate our curvature results, we require additional Python packages. We recommend installing these into a virtual environment, the classic option being venv.

pip install -r experiments/requirements.txt

For our clustering, MMD, and kPCA experiments on collections of hypergraphs, we first compute their curvatures.

bin/orchid.jl --aggregation mean --dispersion WeightedClique -i data/syn_hcm-hsbm.chg.tsv.gz -o results/syn_hcm-hsbm.orc.json.gz 

Then, we evaluate the collection of curvatures using the tools in experiments/.

python experiments/graph-clustering.py -k 2 -i results/syn_hcm-hsbm.orc.json.gz -o gc/syn_hcm-hsbm.gc.json.gz 
python experiments/kpca.py -k 2 -i results/syn_hcm-hsbm.orc.json.gz -o kpca/syn_hcm-hsbm.kpca.json.gz 
python experiments/mmd.py -i results/syn_hcm-hsbm.orc.json.gz -o mmd/syn_hcm-hsbm.mmd.json.gz 

For our node-clustering experiments with individual hypergraphs, we proceed similarly: we first compute the curvatures and then cluster the nodes.

bin/orchid.jl --aggregation mean --dispersion WeightedClique -i data/dblp.ihg.tsv.gz -o results/dblp.orc.json.gz
python experiments/node-clustering.py -k 2 -i results/dblp.orc.json.gz -o nc/dblp.nc.json.gz

To produce the files containing the competing local features, which can be input to the experiment scripts in place of the curvature files:

python experiments/features.py -i data/sha.chg.tsv.gz -o features/sha.chg.json.gz

Data Formats used by the Command-Line Interface

Inputs

The data underlying our experiments are provided in a concise TSV format, which allows us, inter alia, to store an entire hypergraph collection in one file.
The files encoding individual hypergraphs end with ihg.tsv[.gz].
The files encoding collections of hypergraphs end with chg.tsv[.gz].
Nodes are assumed to be consecutive, one-indexed integers.

Individual hypergraphs (ihg): {name}.ihg.tsv.gz

Each row is a hyperedge, with the identifiers of nodes occurring in the hyperedge separated by \t characters.

Example (data/toy.ihg.tsv):

1   2   3   4   5
2   3
5   7   3   6
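As an illustration of this format, here is a minimal Python sketch of a reader for ihg files. The function names read_ihg and check_node_ids are hypothetical and not part of Orchid; the sketch only assumes the format as described above (one hyperedge per row, tab-separated integer node identifiers, optional gzip compression).

```python
import gzip
import tempfile
from pathlib import Path


def read_ihg(path):
    """Read an individual hypergraph: one hyperedge per row, tab-separated node IDs."""
    path = Path(path)
    # Handle the optional gzip compression (ihg.tsv or ihg.tsv.gz).
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        # Skip blank lines; every other row becomes a list of integer node IDs.
        return [
            [int(node) for node in line.strip().split("\t")]
            for line in handle
            if line.strip()
        ]


def check_node_ids(hyperedges):
    """Return True if the node IDs are exactly 1, 2, ..., n (one-indexed, consecutive)."""
    nodes = {node for edge in hyperedges for node in edge}
    return nodes == set(range(1, len(nodes) + 1))


# Usage with the toy example above, written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    toy = Path(tmp) / "toy.ihg.tsv"
    toy.write_text("1\t2\t3\t4\t5\n2\t3\n5\t7\t3\t6\n", encoding="utf-8")
    hyperedges = read_ihg(toy)

print(hyperedges)
print(check_node_ids(hyperedges))
```

The check mirrors the assumption that nodes are consecutive, one-indexed integers; it can be useful to run before passing custom data to the command-line interface.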

Collections of hypergraphs (chg): {name}.chg.tsv.gz

The format matches that for individual hypergraphs, except that the first identifier in each row identifies the hypergraph to which the hyperedge belongs.

Example (data/toys.chg.tsv):

2   1   2   3   4   5
2   2   3
2   5   7   3   6
0   1   2   4
0   1   3   5
0   1   4   6
0   6   4   2   5

Note that Orchid will treat the hypergraphs in the order in which their unique identifiers appear in the input, so in the example above, the hypergraph with ID 2 will occur before the hypergraph with ID 0 in the results.
The example also illustrates that we do not assume the hypergraph identifiers to be one-indexed or consecutive.
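The grouping and ordering behavior described above can be sketched in Python as follows. The function read_chg is a hypothetical helper, not Orchid's own parser; it relies on Python dictionaries preserving insertion order to keep the hypergraphs in order of first appearance.

```python
import gzip
import tempfile
from pathlib import Path


def read_chg(path):
    """Read a hypergraph collection as {hypergraph_id: [hyperedge, ...]}."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    collection = {}  # insertion-ordered, so IDs stay in order of first appearance
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            # The first field is the hypergraph ID; the rest are node IDs.
            graph_id, *nodes = (int(field) for field in line.strip().split("\t"))
            collection.setdefault(graph_id, []).append(nodes)
    return collection


# Usage with the toys example above, written to a temporary file.
rows = [
    "2\t1\t2\t3\t4\t5", "2\t2\t3", "2\t5\t7\t3\t6",
    "0\t1\t2\t4", "0\t1\t3\t5", "0\t1\t4\t6", "0\t6\t4\t2\t5",
]
with tempfile.TemporaryDirectory() as tmp:
    toys = Path(tmp) / "toys.chg.tsv"
    toys.write_text("\n".join(rows) + "\n", encoding="utf-8")
    collection = read_chg(toys)

print(list(collection))    # hypergraph IDs in order of first appearance
print(len(collection[0]))  # number of hyperedges in the hypergraph with ID 0
```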

Outputs

Curvature files are (optionally gzip-compressed) JSON files of the form:

[
  {
    "node_curvature_neighborhood":[...],
    "directional_curvature":[
      [...i values...],
      [...j values...],
      [...k values...]
    ],
    "node_curvature_edges":[...],
    "edge_curvature":[...],
    "aggregation":"Orchid.AggregateMax",
    "dispersion":"UnweightedStar",
    "input":"../data/toys.chg.tsv",
    "alpha":0.1
  },
  {
    ...
  }
]

That is, we provide a list of JSON objects, one for each input hypergraph.
If the input is an individual hypergraph, the list will just have one entry.
If the input is a collection of hypergraphs, the list will contain one entry per hypergraph, in the order in which the hypergraphs first appear in the input file.
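A curvature file of this form can be loaded with a few lines of Python. The sketch below uses a hypothetical helper read_curvatures and a mock file with placeholder numbers (not real Orchid results); it assumes that edge_curvature is a flat list of numbers, which the schematic above suggests but does not spell out.

```python
import gzip
import json
import tempfile
from pathlib import Path


def read_curvatures(path):
    """Load a curvature file (plain or gzip-compressed JSON) as a list of dicts."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Illustrative placeholder values only; these are NOT real Orchid results.
mock_results = [
    {
        "edge_curvature": [0.5, -0.25, 0.0],
        "aggregation": "Orchid.AggregateMax",
        "dispersion": "UnweightedStar",
        "input": "../data/toys.chg.tsv",
        "alpha": 0.1,
    }
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toys.orc.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(mock_results, handle)
    results = read_curvatures(path)

# One entry per input hypergraph; summarize each by its mean edge curvature.
for index, entry in enumerate(results):
    values = entry["edge_curvature"]
    print(index, entry["dispersion"], entry["alpha"], sum(values) / len(values))
```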

Disclaimer

We refactored the entire code base and introduced the {ihg,chg}.tsv[.gz] data format after ICLR 2023. The material results are the same, but there might be small deviations in details.

Contributing

Contributions to Orchid are welcome.
If you find any issues or have suggestions for improvements, please open an issue or submit a pull request in the GitHub repository: https://github.com/aidos-lab/orchid
