alsm's Introduction

Latent Space Approaches to Aggregate Network Data

This repository contains the notebooks and Stan and Python code required to reproduce the results in the accompanying manuscript "Latent Space Approaches to Aggregate Network Data". From the abstract,

Large-scale network data can pose computational challenges, be expensive to acquire, and compromise the privacy of individuals in social networks. We show that the locations and scales of latent space cluster models can be inferred from the number of connections between groups alone. We demonstrate this modelling approach using synthetic data and apply it to friendships between students collected as part of the Add Health study, eliminating the need for node-level connection data. The method thus protects the privacy of individuals and simplifies data sharing. It also offers performance advantages over node-level latent space models because the computational cost scales with the number of clusters rather than the number of nodes.

Reproducing the Results

To reproduce the results, follow these steps.

  1. Set up a clean Python environment. This code has been tested with Python 3.10 on macOS and Ubuntu.
  2. Install the Python dependencies by running pip install -r requirements.txt from the root directory of this repository.
  3. Install cmdstan, the command line interface to the probabilistic programming framework Stan, by running python -m cmdstanpy.install_cmdstan --version=2.34.0; this may take a few minutes depending on your machine. Other recent versions of Stan may also be compatible but have not been tested.
  4. Optionally, run make tests to test the installation and runtime environment.
  5. Run make data to download the Adolescent to Adult Health network data.
  6. Run make analysis to run all analyses. The results will be saved in a new workspace folder at the root of the repository. Results comprise .html files summarizing the analysis and .pdf and .png files for the figures in the manuscript.
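
The steps above can also be scripted. The following is a minimal sketch, not part of the repository: the function name reproduce and its dry_run and skip_tests parameters are hypothetical, and only the commands themselves are taken from the list above. It assumes it is run from the root of the repository inside the clean Python environment from step 1.

```python
import shlex
import subprocess

# Commands copied from steps 2 to 6 above, in order.
STEPS = [
    "pip install -r requirements.txt",
    "python -m cmdstanpy.install_cmdstan --version=2.34.0",
    "make tests",  # optional installation check (step 4)
    "make data",
    "make analysis",
]


def reproduce(dry_run=True, skip_tests=False):
    """Run (or, with dry_run=True, only list) the reproduction commands.

    Hypothetical helper for illustration; returns the argument lists in the
    order they were (or would be) executed.
    """
    executed = []
    for step in STEPS:
        if skip_tests and step == "make tests":
            continue
        args = shlex.split(step)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(args, check=True)
        executed.append(args)
    return executed


if __name__ == "__main__":
    for args in reproduce(dry_run=True):
        print(" ".join(args))
```

Calling reproduce(dry_run=False) would execute the commands in sequence and raise subprocess.CalledProcessError if any of them fails.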

You can also review the GitHub Actions workflow that performs the analysis, together with its example runs.

The source code comprises two parts: first, the Python package alsm (containing the Stan model code and utility functions) and, second, Jupyter notebooks stored as .md jupytext files in the scripts folder (containing the code to run analysis and produce figures). If you are familiar with jupytext, go right ahead and open the .md files as a notebook. If you prefer traditional .ipynb files, run make ipynb to generate .ipynb notebooks which will be stored in the scripts folder.

alsm's People

Contributors

tillahoffmann

alsm's Issues

Consider moving to estimating the excess variance rather than variance of aggregate connection volumes.

The code currently evaluates the variance of aggregate relational data as follows.

alsm/alsm/stan.py

Lines 53 to 84 in 33a74b9

# Evaluate the variance of aggregate connection volumes between two clusters. If n2 == 0, we
# consider the self connection rate.
'evaluate_aggregate_var': """
    real evaluate_aggregate_var(vector loc1, vector loc2, real scale1, real scale2,
                                real propensity, real n1, real n2) {
        real y_ij = evaluate_mean(loc1, loc2, scale1, scale2, propensity);
        real y_ijkl = y_ij ^ 2;
        real y_ijji = evaluate_square(loc1, loc2, scale1, scale2, propensity);
        real y_ijij = y_ij + y_ijji;
        real y_ijil = evaluate_cross(loc1, loc2, scale1, scale2, propensity);
        real y_ijkj = evaluate_cross(loc2, loc1, scale2, scale1, propensity);
        // Between group connections.
        if (n2 > 0) {
            return n1 * n2 * (
                y_ijij
                + (n2 - 1) * y_ijil
                + (n1 - 1) * y_ijkj
                - (n1 + n2 - 1) * y_ijkl
            );
        }
        // Within group connections.
        else {
            return n1 * (n1 - 1) * (
                y_ijij
                + y_ijji
                + 4 * (n1 - 2) * y_ijil
                - 2 * (2 * n1 - 3) * y_ijkl
            );
        }
    }
""",

However, we subsequently use only the "excess variance" (as discussed in the context of the neg_binomial_2 distribution) to estimate the concentration parameter \phi of the distribution.

real invphi = (var_ - mean) / mean ^ 2;

It may be preferable (from a numerical perspective) to evaluate the excess variance directly.
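
A rough Python sketch of what this could look like for the between-group case follows. It rests on two assumptions that are not confirmed by the snippet above: that the aggregate mean for two distinct groups is n1 * n2 * y_ij, and that the moments y_ij, y_ijji, y_ijil and y_ijkj are supplied as plain numbers rather than computed by evaluate_mean, evaluate_square and evaluate_cross. The function names are hypothetical. Stan's neg_binomial_2(mu, phi) has variance mu + mu^2 / phi, so 1 / phi = (variance - mu) / mu^2. Because y_ijij = y_ij + y_ijji, the mean term cancels algebraically, and the excess variance can be written without ever adding and then subtracting the mean.

```python
import math


def aggregate_var_between(y_ij, y_ijji, y_ijil, y_ijkj, n1, n2):
    """Between-group variance, mirroring the n2 > 0 branch of the Stan code."""
    y_ijkl = y_ij ** 2
    y_ijij = y_ij + y_ijji
    return n1 * n2 * (
        y_ijij
        + (n2 - 1) * y_ijil
        + (n1 - 1) * y_ijkj
        - (n1 + n2 - 1) * y_ijkl
    )


def aggregate_excess_var_between(y_ij, y_ijji, y_ijil, y_ijkj, n1, n2):
    """Between-group excess variance (variance minus mean), evaluated directly.

    Assumes the aggregate mean is n1 * n2 * y_ij, so the y_ij term inside
    y_ijij cancels against the mean and is simply omitted.
    """
    y_ijkl = y_ij ** 2
    return n1 * n2 * (
        y_ijji
        + (n2 - 1) * y_ijil
        + (n1 - 1) * y_ijkj
        - (n1 + n2 - 1) * y_ijkl
    )


# Illustrative moment values only; they are not taken from the model.
y_ij, y_ijji, y_ijil, y_ijkj = 0.1, 0.02, 0.015, 0.012
n1, n2 = 5, 7
mean = n1 * n2 * y_ij  # assumed form of the aggregate mean

# Current approach: subtract the mean from the full variance.
var_ = aggregate_var_between(y_ij, y_ijji, y_ijil, y_ijkj, n1, n2)
invphi_current = (var_ - mean) / mean ** 2

# Proposed approach: use the excess variance directly.
excess = aggregate_excess_var_between(y_ij, y_ijji, y_ijil, y_ijkj, n1, n2)
invphi_direct = excess / mean ** 2

assert math.isclose(invphi_current, invphi_direct, rel_tol=1e-9)
```

For well-conditioned inputs such as these, the two routes agree. The direct form removes one source of cancellation (variance minus mean), although the second-moment terms can still cancel against y_ijkl. The within-group branch could be treated analogously once the form of the within-group mean is fixed.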
