Consider moving to estimating the excess variance rather than variance of aggregate connection volumes.

Latent Space Approaches to Aggregate Network Data

This repository contains the notebooks and the Stan and Python code required to reproduce the results in the accompanying manuscript "Latent Space Approaches to Aggregate Network Data". From the abstract:

Large-scale network data can pose computational challenges, be expensive to acquire, and compromise the privacy of individuals in social networks. We show that the locations and scales of latent space cluster models can be inferred from the number of connections between groups alone. We demonstrate this modelling approach using synthetic data and apply it to friendships between students collected as part of the Add Health study, eliminating the need for node-level connection data. The method thus protects the privacy of individuals and simplifies data sharing. It also offers performance advantages over node-level latent space models because the computational cost scales with the number of clusters rather than the number of nodes.

Reproducing the Results

To reproduce the results, follow these steps.

1. Set up a clean Python environment. This code has been tested with Python 3.10 on macOS and Ubuntu.
2. Install the Python dependencies by running pip install -r requirements.txt from the root directory of this repository.
3. Install cmdstan, the command line interface to the probabilistic programming framework Stan, by running python -m cmdstanpy.install_cmdstan --version=2.34.0. This may take a few minutes depending on your machine. Other recent versions of Stan may also be compatible but have not been tested.
4. Optionally, run make tests to test the installation and runtime environment.
5. Run make data to download the Adolescent to Adult Health network data.
6. Run make analysis to run all analyses. The results are saved in a new workspace folder at the root of the repository. They comprise .html files summarizing the analysis and .pdf and .png files for the figures in the manuscript.
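The steps above can be sketched as a small Python driver. The function names reproduction_commands and reproduce are hypothetical and not part of the repository; the commands themselves are the ones listed in the steps. The sketch defaults to a dry run that only prints each command, so nothing is installed or downloaded unless you opt in.

```python
import subprocess
import sys


def reproduction_commands(run_tests=True):
    """Return the commands from the steps above, in order (hypothetical helper)."""
    commands = [
        # "python -m pip" targets the active interpreter, like "pip" in a clean environment.
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        [sys.executable, "-m", "cmdstanpy.install_cmdstan", "--version=2.34.0"],
    ]
    if run_tests:
        # Optional step: check the installation and runtime environment.
        commands.append(["make", "tests"])
    commands.append(["make", "data"])
    commands.append(["make", "analysis"])
    return commands


def reproduce(dry_run=True, run_tests=True):
    """Print each command and, unless dry_run is set, execute it from the current directory."""
    for command in reproduction_commands(run_tests):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    reproduce(dry_run=True)
```

Run it from the root directory of the repository and call reproduce(dry_run=False) to execute the steps for real.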

You can also review the GitHub Actions workflow that performs the analysis; example runs are listed under the repository's Actions tab.

The source code comprises two parts: first, the Python package alsm (containing the Stan model code and utility functions) and, second, Jupyter notebooks stored as .md jupytext files in the scripts folder (containing the code to run the analysis and produce figures). If you are familiar with jupytext, go right ahead and open the .md files as notebooks. If you prefer traditional .ipynb files, run make ipynb to generate .ipynb notebooks, which are stored in the scripts folder.
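For readers who want to see what such a conversion involves, here is a minimal sketch using the jupytext Python API. It is an assumption about what a conversion could look like, not a description of what make ipynb actually does, and the helper names ipynb_path_for and convert_scripts are hypothetical.

```python
from pathlib import Path


def ipynb_path_for(md_path):
    """Map a jupytext .md file to the .ipynb file next to it (hypothetical helper)."""
    return Path(md_path).with_suffix(".ipynb")


def convert_scripts(folder="scripts"):
    """Convert every .md notebook in the folder to .ipynb and return the written paths."""
    # Imported lazily so the path helper above works without jupytext installed.
    import jupytext

    written = []
    for md_path in sorted(Path(folder).glob("*.md")):
        notebook = jupytext.read(str(md_path))
        target = ipynb_path_for(md_path)
        # The output format is inferred from the .ipynb extension.
        jupytext.write(notebook, str(target))
        written.append(target)
    return written
```

Calling convert_scripts() from the root of the repository would write one .ipynb file per .md file in the scripts folder.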

	# Evaluate the variance of aggregate connection volumes between two clusters. If n2 == 0, we
	# consider the self connection rate.
	'evaluate_aggregate_var': """
	real evaluate_aggregate_var(vector loc1, vector loc2, real scale1, real scale2,
	                            real propensity, real n1, real n2) {
	    real y_ij = evaluate_mean(loc1, loc2, scale1, scale2, propensity);
	    real y_ijkl = y_ij ^ 2;
	    real y_ijji = evaluate_square(loc1, loc2, scale1, scale2, propensity);
	    real y_ijij = y_ij + y_ijji;
	    real y_ijil = evaluate_cross(loc1, loc2, scale1, scale2, propensity);
	    real y_ijkj = evaluate_cross(loc2, loc1, scale2, scale1, propensity);

	    // Between group connections.
	    if (n2 > 0) {
	        return n1 * n2 * (
	            y_ijij
	            + (n2 - 1) * y_ijil
	            + (n1 - 1) * y_ijkj
	            - (n1 + n2 - 1) * y_ijkl
	        );
	    }
	    // Within group connections.
	    else {
	        return n1 * (n1 - 1) * (
	            y_ijij
	            + y_ijji
	            + 4 * (n1 - 2) * y_ijil
	            - 2 * (2 * n1 - 3) * y_ijkl
	        );
	    }
	}
	""",
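To make the arithmetic of the Stan function easier to experiment with, the following sketch ports only its final combination step to plain Python. The kernel moments are taken as inputs because evaluate_mean, evaluate_square and evaluate_cross are not shown here. The second function is a hypothetical reading of the issue title: it assumes that "excess variance" means the variance minus the mean aggregate volume, and that the mean is the number of ordered pairs times y_ij. Neither assumption is confirmed by the source.

```python
def aggregate_var(y_ij, y_ijji, y_ijil, y_ijkj, n1, n2=0):
    """Combine kernel moments into the variance of an aggregate connection volume.

    Mirrors the arithmetic of evaluate_aggregate_var; n2 == 0 selects the
    within-group (self connection) case.
    """
    y_ijkl = y_ij ** 2
    y_ijij = y_ij + y_ijji
    if n2 > 0:
        # Between group connections.
        return n1 * n2 * (
            y_ijij
            + (n2 - 1) * y_ijil
            + (n1 - 1) * y_ijkj
            - (n1 + n2 - 1) * y_ijkl
        )
    # Within group connections.
    return n1 * (n1 - 1) * (
        y_ijij
        + y_ijji
        + 4 * (n1 - 2) * y_ijil
        - 2 * (2 * n1 - 3) * y_ijkl
    )


def aggregate_excess_var(y_ij, y_ijji, y_ijil, y_ijkj, n1, n2=0):
    """Hypothetical: variance minus the assumed mean aggregate volume."""
    # Assumption: the mean volume is (number of ordered pairs) * y_ij.
    n_pairs = n1 * n2 if n2 > 0 else n1 * (n1 - 1)
    return aggregate_var(y_ij, y_ijji, y_ijil, y_ijkj, n1, n2) - n_pairs * y_ij
```

As a sanity check, when every moment corresponds to a constant kernel value p (so y_ijji, y_ijil and y_ijkj all equal p ** 2), the expressions collapse to n1 * n2 * p between groups and n1 * (n1 - 1) * p within a group, and the excess variance under the assumed definition is zero.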

tillahoffmann / alsm
