CausalFlow: Causal Discovery Methods with Observational and Interventional Data from Time-series

CausalFlow is a Python library for causal analysis from time-series data. It comprises two causal discovery methods recently published in the literature:

| Acronym | Full name |
| --- | --- |
| F-PCMCI | Filtered-PCMCI |
| CAnDOIT | CAusal Discovery with Observational and Interventional data from Time-series |

Useful links

Coming soon..

F-PCMCI

F-PCMCI extends the state-of-the-art causal discovery method PCMCI with a feature-selection step based on Transfer Entropy. Starting from a predefined set of variables, the algorithm identifies the correct subset of features and a hypothetical causal model between them. Causal discovery is then executed on the selected features, starting from the hypothetical causal model. The refined set of variables and the list of potential causal links between them make causal discovery faster and more accurate.

The following example demonstrates the main functionality of F-PCMCI and compares the causal models obtained by PCMCI and F-PCMCI on the same data. The dataset consists of a 7-variable system defined as follows:

$$ \begin{cases} X_0(t) = 2X_1(t-1) + 3X_3(t-1) + \eta_0\\ X_1(t) = \eta_1\\ X_2(t) = 1.1(X_1(t-1))^2 + \eta_2\\ X_3(t) = X_3(t-1)X_2(t-1) + \eta_3\\ X_4(t) = X_4(t-1) + X_5(t-1)X_0(t-1) + \eta_4\\ X_5(t) = \eta_5\\ X_6(t) = \eta_6\\ \end{cases} $$

import numpy as np

min_lag = 1
max_lag = 1
np.random.seed(1)
nsample = 1500
nfeature = 7

# start from uniform noise (the eta terms), then add the lagged dependencies
d = np.random.random(size = (nsample, nfeature))
for t in range(max_lag, nsample):
  d[t, 0] += 2 * d[t-1, 1] + 3 * d[t-1, 3]
  d[t, 2] += 1.1 * d[t-1, 1]**2
  d[t, 3] += d[t-1, 3] * d[t-1, 2]
  d[t, 4] += d[t-1, 4] + d[t-1, 5] * d[t-1, 0]

| Causal Model by PCMCI | Causal Model by F-PCMCI |
| --- | --- |
| Execution time ~ 8min 40sec | Execution time ~ 3min 00sec |

F-PCMCI removes the variable $X_6$ from the causal graph (since it is isolated) and generates the correct causal model. In contrast, PCMCI retains $X_6$, leading to the wrong causal structure. Specifically, a spurious link $X_6 \rightarrow X_5$ appears in the causal graph derived by PCMCI.
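
The role of the filtering step can be illustrated with a rough sketch. The code below is not the estimator used by CausalFlow (which relies on IDTxl); it assumes a simple linear-Gaussian estimate of Transfer Entropy, computed from the residual variances of two least-squares regressions. The helper names `simulate`, `residual_variance` and `gaussian_te` are hypothetical and introduced only for this illustration. On the 7-variable system, the estimate for the true link $X_1 \rightarrow X_2$ should be clearly larger than the one for the non-existent link $X_6 \rightarrow X_2$.

```python
import numpy as np

def simulate(nsample=1500, nfeature=7, seed=1):
    """Generate the 7-variable system above, with uniform noise terms."""
    rng = np.random.default_rng(seed)
    d = rng.random(size=(nsample, nfeature))
    for t in range(1, nsample):
        d[t, 0] += 2 * d[t-1, 1] + 3 * d[t-1, 3]
        d[t, 2] += 1.1 * d[t-1, 1] ** 2
        d[t, 3] += d[t-1, 3] * d[t-1, 2]
        d[t, 4] += d[t-1, 4] + d[t-1, 5] * d[t-1, 0]
    return d

def residual_variance(y, regressors):
    """Variance of the least-squares residual of y on the regressors plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + regressors)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.var(y - A @ coef)

def gaussian_te(data, source, target, lag=1):
    """Linear-Gaussian Transfer Entropy estimate (in nats) from source to target.

    Assumption: a single lag and a linear-Gaussian model, which only
    approximates the nonlinear dependencies of the example system.
    """
    y = data[lag:, target]
    y_past = data[:-lag, target]
    x_past = data[:-lag, source]
    restricted = residual_variance(y, [y_past])      # target's own past only
    full = residual_variance(y, [y_past, x_past])    # plus the source's past
    return 0.5 * np.log(restricted / full)

data = simulate()
te_informative = gaussian_te(data, source=1, target=2)  # true link X_1 -> X_2
te_isolated = gaussian_te(data, source=6, target=2)     # X_6 is isolated
print(f"TE X_1 -> X_2: {te_informative:.4f}")
print(f"TE X_6 -> X_2: {te_isolated:.4f}")
```

A variable whose estimates are negligible in both directions for every other variable is a candidate for removal before causal discovery, which is the intuition behind dropping $X_6$.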

CAnDOIT

CAnDOIT extends F-PCMCI by allowing interventional data to be incorporated in the causal discovery process alongside observational data.

The following example demonstrates CAnDOIT's capability to incorporate and exploit interventional data. The dataset consists of a 5-variable system defined as follows:

$$ \begin{cases} X_0(t) = \eta_0\\ X_1(t) = 2.5X_0(t-1) + \eta_1\\ X_2(t) = 0.5X_0(t-2) \cdot 0.75X_3(t-1) + \eta_2\\ X_3(t) = 0.7X_3(t-1)X_4(t-2) + \eta_3\\ X_4(t) = \eta_4\\ \end{cases} $$

This system of equations generates the time-series data in the observational case. In the interventional case, the equation $X_1(t) = 2.5X_0(t-1) + \eta_1$ is replaced by the hard intervention $X_1(t) = 15$.

import numpy as np

min_lag = 1
max_lag = 2
np.random.seed(1)
nsample_obs = 1000
nsample_int = 300
nfeature = 5

# observational data: uniform noise plus the lagged dependencies
d = np.random.random(size = (nsample_obs, nfeature))
for t in range(max_lag, nsample_obs):
    d[t, 1] += 2.5 * d[t-1, 0]
    d[t, 2] += 0.5 * d[t-2, 0] * 0.75 * d[t-1, 3]
    d[t, 3] += 0.7 * d[t-1, 3] * d[t-2, 4]

# hard intervention on X_1
d_int1 = np.random.random(size = (nsample_int, nfeature))
d_int1[:, 1] = 15 * np.ones(shape = (nsample_int,))
for t in range(max_lag, nsample_int):
    # X_2 and X_3 keep their observational equations; only X_1 is clamped
    d_int1[t, 2] += 0.5 * d_int1[t-2, 0] * 0.75 * d_int1[t-1, 3]
    d_int1[t, 3] += 0.7 * d_int1[t-1, 3] * d_int1[t-2, 4]

| Ground-truth Causal Model | Causal Model by F-PCMCI | Causal Model by CAnDOIT |
| --- | --- | --- |
| $X_0$ observable | $X_0$ hidden | $X_0$ hidden |
| observation samples 1000 | observation samples 1000 | observation samples 700 |
| intervention samples ✗ | intervention samples ✗ | intervention samples 300 |

By using interventional data, CAnDOIT removes the spurious link $X_1 \rightarrow X_2$ generated by the hidden confounder $X_0$.
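
Why the intervention helps can be checked numerically with a small sketch. It does not use CausalFlow; it assumes only NumPy, and the `simulate` helper and its `do_x1` parameter are hypothetical names introduced for this illustration. With $X_0$ hidden, $X_1(t-1)$ and $X_2(t)$ are correlated in the observational data because both depend on $X_0(t-2)$. When $X_1$ is clamped to 15, far outside its observational range, the mean of $X_2$ should stay roughly unchanged, which is evidence that $X_1$ does not cause $X_2$.

```python
import numpy as np

def simulate(nsample, do_x1=None, seed=1):
    """Generate the 5-variable system; if do_x1 is given, X_1 is clamped to it."""
    rng = np.random.default_rng(seed)
    d = rng.random(size=(nsample, 5))
    if do_x1 is not None:
        d[:, 1] = do_x1  # hard intervention: X_1 no longer depends on X_0
    for t in range(2, nsample):
        if do_x1 is None:
            d[t, 1] += 2.5 * d[t-1, 0]
        d[t, 2] += 0.5 * d[t-2, 0] * 0.75 * d[t-1, 3]
        d[t, 3] += 0.7 * d[t-1, 3] * d[t-2, 4]
    return d

obs = simulate(1000)
intv = simulate(300, do_x1=15.0, seed=2)

# Lag-1 correlation between X_1(t-1) and X_2(t) in the observational data.
# The first two rows are pure noise (no lagged terms yet), so they are skipped.
corr = np.corrcoef(obs[2:-1, 1], obs[3:, 2])[0, 1]

# Shift in the mean of X_2 caused by forcing X_1 to 15.
shift = intv[2:, 2].mean() - obs[2:, 2].mean()

print(f"observational corr(X_1(t-1), X_2(t)): {corr:.3f}")
print(f"mean shift of X_2 under do(X_1 = 15): {shift:.3f}")
```

A method that sees only the observational samples cannot tell this correlation apart from a causal link when $X_0$ is unobserved; the interventional samples provide the missing information.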

Other Causal Discovery Algorithms

Although the main contribution of this repository is to present the CAnDOIT and F-PCMCI algorithms, other causal discovery methods have been included for benchmarking purposes. As a consequence, CausalFlow provides a collection of causal discovery methods, beyond F-PCMCI and CAnDOIT, that output time-series DAGs (DAGs that include the lag specification for each link). They are listed in the table below.

Some algorithms are implemented in other languages, such as R and Java, and are wrapped in Python. Integrating most causal discovery methods into a single framework, which handles various types of input and outputs causal models, makes these algorithms easier to use.

Algorithm Feature Selection Observations Interventions
DYNOTEARS
PCMCI
TCDF
tsFCI
VarLiNGAM
F-PCMCI
CAnDOIT
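
As an illustration of what a time-series DAG with lag specification contains, the sketch below encodes the ground-truth graph of the 7-variable system from the F-PCMCI example as a plain dictionary. This is a hypothetical representation chosen for readability, not the data structure CausalFlow uses internally; the helpers `isolated_nodes` and `max_lag` are likewise assumptions made for this example.

```python
# target -> list of (source, lag) parents, read off the 7-variable system
ts_dag = {
    "X_0": [("X_1", 1), ("X_3", 1)],
    "X_1": [],
    "X_2": [("X_1", 1)],
    "X_3": [("X_3", 1), ("X_2", 1)],
    "X_4": [("X_4", 1), ("X_5", 1), ("X_0", 1)],
    "X_5": [],
    "X_6": [],
}

def isolated_nodes(dag):
    """Nodes that have no parents and are not a parent of any node."""
    sources = {src for parents in dag.values() for src, _ in parents}
    return sorted(n for n, parents in dag.items() if not parents and n not in sources)

def max_lag(dag):
    """Largest lag appearing on any link (0 if the graph has no links)."""
    return max((lag for parents in dag.values() for _, lag in parents), default=0)

print(isolated_nodes(ts_dag))  # ['X_6']
print(max_lag(ts_dag))         # 1
```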

Citation

Please consider citing the following papers depending on which method you use:

  • F-PCMCI:
    L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2023). Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios, Proceedings of the Conference on Causal Learning and Reasoning (CLeaR).

    @inproceedings{castri2023fpcmci,
        title={Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios},
        author={Castri, Luca and Mghames, Sariah and Hanheide, Marc and Bellotto, Nicola},
        booktitle={Conference on Causal Learning and Reasoning (CLeaR)},
        year={2023},
    }
    
  • CAnDOIT:
    L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2024). CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series, under review in Advanced Intelligent Systems.

Requirements

  • pandas>=1.5.2
  • netgraph>=4.10.2
  • networkx>=2.8.6
  • ruptures>=1.1.7
  • scikit_learn>=1.1.3
  • torch>=1.11.0
  • gpytorch>=1.4
  • dcor>=0.5.3
  • h5py>=3.7.0
  • jpype1>=1.5.0
  • mpmath>=1.3.0
  • causalnex>=0.12.1
  • lingam>=1.8.2
  • tigramite>=5.1.0.3
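
To see which of these packages are already present in the current environment, a short check along the following lines may help. It is only a sketch: it uses the standard-library `importlib.metadata` module, reports installed versions without comparing them against the minimum versions listed above, and the `installed_versions` helper is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the list above ("scikit-learn" is the canonical
# spelling of "scikit_learn")
REQUIRED = [
    "pandas", "netgraph", "networkx", "ruptures", "scikit-learn", "torch",
    "gpytorch", "dcor", "h5py", "jpype1", "mpmath", "causalnex", "lingam",
    "tigramite",
]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```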

Installation

Before installing CausalFlow, you need to install Java and the IDTxl package used for the feature-selection process, following the guide described here. Once complete, you can install the current release of CausalFlow with:

# COMING SOON: pip install causalflow

For a complete installation (Java, IDTxl and CausalFlow), follow the procedure below.

1 - Java installation

Verify that you have not already installed Java:

java -version

If the command returns Command 'java' not found, ..., install Java with the following commands; otherwise, skip to the IDTxl installation.

# Java
sudo apt-get update
sudo apt install default-jdk

Then, you need to add JAVA_HOME to the environment

sudo nano /etc/environment
JAVA_HOME="/lib/jvm/java-11-openjdk-amd64/bin/java" # Paste the JAVA_HOME assignment at the bottom of the file
source /etc/environment
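
After these steps, the Java setup can also be inspected from Python. The snippet below is a minimal sketch using only the standard library; the `java_status` helper is a hypothetical name, and it only reports what it finds without validating that the JVM actually starts.

```python
import os
import shutil

def java_status():
    """Report where the java executable is found and what JAVA_HOME is set to."""
    return {
        "java_on_path": shutil.which("java"),       # None if java is not on PATH
        "JAVA_HOME": os.environ.get("JAVA_HOME"),   # None if the variable is unset
    }

for key, value in java_status().items():
    print(f"{key}: {value if value is not None else 'not found'}")
```

Note that environment variables set in /etc/environment are typically picked up only by new login sessions, so the check may need to be run from a fresh terminal.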

2 - IDTxl installation

# IDTxl
git clone https://github.com/pwollstadt/IDTxl.git
cd IDTxl
pip install -e .

3 - CausalFlow installation

# COMING SOON: pip install causalflow

Recent changes

| Version | Changes |
| --- | --- |
| 4.0.0 | package published |
