denovo

The de novo method is developed within a causal inference framework for matched observational studies. The denovo R package implements a statistical method that discovers subgroups in which the causal effect of an exposure of interest (e.g., the effect of air pollution on mortality) differs statistically significantly from the population average.

The method works on two sub-samples of the data. In the first sub-sample, we let the data discover "promising" subgroups whose air pollution effects differ from the population mean, using machine learning approaches such as classification and regression trees (CART) and Causal Tree. In the second sub-sample, we use randomization-based hypothesis tests to confirm whether the exposure effects in the newly discovered subgroups are statistically significantly different from the population average causal effect.
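The discovery step can be illustrated with a minimal sketch. It is not the denovo implementation: it assumes matched pairs with one row of covariates per pair, simulates hypothetical data (the variable names age and female and the effect sizes are invented for illustration), and uses the rpart package as a stand-in CART learner fitted to the treated-minus-control outcome differences.

```r
# Minimal sketch (not the denovo implementation): discover candidate subgroups
# with CART on matched-pair outcome differences.
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)
n_pairs <- 400

# Hypothetical synthetic matched pairs: one row of covariates per pair.
covars <- data.frame(
  age    = sample(65:95, n_pairs, replace = TRUE),
  female = rbinom(n_pairs, 1, 0.5)
)
cr <- rnorm(n_pairs)                          # control outcomes
effect <- 0.2 + 1.0 * (covars$age >= 80)      # assumed: larger effect for age >= 80
tr <- cr + effect + rnorm(n_pairs, sd = 0.5)  # treated outcomes

# Randomly split the pairs into a discovery half and a confirmation half.
idx_1 <- sample(n_pairs, n_pairs / 2)
idx_2 <- setdiff(seq_len(n_pairs), idx_1)

# Step 1: fit a regression tree to the pair differences in the first half.
discovery <- data.frame(diff = tr[idx_1] - cr[idx_1], covars[idx_1, ])
tree <- rpart(diff ~ age + female, data = discovery, method = "anova",
              control = rpart.control(maxdepth = 2, minbucket = 20))

# Each leaf of the tree is a candidate ("promising") subgroup.
print(tree)

# Step 2 would assign the held-out pairs to these leaves and test each one.
# Here we only compute the predicted leaf mean for every held-out pair.
leaf_2 <- predict(tree, newdata = covars[idx_2, ])
table(round(leaf_2, 2))
```

Keeping the two halves disjoint is the point of the design: the subgroups are chosen on one half, so the tests on the other half are not biased by the search.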

Installation

Use the following instructions to install the denovo package from source:

install.packages("devtools")
library(devtools)
install_github("fasrc_denovo/master")

Two required R packages (causalTree and Gurobi) are not available on CRAN and must be installed manually. To install the causalTree package, use the following instructions (see causalTree for more details):

install.packages("devtools")
library(devtools) 
install_github("susanathey/causalTree")

The denovo package uses the Gurobi optimizer in sensitivity analyses. For academic use, you can download and install it from the Gurobi website. For the R wrapper, please see the Gurobi installation instructions.
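Before loading denovo, it can help to confirm that the manually installed dependencies are visible to R. The snippet below is a small sketch; it assumes the Gurobi R wrapper is installed under the package name gurobi.

```r
# Check which of the dependencies named above are available.
# requireNamespace() loads a namespace if the package is installed, but it
# does not attach the package and does not install anything.
required <- c("devtools", "causalTree", "gurobi")  # "gurobi" name is assumed
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
print(installed)

if (!all(installed)) {
  message("Missing packages: ", paste(required[!installed], collapse = ", "))
}
```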

Getting Started

denovo functions can be used for both binary and continuous outcomes. The discover_subgroups function takes the first sub-sample and generates a classification and regression tree.

discovered_tree <- discover_subgroups(tr_1, cr_1, covars_1)

In this function, tr is the vector of treated outcomes, cr is the vector of control outcomes, and covars is a data.frame of covariates. The output is the discovered tree, that is, the classification and regression prediction model. The estimate_subgroups_sig function receives the second sub-sample together with the prediction model and estimates the significance of each subgroup.
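The input layout described above can be checked with a short helper before calling the package. This is a sketch under an assumption not stated in the text: each matched pair contributes one treated outcome, one control outcome, and one row of covars. The helper check_subsample and the simulated variables are hypothetical and are not part of denovo.

```r
# Hypothetical helper (not part of denovo): check that the inputs for one
# sub-sample line up before passing them to the package functions.
check_subsample <- function(tr, cr, covars) {
  stopifnot(
    is.numeric(tr), is.numeric(cr),
    is.data.frame(covars),
    length(tr) == length(cr),
    nrow(covars) == length(tr)  # assumed: one covariate row per matched pair
  )
  invisible(TRUE)
}

set.seed(2)
n <- 100
covars_1 <- data.frame(age    = sample(65:95, n, replace = TRUE),
                       female = rbinom(n, 1, 0.5))
cr_1 <- rbinom(n, 1, 0.3)  # binary control outcomes (simulated)
tr_1 <- rbinom(n, 1, 0.4)  # binary treated outcomes (simulated)

check_subsample(tr_1, cr_1, covars_1)
```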

analysis <- estimate_subgroups_sig(tr_2, cr_2, covars_sig_2,
                                     tree = discovered_tree$tree,
                                     significance = total_significance,
                                     gamma = gamma)
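To give a feel for the confirmation step, here is a simplified randomization test for matched pairs. It is a sketch, not the test implemented in estimate_subgroups_sig: it tests the sharp null that the effect in one subgroup equals a fixed value tau0 by randomly flipping the signs of the centred pair differences. It treats tau0 as known and does not model the gamma argument, which we assume is a sensitivity parameter. The function sign_flip_test and the simulated data are hypothetical.

```r
# Simplified sketch (not the denovo implementation): sign-flip randomization
# test of H0 "the effect in this subgroup equals tau0" for matched pairs.
sign_flip_test <- function(tr, cr, tau0, n_draws = 5000) {
  adj <- (tr - cr) - tau0  # pair differences centred at the null value
  observed <- mean(adj)
  # Under H0 with random assignment within pairs, each centred difference
  # is equally likely to carry either sign.
  null_stats <- replicate(n_draws, {
    signs <- sample(c(-1, 1), length(adj), replace = TRUE)
    mean(signs * adj)
  })
  # Two-sided Monte Carlo p-value with the usual +1 correction.
  (1 + sum(abs(null_stats) >= abs(observed))) / (n_draws + 1)
}

set.seed(3)
n <- 150
cr_2 <- rnorm(n)
tr_2 <- cr_2 + 1.0 + rnorm(n, sd = 0.5)  # assumed subgroup effect of 1.0
tau0 <- 0.5                              # assumed population-average effect

p_same <- sign_flip_test(tr_2, cr_2, tau0 = 1.0)   # null is true here
p_diff <- sign_flip_test(tr_2, cr_2, tau0 = tau0)  # null is false here
c(p_same = p_same, p_diff = p_diff)
```

A small p_diff indicates that the subgroup effect differs from the assumed population average, which is the kind of evidence the second sub-sample is used to provide.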

The estimate_exposure_eff function combines the functions above to discover effect modification under the assumption of no unmeasured confounding. See the following section for more details.

Analyses on Synthetic Data

We provide analyses on synthetic data to address the following research question: which population subgroups have causal effects of air pollution on mortality that are statistically significantly different from the population average? The actual study was conducted on Medicare data; however, those data are not publicly available, so we repeat the process with synthetic data. These analyses are further discussed in Lee et al. (2021). Please refer to the following link for more details.

Application of denovo package on simulated dataset

References

Lee, K., Small, D.S. and Dominici, F., 2021. Discovering Heterogeneous Exposure Effects Using Randomization Inference in Air Pollution Studies. Journal of the American Statistical Association, pp.1-12.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and Regression Trees. New York: Chapman & Hall/CRC.
Athey, S. and Imbens, G., 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), pp.7353-7360.
