
denovo

The de novo method is developed within a causal inference framework in the context of matched observational studies. The denovo R package implements a novel statistical method that discovers subgroups in which the causal effect of an exposure of interest (e.g., air pollution on mortality) differs significantly from the population average.

The method splits the data into two subsamples. In the first subsample, we let the data discover "promising" subgroups whose air pollution effects differ from the population mean. In this step, machine learning approaches (e.g., classification and regression trees (CART) and Causal Tree) are used to discover the promising subgroups. In the second subsample, we develop randomization-based hypothesis tests to confirm whether there is evidence that the exposure effects in the newly discovered subgroups differ significantly from the population average causal effect.
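The randomization-based test in the second step can be illustrated with a generic paired sign-flip test. This is a minimal sketch, not the denovo implementation: it assumes each matched pair contributes one treated-minus-control difference d, and it tests the sharp hypothesis that every pair in a subgroup has the same additive effect tau0 (for example, an estimate of the population average effect). Under that hypothesis the centered differences d - tau0 are symmetric around zero, so randomly flipping their signs generates the null distribution. The function sign_flip_test and its arguments are hypothetical.

```r
# Hypothetical helper (not part of denovo): two-sided sign-flip randomization
# test of H0: every pair in the subgroup has the same additive effect tau0.
# d is a numeric vector of treated-minus-control differences, one per matched pair.
sign_flip_test <- function(d, tau0 = 0, n_perm = 10000) {
  centered <- d - tau0                 # symmetric around 0 under H0
  observed <- abs(mean(centered))
  # Null distribution: flip the sign of each centered difference at random
  perm_stats <- replicate(n_perm, {
    signs <- sample(c(-1, 1), length(centered), replace = TRUE)
    abs(mean(signs * centered))
  })
  # Monte Carlo p-value, counting the observed statistic as one draw
  (1 + sum(perm_stats >= observed)) / (1 + n_perm)
}

# Usage on simulated differences (illustrative values only)
set.seed(1)
d_subgroup <- rnorm(200, mean = 3)     # pairs with a large effect
p_value <- sign_flip_test(d_subgroup, tau0 = 0, n_perm = 2000)
```

Flipping signs, rather than permuting outcomes across pairs, respects the matched design: randomization only happens within each pair.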

Installation

Use the following instructions to install the denovo package from GitHub:

install.packages("devtools")
library(devtools)
install_github("fasrc/denovo")

Two R packages, causalTree and Gurobi, cannot be installed from CRAN, so users need to install them manually. To install the causalTree package, use the following instructions (see causalTree for more details):

install.packages("devtools")
library(devtools) 
install_github("susanathey/causalTree")

The denovo package uses the Gurobi optimizer in its sensitivity analyses. Academic users can download and install Gurobi from the Gurobi website; for the R interface, see Gurobi's installation instructions.

Getting Started

denovo functions can be used for both binary and continuous outcomes. The discover_subgroups function takes the first subsample and fits a classification and regression tree:

discovered_tree <- discover_subgroups(tr_1, cr_1, covars_1)

In this function, tr_1 is the vector of treated outcomes, cr_1 is the vector of control outcomes, and covars_1 is a data.frame of covariates. The output contains the discovered tree, that is, the classification and regression tree prediction model. The estimate_subgroups_sig function takes the second subsample together with this tree and estimates the significance of each subgroup:

analysis <- estimate_subgroups_sig(tr_2, cr_2, covars_sig_2,
                                     tree = discovered_tree$tree,
                                     significance = total_significance,
                                     gamma = gamma)
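To make the two-stage workflow concrete without relying on package internals, here is a rough base-R sketch under stated assumptions. It fits a CART tree with the rpart package (a recommended package distributed with R) to treated-minus-control differences in the first half of the pairs, then sends the second half down the tree and summarizes each leaf. Fitting the tree to pair differences is our assumption about how a discovery step can work, not a description of discover_subgroups; the toy data and all variable names (age, income, idx_1, idx_2, leaf_2) are hypothetical, and the significance tests performed by estimate_subgroups_sig are not reproduced here.

```r
library(rpart)  # CART implementation shipped with R as a recommended package

# Hypothetical toy data: one row per matched pair
set.seed(2021)
n <- 1000
covars <- data.frame(age = runif(n, 65, 95), income = rnorm(n))
cr <- rnorm(n)                                   # control outcomes
tr <- cr + 1 + 2 * (covars$age > 80) + rnorm(n)  # treated outcomes: larger effect above age 80

# Randomly split the pairs into a discovery half and an inference half
idx_1 <- sample(n, n / 2)
idx_2 <- setdiff(seq_len(n), idx_1)

# Discovery half: regress pair differences on covariates with a regression tree
disc <- data.frame(diff = tr[idx_1] - cr[idx_1], covars[idx_1, ])
tree <- rpart(diff ~ ., data = disc, method = "anova")

# Inference half: assign each pair to a leaf. predict() returns the leaf mean
# fitted on the discovery half, used here as a leaf label (this assumes no two
# leaves share exactly the same fitted mean).
leaf_2 <- factor(predict(tree, newdata = covars[idx_2, ]))
d_2 <- tr[idx_2] - cr[idx_2]

# Mean pair difference per leaf, estimated only on the inference half,
# alongside the overall mean on the same half
leaf_means <- tapply(d_2, leaf_2, mean)
overall_mean <- mean(d_2)
```

Keeping the leaf estimates on pairs the tree never saw is what makes the second-stage inference honest: the subgroups are fixed before the inference half is examined.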

The estimate_exposure_eff function uses these functions to discover effect modification under the assumption of no unmeasured confounding. See the following section for more details.
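The gamma argument of estimate_subgroups_sig suggests a sensitivity analysis that relaxes the no-unmeasured-confounding assumption. The README does not define it, so the following is only an assumption-labeled illustration: if gamma plays the role of Rosenbaum's sensitivity parameter Γ, then gamma = 1 corresponds to no unmeasured confounding, and larger values allow the odds of treatment within a matched pair to differ by up to a factor of Γ. For the one-sided sign test of a positive effect, this gives a closed-form worst-case p-value, P(Binomial(S, Γ/(1+Γ)) ≥ T), where S is the number of untied pairs and T is the number with a positive treated-minus-control difference. The helper sign_test_sensitivity is hypothetical, and denovo's own test statistic and Gurobi-based computations may differ.

```r
# Hypothetical helper (not part of denovo): upper bound on the one-sided
# sign-test p-value under Rosenbaum's sensitivity model with parameter gamma.
# d is a numeric vector of treated-minus-control differences, one per pair.
sign_test_sensitivity <- function(d, gamma = 1) {
  stopifnot(gamma >= 1)
  d <- d[d != 0]                   # drop tied pairs
  S <- length(d)                   # number of untied pairs
  T_pos <- sum(d > 0)              # pairs where the treated outcome is larger
  p_plus <- gamma / (1 + gamma)    # worst-case chance of a positive sign under H0
  # P(Binomial(S, p_plus) >= T_pos)
  pbinom(T_pos - 1, size = S, prob = p_plus, lower.tail = FALSE)
}

# Usage: how quickly does the evidence erode as gamma grows? (illustrative data)
set.seed(7)
d <- rnorm(100, mean = 0.5)
bounds <- sapply(c(1, 1.5, 2, 3), function(g) sign_test_sensitivity(d, g))
```

With gamma = 1 the bound reduces to the ordinary sign test, so the first entry of bounds is the p-value under no unmeasured confounding; the gamma at which the bound crosses the significance level indicates how much hidden bias the finding can tolerate.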

Analyses on Synthetic Data

We provide analyses on synthetic data to address the following research question: which population subgroups have causal effects of air pollution on mortality that differ significantly from the population average? The original study was conducted on Medicare data; however, those data are not publicly available, so we repeat the analysis with synthetic data. These analyses are discussed further in Lee et al. (2021). Please refer to the following link for more details.

References

  • Lee, K., Small, D.S. and Dominici, F., 2021. Discovering Heterogeneous Exposure Effects Using Randomization Inference in Air Pollution Studies. Journal of the American Statistical Association, pp.1-12.
  • Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and Regression Trees. New York: Chapman & Hall/CRC.
  • Athey, S. and Imbens, G., 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), pp.7353-7360.

Contributors

naeemkh, kwonsang