Giter VIP home page Giter VIP logo

simcausal's People

Contributors

fkgruber avatar osofr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

simcausal's Issues

Mediation Analysis with simcausal

Hi,
I'd like to use simcausal for mediation analysis but wonder how this can best be done. For natural direct and indirect effects I need to simulate the crossworld counterfactual Y(a, M(a*)), - A is the binary exposure variable and M the mediator. M(a*) is the level of the mediator that it would have naturally been under the reference intervention A=a*.

I'm also particularly interested in simulating randomized interventional analogues of the natural effects in the presence of exposure-induced confounding of the relationship of M and the outcome Y. These effects require the counterfactual Y(a, G(m|a*, c)), with G(m|a*, c) denoting a random draw from the distribution of M given A=a* and non-exposure-induced confounders C=c.

What I did so far was rather intuitive. For the interventional effects, I need to draw from the distribution G(m|a*, c) - therefore I simulated data first and then specified a model (lm or glm,e.g.) for M|a, c for each counterfactual dataset a1 and a0. I used the estimated parameters for specifying G(m|a*, c), G(m|a, c) and the respective outcome variables in new nodes and simulated the counterfactual dataset again to evaluate the interventional effects. Below is a simple example for illustration:

################################################################
#Simple Mediation process with an exposure-induced confounder L
#An "intuitive" approach
################################################################

set seed for reproducability

set.seed(41188)

M <- DAG.empty()

M <- M +
node("a", # Binary exposure variable
distr = "rbern",
prob = 0.53) +
node("c", # Confounder variable C
distr="rnorm", mean=5, sd=2)+
node("l", #Binary confounder variable L -> exposure induced
distr="rbern" ,
prob=ifelse(c>=5, 0.6-0.3a, 0.7-0.3a))+
node("m", #Mediator variable M
distr = "rnorm",
mean=10-2a+c+5l,
sd=3)+
node("u.Y", #Latent variable
distr = "rnorm", mean = 0, sd = 100)+
node("y", #Outcome Variable Y
distr = "rnorm", mean = 2000-200a+10m-5am + 50l -20al+ 15c + u.Y, sd=200)
Mset <- set.DAG(M, latent.v = c("u.Y"))
str(Mset)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1)
Mset <- Mset + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset <- Mset + action("a0", nodes = a0)

counterfactual data to estimate conditional distribution M|a,c and M|a*,c

dat <- simcausal::sim(DAG = Mset, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

look at counterfactual data

head(dat[["a1"]]); head(dat[["a0"]])

plotDAG(Mset, xjitter = 0.2, yjitter = 0.2,
edge_attrs = list(width = 0.5, arrow.width = 0.5, arrow.size = 0.3),
vertex_attrs = list(size = 20, label.cex = 0.8))

#Evaluate distribution M|a,c and M|a*,c

summary(m0<-lm(m~c, data=dat[["a0"]]))
sd.m0<-summary(m0)$sigma; sd.m0
coef.m0<-coefficients(m0); coef.m0

summary(m1<-lm(m~c, data=dat[["a1"]]))
coef.m1<-coefficients(m1); coef.m1
sd.m1<-summary(m1)$sigma; sd.m1

M.r<- M +
node("m.0", distr = "rnorm", mean = 13.7465169 + 0.9006255c, sd=3.828051)+ #insert values by hand
node("m.1", distr = "rnorm", mean = 10.2433650 + 0.9009554 c, sd=3.826656) +
#node("m.0", distr = "rnorm", mean = coef.m0[1]+coef.m0[2]c, sd=sd.m0)+ this syntax doesn't work
#node("m.1", distr = "rnorm", mean =coef.m1[1] + coef.m1[2]c, sd=sd.m1)+
node("y.m0", distr = "rnorm",
mean = 2000-200
a+10
m.0-5
a
m.0 + 50l -20al+ 15c + u.Y, sd=200)+
node("y.m1", distr = "rnorm",
mean = 2000-200a+10m.1-5am.1 + 50l -20al+ 15c + u.Y, sd=200)
Mset.r <- set.DAG(M.r, latent.v = c("u.Y"))
str(Mset.r)

specify the two interventions

a1 <- node("a", distr = "rbern", prob = 1)
Mset.r <- Mset.r + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset.r <- Mset.r + action("a0", nodes = a0)

##################

Evaluate TRUTH

##################
#Simulate counterfactual dataset
dat <- simcausal::sim(DAG = Mset.r, actions = c("a1", "a0"), n = 10000000, rndseed = 345)

#Evaluate true randomized interventional analogue for the direct effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a1")
y_r.m0_a1<-eval.target(Mset.r, data = dat)$res ; y_r.m0_a1

Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a0")
y_r.m0_a0<-eval.target(Mset.r, data = dat)$res; y_r.m0_a0
RIA.NDE<-y_r.m0_a1-y_r.m0_a0; RIA.NDE

#Evaluate true randomized interventional analogue for the indirect effect

Mset.r <- set.targetE(Mset.r, outcome = "y.m1", param = "a1")
y_r.m1_a1<-eval.target(Mset.r, data = dat)$res; y_r.m1_a1

RIA.NIE<-y_r.m1_a1-y_r.m0_a1; RIA.NIE

#Evaluate randomized interventional analogue for total effect:
RIA.TE<-y_r.m1_a1-y_r.m0_a0
RIA.TE == RIA.NIE + RIA.NDE
###################################

I'm not quite sure, if this code is ok or if there are more efficient solutions?
Thanks a lot for suggestions!

fix dependency for rmvnorm

There is an attempt to load mvtnorm namespace in examples -

require("mvtnorm")

As mvtnorm is suggested dependency, it should not be required to run examples according to R-exts. This can be conditionally escaped with requireNamespace.
It is possible there are other cases of that issue in other examples, this one was to first that fails.

> #---------------------------------------------------------------------------------------
> # EXAMPLE 7: Multivariate random variables
> #---------------------------------------------------------------------------------------
> require("mvtnorm")
Loading required package: mvtnorm
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘mvtnorm’
> D <- DAG.empty()
> 
> # 3 dimensional normal (uncorrelated), using rmvnorm function from rmvnorm package:
> D <- D +
+   node(c("X1","X2","X3"), distr = "rmvnorm", mean = c(0,1,2))
Error in node(c("X1", "X2", "X3"), distr = "rmvnorm", mean = c(0, 1, 2)) : 
  rmvnorm: this node distribution function could not be located
Execution halted

Defining an deterministic action

I am trying to specify an action that assigns a value of "abar" to E[t] while a time-varying covariate C[t-1] is 1, and 0 otherwise. The DAG is defined as follows:

tmax<-10
S1.E.1 <-substitute(plogis(-0.30 + 1.00*W1 + 0.08*W2))
S1.E   <-substitute(plogis(-0.30 + 1*E[t-1]))
S1.D.1 <-substitute(plogis(-2.50 - 0.10*W1 + 0.20*W2 + 0.07*E[t]))
S1.D   <-substitute(plogis(-2.50 - 0.10*W1 + 0.20*W2 + 0.07*E[t]))
S1.Y.1 <-substitute(plogis(-2.50 - 0.50*W1 + 0.50*W2 + 0.10*E[t] + 1.00*S + 0.20*S*E[t]))
S1.Y   <-substitute(plogis(-2.50 - 0.05*W1 - 0.25*W2 + 0.30*E[t] + 0.10*Ebar[t-1] + 2.00*H[t-1] + 2.00*S + 0.20*S*Ebar[t-1]))
S1.H.1 <-substitute(plogis(-3.00 + 0.10*W1 + 0.50*W2 + 2.00*S + 0.25*S*E[t]))
S1.H   <-substitute(plogis(-3.00 + 0.10*W1 + 0.50*W2 + 2.00*S + 0.25*S*E[t] + 0.50*Ebar[t-1]))
S1.C.1 <-substitute(plogis( 4.00 - 0.05*W1 - 0.25*W2)) #staying at work
S1.C   <-S1.C.1
D1 <- DAG.empty() +
  node("S",      distr = "rconst", const = 0)+
  node("TTH",    distr = "runif",  min = -2, max = 10)+
  node("C.star", distr = "rconst", const = floor(TTH))+
  node("W1",     distr = "rbern",  prob  = 0.20) +
  node("W2",     distr = "rbern",  prob  = 0.20) +

  node("E",      t=1,       distr = "rbern",  prob=.(S1.E.1))+
  node("D",      t=1,       distr = "rbern",  prob=.(S1.D.1),EFU=TRUE)+
  node("Y",      t=1,       distr = "rbern",  prob=.(S1.Y.1), EFU=TRUE)+
  node("Ebar",   t=1,       distr = "rconst", const=E[1])+
  node("H",      t=1,       distr = "rbern",  prob=.(S1.H.1))+
  node("C",      t=1,       distr = "rbern",  prob=.(S1.C.1))+

  node("E",      t=2:tmax,  distr="rbern",    prob=ifelse(C[t-1]==0,0,.(S1.E)))+
  node("D",      t=2:tmax,  distr="rbern",    prob=.(S1.D), EFU=TRUE)+
  node("Y",      t=2:tmax,  distr="rbern",    prob=.(S1.Y), EFU=TRUE)+ 
  node("Ebar",   t=2:tmax,  distr="rconst",   const=sum(E[1:t])) + 
  node("H",      t=2:tmax,  distr="rbern",    prob=ifelse(H[t-1]==1,1,.(S1.H)))+
  node("C",      t=2:tmax,  distr="rbern",    prob=ifelse(C[t-1]==0,0,.(S1.C)))
RCDAG1 <- set.DAG(D1)

I tried to generate an observed dataset as follows:

K <- 10
act_dynAD0 <-c(
  node("E", t = 1:K, distr = "rconst", const = ifelse(t==1,abar,ifelse(C[t-1]==1, abar, 0))),
  node("D", t = 1:K, distr = "rconst", const = 0)) 
DynDact   <-  RCDAG1 + action("DynE0D0", nodes = act_dynAD0, abar = 0) + action("DynE1D0", nodes = act_dynAD0, abar = 1)
dt    <-  sim(DynDact,actions = c("DynE0D0","DynE1D0"), n = 1000, rndseed = 4)

And obtained the following warning:

Warning messages:
1: In rbinom(n = n, prob = prob, size = 1) : NAs produced
2: In rbinom(n = n, prob = prob, size = 1) : NAs produced
3: In rbinom(n = n, prob = prob, size = 1) : NAs produced
4: In rbinom(n = n, prob = prob, size = 1) : NAs produced
5: In rbinom(n = n, prob = prob, size = 1) : NAs produced

Many of the E[t] columns contain NAs. Would appreciate your help in resolving this.

Gaussian var affecting categorical var

Hi,

I want to have a Gaussian variable affecting a categorical variable. I tried this which seems to work

C <- rnorm(3,0,2) # or even rnorm(1,0,2)
rcat.b1(250, softmax(C * c(0.5, 0.25, 0.25)))

But running the below only gets me category 4, while there are only three categories?

logsumexp <- function (x) {
  y = max(x)
  y + log(sum(exp(x - y)))
}

softmax <- function (x) {
  exp(x - logsumexp(x))
}

D <- DAG.empty()
N <- 250

D <- D +
  # Confounder with Gaussian distribution
  node("C",
      distr = "rnorm",
      mean = 0,
      sd = 2) +

  # Set BA to categorical (c=3).
  # Make sure it starts at 1, to ensure sane priors later.
  # Let C be a causal effect on BA.
  node("BA", 
    distr = "rcat.b1",
    probs = softmax(C * c(0.5, 0.25, 0.25)))

Dset <- set.DAG(D, vecfun = c("softmax"))
d <- sim(Dset, n = N)
r$> head(d)
  ID          C BA
1  1 -1.3004380  4
2  2  3.4434901  4
3  3 -1.7159391  4
4  4  3.7295937  4
5  5 -1.7967269  4
6  6 -0.8985413  4

Have I misunderstood something? :)

Default values for non-existing time-points

Allow non-existing time-point references to default to some value, for example, instead of doing this

node("TI", t=0:7, distr="rbern", prob=plogis(-5 - 0.3*CVD + 0.5*A1C[t] + 1.5*{if (t==0) {0} else {TI[t-1]}}))

write just this, with TI[t-1] at t=0 defaulting to 0:

node("TI", t=0:7, distr="rbern", prob=plogis(-5 - 0.3*CVD + 0.5*A1C[t] + 1.5*TI[t-1]))

Performing intervention in ver. 0.5.0

I am trying to perform a deterministic intervention on a DAG.

library(simcausal)
library(rje)
D <- DAG.empty()

D <- D +
  node("U1", distr="runif"
  )

D <- D +
  node("U2", distr="runif"
       )

D <- D +
  node("UA", distr="runif"
  )

D <- D +
  node("UY", distr="runif"
  )

D <- D +
  node("W1", distr="rconst",
       const = as.numeric(U1 < 0.5))

D <- D +
  node("W2", distr="rconst",
       const = as.numeric(U2 < 0.5))

D <- D +
  node("A", distr="rconst",
       const = as.numeric(UA < expit(-0.5 + W1 - 1.5 * W2)) )

D <- D +
  node("Y", distr="rconst",
       const = as.numeric(UY < expit(-0.75 + W1 - 2 * W2 + 2.5 * A + A*W1)),
       EFU = TRUE)

setDl2 <- set.DAG(D)
# ----------------------------------------------------------------------
plotDAG(setDl2, xjitter = 0.32, yjitter = 0.03,
        edge_attrs = list(width = 0.5, arrow.width = 0.4, arrow.size = 0.8),
        vertex_attrs = list(size = 19, label.cex = 1.5))
title(main = "SCM graph")
# ----------------------------------------------------------------------
# intervention
actN_A <- node("A",distr="rconst", const = theta)

D_act <- D + action("A_th0", nodes=actN_A, theta=0)
D_act <- D_act + action("A_th1", nodes=actN_A, theta=1)
D_act <- D_act + action("A_no_intervene", nodes=node("A", distr="rconst", const = as.numeric(UA < expit(-0.5 + W1 - 1.5 * W2)) ))

ate.data <- simfull(A(D_act), n=1e5 , rndseed = 252)

The D_act assignment fails to run under 0.5.0 while suceeds under 0.4.0. Could you please take a look when you have time?

Error in rcategor and rcategor.int for large numbers of groups

I'd like to use rcategor (or rcategor.int) to assign individuals to a large number of categories, but the following example illustrates an error:

D <- DAG.empty()
D1 <- D + node('group',
              distr = 'rcategor.int',
              probs = rep(1/50, 50)) 
D2 <- D + node('group',
               distr = 'rcategor.int',
               probs = rep(1/55, 55)) 
D1 <- set.DAG(D1)
D2 <- set.DAG(D2)

D1 is fine. D2, however, presents the following error:
image

I'm just starting to play with the package. It's great so far!

#Simulating from a fixed network

The current simulation seems to only allow us to fix the hyper parameters of the network and whenever we simulate, it will first generate a new network and then simulate the covariates.

However, we want to be able to simulate from a fixed network. I.E at most simulate our network according to hyper parameters once, then fix such network, and simulate many different realizations of the covariates and outcomes.

So, I add such functionality by changing the simFromDAG function.

Understanding Non-standard evaluation in 0.5.1

The updates in v0.5.1 break my code, and I trying to understand why. I'm simulating a space-time structure, so I have two indexes s and t. simcausal understands a single dimension, so I map s and t to a single dimension via the function tFUN.

For each node in the DAG, I have a dataset of parameters that I lookup for s, t.

Here's a short example:

for(t in t_start:TT ){ # Per month
    for(s in s_start:SS ){ # Per location

      tpos  <- tFUN(s, t, SS)
      ## Initial month values ##

      if(s == 1 & t == 0){
       D <- D + node('L1',
               distr = 'rnorm',
               mean  = .(lookup_parameter(pp, node = 'L1', s = s, t = t, variable = '(Intercept)') ),
               sd    = .(lookup_parameter(pp, node = 'L1', s = s, t = t, variable = 'mse') ),
               t     = tpos)
 }
}

lookup_parameter basically uses [ to pick off the correct element in the dataset.

With the new updates, the values for s and t getting passed to lookup_parameter are not what I expected, and I get a One of the distribution parameters evaluated to non-standard vector error. t is the value I'd expect from simcausal rather then the value from the loop. s is a mystery altogether. The other arguments (node and variable pass to lookup_parameter correctly).

I think for my purposes (I'm trying to wrap up this study), I should just roll back to v0.5.0. I figured I would bring this up, in case there is something obvious I'm missing.

Thanks!

cannot find function 'melt'

I've run into the issue a few times, where when I call simobs get a cannot find function 'melt' error. When I run library(data.table), the issue goes away. I have not pinpointed if I have other libraries load that may cause the conflict, but I think changing melt to data.table::melt on line 749 of simulation.R may fix the issue. Though I see that data.table::melt has been commented out.

using as.integer() in node formulas prints to the terminal:

using as.integer() in node formulas prints it to the terminal:

library(simcausal)
D <- DAG.empty()
D <- D +
         node("Trexp", distr = "rexp", rate = 1) +
         node("Cweib", distr = "rweibull", shape = 0.5, scale = 1) +
         node("T", distr = "rconst", const = as.integer(Trexp*100))
setD <- set.DAG(D)
dat <- sim(setD, n=100)

Evaluation of categorical by variable name

THIS no longer works, because the dimensions of catprob.W0 and catprob.W1 (1 row) do not match the dims of W (n rows)

  D <- DAG.empty()
  D <- D + node("W", distr = "rbern", prob = 0.3)
  catprob.W0 <- cbind(0.7,0.1,0.2)
  catprob.W1 <- cbind(0.2,0.1,0.7)
  D <- D + node("Cat3", distr = "rcategor.int", probs = (W==0)*.(catprob.W0) + (W==1)*.(catprob.W1))
  checkException(Dset2 <- set.DAG(D))

This works however, because simcausal gets a chance to parse c(0.7,0.1,0.2) into a matrix with appropriate number of rows:

  D <- DAG.empty()
  D <- D + node("W", distr = "rbern", prob = 0.3)
  catprob.W0 <- cbind(0.7,0.1,0.2)
  catprob.W1 <- cbind(0.2,0.1,0.7)
  D <- D + node("Cat3", distr = "rcategor.int", probs = (W==0)*c(0.7,0.1,0.2) + (W==1)*c(0.2,0.1,0.7))
  Dset2 <- set.DAG(D)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.