osofr / simcausal Goto Github PK
View Code? Open in Web Editor NEWSimulating Longitudinal and Network Data with Causal Inference Applications
Simulating Longitudinal and Network Data with Causal Inference Applications
Hi,
Just to let the author know that simcausal has been removed from CRAN:
https://cran.r-project.org/web/packages/simcausal/index.html
Hi,
I'd like to use simcausal for mediation analysis but wonder how this can best be done. For natural direct and indirect effects I need to simulate the crossworld counterfactual Y(a, M(a*)), - A is the binary exposure variable and M the mediator. M(a*) is the level of the mediator that it would have naturally been under the reference intervention A=a*.
I'm also particularly interested in simulating randomized interventional analogues of the natural effects in the presence of exposure-induced confounding of the relationship of M and the outcome Y. These effects require the counterfactual Y(a, G(m|a*, c)), with G(m|a*, c) denoting a random draw from the distribution of M given A=a* and non-exposure-induced confounders C=c.
What I did so far was rather intuitive. For the interventional effects, I need to draw from the distribution G(m|a*, c) - therefore I simulated data first and then specified a model (lm or glm,e.g.) for M|a, c for each counterfactual dataset a1 and a0. I used the estimated parameters for specifying G(m|a*, c), G(m|a, c) and the respective outcome variables in new nodes and simulated the counterfactual dataset again to evaluate the interventional effects. Below is a simple example for illustration:
################################################################
#Simple Mediation process with an exposure-induced confounder L
#An "intuitive" approach
################################################################
set.seed(41188)
M <- DAG.empty()
M <- M +
node("a", # Binary exposure variable
distr = "rbern",
prob = 0.53) +
node("c", # Confounder variable C
distr="rnorm", mean=5, sd=2)+
node("l", #Binary confounder variable L -> exposure induced
distr="rbern" ,
prob=ifelse(c>=5, 0.6-0.3a, 0.7-0.3a))+
node("m", #Mediator variable M
distr = "rnorm",
mean=10-2a+c+5l,
sd=3)+
node("u.Y", #Latent variable
distr = "rnorm", mean = 0, sd = 100)+
node("y", #Outcome Variable Y
distr = "rnorm", mean = 2000-200a+10m-5am + 50l -20al+ 15c + u.Y, sd=200)
Mset <- set.DAG(M, latent.v = c("u.Y"))
str(Mset)
a1 <- node("a", distr = "rbern", prob = 1)
Mset <- Mset + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset <- Mset + action("a0", nodes = a0)
dat <- simcausal::sim(DAG = Mset, actions = c("a1", "a0"), n = 10000000, rndseed = 345)
head(dat[["a1"]]); head(dat[["a0"]])
plotDAG(Mset, xjitter = 0.2, yjitter = 0.2,
edge_attrs = list(width = 0.5, arrow.width = 0.5, arrow.size = 0.3),
vertex_attrs = list(size = 20, label.cex = 0.8))
#Evaluate distribution M|a,c and M|a*,c
summary(m0<-lm(m~c, data=dat[["a0"]]))
sd.m0<-summary(m0)$sigma; sd.m0
coef.m0<-coefficients(m0); coef.m0
summary(m1<-lm(m~c, data=dat[["a1"]]))
coef.m1<-coefficients(m1); coef.m1
sd.m1<-summary(m1)$sigma; sd.m1
M.r<- M +
node("m.0", distr = "rnorm", mean = 13.7465169 + 0.9006255c, sd=3.828051)+ #insert values by hand
node("m.1", distr = "rnorm", mean = 10.2433650 + 0.9009554 c, sd=3.826656) +
#node("m.0", distr = "rnorm", mean = coef.m0[1]+coef.m0[2]c, sd=sd.m0)+ this syntax doesn't work
#node("m.1", distr = "rnorm", mean =coef.m1[1] + coef.m1[2]c, sd=sd.m1)+
node("y.m0", distr = "rnorm",
mean = 2000-200a+10m.0-5am.0 + 50l -20al+ 15c + u.Y, sd=200)+
node("y.m1", distr = "rnorm",
mean = 2000-200a+10m.1-5am.1 + 50l -20al+ 15c + u.Y, sd=200)
Mset.r <- set.DAG(M.r, latent.v = c("u.Y"))
str(Mset.r)
a1 <- node("a", distr = "rbern", prob = 1)
Mset.r <- Mset.r + action("a1", nodes = a1)
a0 <- node("a", distr = "rbern", prob = 0)
Mset.r <- Mset.r + action("a0", nodes = a0)
##################
##################
#Simulate counterfactual dataset
dat <- simcausal::sim(DAG = Mset.r, actions = c("a1", "a0"), n = 10000000, rndseed = 345)
#Evaluate true randomized interventional analogue for the direct effect
Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a1")
y_r.m0_a1<-eval.target(Mset.r, data = dat)$res ; y_r.m0_a1
Mset.r <- set.targetE(Mset.r, outcome = "y.m0", param = "a0")
y_r.m0_a0<-eval.target(Mset.r, data = dat)$res; y_r.m0_a0
RIA.NDE<-y_r.m0_a1-y_r.m0_a0; RIA.NDE
#Evaluate true randomized interventional analogue for the indirect effect
Mset.r <- set.targetE(Mset.r, outcome = "y.m1", param = "a1")
y_r.m1_a1<-eval.target(Mset.r, data = dat)$res; y_r.m1_a1
RIA.NIE<-y_r.m1_a1-y_r.m0_a1; RIA.NIE
#Evaluate randomized interventional analogue for total effect:
RIA.TE<-y_r.m1_a1-y_r.m0_a0
RIA.TE == RIA.NIE + RIA.NDE
###################################
I'm not quite sure, if this code is ok or if there are more efficient solutions?
Thanks a lot for suggestions!
There is an attempt to load mvtnorm
namespace in examples -
Line 285 in d0c5d8b
mvtnorm
is suggested dependency, it should not be required to run examples according to R-exts. This can be conditionally escaped with requireNamespace.> #---------------------------------------------------------------------------------------
> # EXAMPLE 7: Multivariate random variables
> #---------------------------------------------------------------------------------------
> require("mvtnorm")
Loading required package: mvtnorm
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘mvtnorm’
> D <- DAG.empty()
>
> # 3 dimensional normal (uncorrelated), using rmvnorm function from rmvnorm package:
> D <- D +
+ node(c("X1","X2","X3"), distr = "rmvnorm", mean = c(0,1,2))
Error in node(c("X1", "X2", "X3"), distr = "rmvnorm", mean = c(0, 1, 2)) :
rmvnorm: this node distribution function could not be located
Execution halted
I am trying to specify an action that assigns a value of "abar" to E[t] while a time-varying covariate C[t-1] is 1, and 0 otherwise. The DAG is defined as follows:
tmax<-10
S1.E.1 <-substitute(plogis(-0.30 + 1.00*W1 + 0.08*W2))
S1.E <-substitute(plogis(-0.30 + 1*E[t-1]))
S1.D.1 <-substitute(plogis(-2.50 - 0.10*W1 + 0.20*W2 + 0.07*E[t]))
S1.D <-substitute(plogis(-2.50 - 0.10*W1 + 0.20*W2 + 0.07*E[t]))
S1.Y.1 <-substitute(plogis(-2.50 - 0.50*W1 + 0.50*W2 + 0.10*E[t] + 1.00*S + 0.20*S*E[t]))
S1.Y <-substitute(plogis(-2.50 - 0.05*W1 - 0.25*W2 + 0.30*E[t] + 0.10*Ebar[t-1] + 2.00*H[t-1] + 2.00*S + 0.20*S*Ebar[t-1]))
S1.H.1 <-substitute(plogis(-3.00 + 0.10*W1 + 0.50*W2 + 2.00*S + 0.25*S*E[t]))
S1.H <-substitute(plogis(-3.00 + 0.10*W1 + 0.50*W2 + 2.00*S + 0.25*S*E[t] + 0.50*Ebar[t-1]))
S1.C.1 <-substitute(plogis( 4.00 - 0.05*W1 - 0.25*W2)) #staying at work
S1.C <-S1.C.1
D1 <- DAG.empty() +
node("S", distr = "rconst", const = 0)+
node("TTH", distr = "runif", min = -2, max = 10)+
node("C.star", distr = "rconst", const = floor(TTH))+
node("W1", distr = "rbern", prob = 0.20) +
node("W2", distr = "rbern", prob = 0.20) +
node("E", t=1, distr = "rbern", prob=.(S1.E.1))+
node("D", t=1, distr = "rbern", prob=.(S1.D.1),EFU=TRUE)+
node("Y", t=1, distr = "rbern", prob=.(S1.Y.1), EFU=TRUE)+
node("Ebar", t=1, distr = "rconst", const=E[1])+
node("H", t=1, distr = "rbern", prob=.(S1.H.1))+
node("C", t=1, distr = "rbern", prob=.(S1.C.1))+
node("E", t=2:tmax, distr="rbern", prob=ifelse(C[t-1]==0,0,.(S1.E)))+
node("D", t=2:tmax, distr="rbern", prob=.(S1.D), EFU=TRUE)+
node("Y", t=2:tmax, distr="rbern", prob=.(S1.Y), EFU=TRUE)+
node("Ebar", t=2:tmax, distr="rconst", const=sum(E[1:t])) +
node("H", t=2:tmax, distr="rbern", prob=ifelse(H[t-1]==1,1,.(S1.H)))+
node("C", t=2:tmax, distr="rbern", prob=ifelse(C[t-1]==0,0,.(S1.C)))
RCDAG1 <- set.DAG(D1)
I tried to generate an observed dataset as follows:
K <- 10
act_dynAD0 <-c(
node("E", t = 1:K, distr = "rconst", const = ifelse(t==1,abar,ifelse(C[t-1]==1, abar, 0))),
node("D", t = 1:K, distr = "rconst", const = 0))
DynDact <- RCDAG1 + action("DynE0D0", nodes = act_dynAD0, abar = 0) + action("DynE1D0", nodes = act_dynAD0, abar = 1)
dt <- sim(DynDact,actions = c("DynE0D0","DynE1D0"), n = 1000, rndseed = 4)
And obtained the following warning:
Warning messages:
1: In rbinom(n = n, prob = prob, size = 1) : NAs produced
2: In rbinom(n = n, prob = prob, size = 1) : NAs produced
3: In rbinom(n = n, prob = prob, size = 1) : NAs produced
4: In rbinom(n = n, prob = prob, size = 1) : NAs produced
5: In rbinom(n = n, prob = prob, size = 1) : NAs produced
Many of the E[t] columns contain NAs. Would appreciate your help in resolving this.
latent.v
does not appear to work when output data is in long format
sim(DAG, wide=FALSE)
Hi,
I want to have a Gaussian variable affecting a categorical variable. I tried this which seems to work
C <- rnorm(3,0,2) # or even rnorm(1,0,2)
rcat.b1(250, softmax(C * c(0.5, 0.25, 0.25)))
But running the below only gets me category 4, while there are only three categories?
logsumexp <- function (x) {
y = max(x)
y + log(sum(exp(x - y)))
}
softmax <- function (x) {
exp(x - logsumexp(x))
}
D <- DAG.empty()
N <- 250
D <- D +
# Confounder with Gaussian distribution
node("C",
distr = "rnorm",
mean = 0,
sd = 2) +
# Set BA to categorical (c=3).
# Make sure it starts at 1, to ensure sane priors later.
# Let C be a causal effect on BA.
node("BA",
distr = "rcat.b1",
probs = softmax(C * c(0.5, 0.25, 0.25)))
Dset <- set.DAG(D, vecfun = c("softmax"))
d <- sim(Dset, n = N)
r$> head(d)
ID C BA
1 1 -1.3004380 4
2 2 3.4434901 4
3 3 -1.7159391 4
4 4 3.7295937 4
5 5 -1.7967269 4
6 6 -0.8985413 4
Have I misunderstood something? :)
Allow non-existing time-point references to default to some value, for example, instead of doing this
node("TI", t=0:7, distr="rbern", prob=plogis(-5 - 0.3*CVD + 0.5*A1C[t] + 1.5*{if (t==0) {0} else {TI[t-1]}}))
write just this, with TI[t-1] at t=0 defaulting to 0:
node("TI", t=0:7, distr="rbern", prob=plogis(-5 - 0.3*CVD + 0.5*A1C[t] + 1.5*TI[t-1]))
I am trying to perform a deterministic intervention on a DAG.
library(simcausal)
library(rje)
D <- DAG.empty()
D <- D +
node("U1", distr="runif"
)
D <- D +
node("U2", distr="runif"
)
D <- D +
node("UA", distr="runif"
)
D <- D +
node("UY", distr="runif"
)
D <- D +
node("W1", distr="rconst",
const = as.numeric(U1 < 0.5))
D <- D +
node("W2", distr="rconst",
const = as.numeric(U2 < 0.5))
D <- D +
node("A", distr="rconst",
const = as.numeric(UA < expit(-0.5 + W1 - 1.5 * W2)) )
D <- D +
node("Y", distr="rconst",
const = as.numeric(UY < expit(-0.75 + W1 - 2 * W2 + 2.5 * A + A*W1)),
EFU = TRUE)
setDl2 <- set.DAG(D)
# ----------------------------------------------------------------------
plotDAG(setDl2, xjitter = 0.32, yjitter = 0.03,
edge_attrs = list(width = 0.5, arrow.width = 0.4, arrow.size = 0.8),
vertex_attrs = list(size = 19, label.cex = 1.5))
title(main = "SCM graph")
# ----------------------------------------------------------------------
# intervention
actN_A <- node("A",distr="rconst", const = theta)
D_act <- D + action("A_th0", nodes=actN_A, theta=0)
D_act <- D_act + action("A_th1", nodes=actN_A, theta=1)
D_act <- D_act + action("A_no_intervene", nodes=node("A", distr="rconst", const = as.numeric(UA < expit(-0.5 + W1 - 1.5 * W2)) ))
ate.data <- simfull(A(D_act), n=1e5 , rndseed = 252)
The D_act assignment fails to run under 0.5.0 while suceeds under 0.4.0. Could you please take a look when you have time?
Simulated data contains two ID columns
I'd like to use rcategor
(or rcategor.int
) to assign individuals to a large number of categories, but the following example illustrates an error:
D <- DAG.empty()
D1 <- D + node('group',
distr = 'rcategor.int',
probs = rep(1/50, 50))
D2 <- D + node('group',
distr = 'rcategor.int',
probs = rep(1/55, 55))
D1 <- set.DAG(D1)
D2 <- set.DAG(D2)
D1 is fine. D2, however, presents the following error:
I'm just starting to play with the package. It's great so far!
Is there a way to convert the DAG in the "Brief Overview" section to an igraph object?
I did try using igraph::as.igraph()
and it doesn't seem to work.
The current simulation seems to only allow us to fix the hyper parameters of the network and whenever we simulate, it will first generate a new network and then simulate the covariates.
However, we want to be able to simulate from a fixed network. I.E at most simulate our network according to hyper parameters once, then fix such network, and simulate many different realizations of the covariates and outcomes.
So, I add such functionality by changing the simFromDAG function.
The updates in v0.5.1 break my code, and I trying to understand why. I'm simulating a space-time structure, so I have two indexes s
and t
. simcausal
understands a single dimension, so I map s
and t
to a single dimension via the function tFUN
.
For each node in the DAG, I have a dataset of parameters that I lookup for s
, t
.
Here's a short example:
for(t in t_start:TT ){ # Per month
for(s in s_start:SS ){ # Per location
tpos <- tFUN(s, t, SS)
## Initial month values ##
if(s == 1 & t == 0){
D <- D + node('L1',
distr = 'rnorm',
mean = .(lookup_parameter(pp, node = 'L1', s = s, t = t, variable = '(Intercept)') ),
sd = .(lookup_parameter(pp, node = 'L1', s = s, t = t, variable = 'mse') ),
t = tpos)
}
}
lookup_parameter
basically uses [
to pick off the correct element in the dataset.
With the new updates, the values for s
and t
getting passed to lookup_parameter
are not what I expected, and I get a One of the distribution parameters evaluated to non-standard vector
error. t
is the value I'd expect from simcausal
rather then the value from the loop. s
is a mystery altogether. The other arguments (node
and variable
pass to lookup_parameter
correctly).
I think for my purposes (I'm trying to wrap up this study), I should just roll back to v0.5.0. I figured I would bring this up, in case there is something obvious I'm missing.
Thanks!
I've run into the issue a few times, where when I call simobs
get a cannot find function 'melt'
error. When I run library(data.table)
, the issue goes away. I have not pinpointed if I have other libraries load that may cause the conflict, but I think changing melt
to data.table::melt
on line 749 of simulation.R may fix the issue. Though I see that data.table::melt
has been commented out.
add S3 method plot
for DAG object.
using as.integer() in node formulas prints it to the terminal:
library(simcausal)
D <- DAG.empty()
D <- D +
node("Trexp", distr = "rexp", rate = 1) +
node("Cweib", distr = "rweibull", shape = 0.5, scale = 1) +
node("T", distr = "rconst", const = as.integer(Trexp*100))
setD <- set.DAG(D)
dat <- sim(setD, n=100)
THIS no longer works, because the dimensions of catprob.W0 and catprob.W1 (1 row) do not match the dims of W (n rows)
D <- DAG.empty()
D <- D + node("W", distr = "rbern", prob = 0.3)
catprob.W0 <- cbind(0.7,0.1,0.2)
catprob.W1 <- cbind(0.2,0.1,0.7)
D <- D + node("Cat3", distr = "rcategor.int", probs = (W==0)*.(catprob.W0) + (W==1)*.(catprob.W1))
checkException(Dset2 <- set.DAG(D))
This works however, because simcausal gets a chance to parse c(0.7,0.1,0.2) into a matrix with appropriate number of rows:
D <- DAG.empty()
D <- D + node("W", distr = "rbern", prob = 0.3)
catprob.W0 <- cbind(0.7,0.1,0.2)
catprob.W1 <- cbind(0.2,0.1,0.7)
D <- D + node("Cat3", distr = "rcategor.int", probs = (W==0)*c(0.7,0.1,0.2) + (W==1)*c(0.2,0.1,0.7))
Dset2 <- set.DAG(D)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.