- Clone the repository
- Open the Rproj file in RStudio (for example by double-clicking it)
- Open the `R/simulation.R` file
- Run it
- Simulates data with the following setup:
  - Outcome is "event" (0/1)
  - Outcome is misclassified. So there is an unobserved true outcome, `y`, and an observed, potentially error-prone, outcome `ystar`
  - Predictor `x` is "vaccine" (0/1)
  - Everything, including the misclassification table, is parameterized as a logistic regression. So parameters in these tables are log-linear.
- Defines estimators you might apply to such data. Currently the following are implemented:
  - "Naive", pretending there is no misclassification: a logistic regression of the observed `ystar` on `x`, and the relative risk calculated on the observed `ystar`
  - "MLE": this formulates the (true) model, including misclassification. This model is estimated by maximizing the marginal log-likelihood directly.
  - "EM": this formulates the same (true) model as MLE, but uses a different estimation method (EM), which is usually more stable.
- Defines and runs a simulation study (experiment) in which the following factors are varied:
  - Sample size `n`
  - The sensitivity and specificity of the event registration (reworked into log-linear parameters `tau` and `lambda`)
  - The overall base event rate, parameterized with logit parameter `alpha`
  - The true effect size, with logistic regression coefficient `beta` (fixed at 0.2 for the moment)
  - The overall vaccination rate (fixed at 0.7 for the moment)
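
The setup described in the list above can be sketched in a few lines of R. This is a minimal illustration, not the code in `R/simulation.R`: the function name `simulate_data`, its arguments, the chosen values of `n`, `alpha`, sensitivity and specificity, and in particular the way `tau` and `lambda` are derived from sensitivity and specificity are all assumptions made for the example. Only `beta = 0.2` and the vaccination rate of 0.7 come from the text.

```r
# Sketch only: all names here are hypothetical and not taken from R/simulation.R.
set.seed(1)

simulate_data <- function(n, alpha, beta, sens, spec, p_vaccine = 0.7) {
  # Assumed parameterization: logit P(ystar = 1 | y) = tau + lambda * y.
  # Requires 0 < sens < 1 and 0 < spec < 1 so that both logits are finite.
  tau    <- qlogis(1 - spec)     # log-odds of a false positive (y = 0)
  lambda <- qlogis(sens) - tau   # shift in log-odds when the true outcome is 1

  x     <- rbinom(n, 1, p_vaccine)                 # vaccine indicator
  y     <- rbinom(n, 1, plogis(alpha + beta * x))  # true, unobserved event
  ystar <- rbinom(n, 1, plogis(tau + lambda * y))  # observed event; error ignores x
  data.frame(x = x, y = y, ystar = ystar)
}

dat <- simulate_data(n = 5000, alpha = -2, beta = 0.2, sens = 0.8, spec = 0.99)

# "Naive" estimator: logistic regression of the observed ystar on x
naive_fit  <- glm(ystar ~ x, family = binomial, data = dat)
beta_naive <- unname(coef(naive_fit)["x"])

# Naive relative risk, calculated on the observed outcome
rr_naive <- mean(dat$ystar[dat$x == 1]) / mean(dat$ystar[dat$x == 0])
```

Comparing `beta_naive` with the true value of 0.2 over many replications, and over a grid of sample sizes and error rates, is the kind of comparison the simulation study is described as making.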
At the moment, only nondifferential error is examined.
"Obviously", the MLE is unbiased, especially as `n` increases and/or `alpha` gets closer to zero (more events observed). However, in terms of mean squared error (MSE), it is almost never worthwhile to use the MLE. This is because, in this simple nondifferential setup, only the specificity, which we can assume to be excellent, is relevant to bias in `beta`. So there is little to no payoff: the MLE removes the bias of the naive estimator, but at the cost of considerably larger variance, which dominates the MSE.
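
The trade-off in the paragraph above rests on the standard decomposition of the mean squared error of an estimator, written here for a generic estimator $\hat\beta$ of `beta` (the hat notation is introduced for this example only):

```latex
\operatorname{MSE}(\hat\beta)
  = E\!\left[(\hat\beta - \beta)^2\right]
  = \operatorname{Bias}(\hat\beta)^2 + \operatorname{Var}(\hat\beta)
```

An estimator with zero bias can therefore still have a larger MSE than a slightly biased one, if its variance is sufficiently larger.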
The same results may not hold for relative risk (not examined yet).
The same results will not hold for differential error.
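
For reference, the distinction between nondifferential and differential error can be written out as follows. The outcome model is as described in the setup; the specific form of the misclassification model in `tau` and `lambda` is an assumption made for illustration and may differ from the parameterization used in the code.

```latex
\begin{aligned}
\operatorname{logit} P(y = 1 \mid x) &= \alpha + \beta x \\
\operatorname{logit} P(y^{*} = 1 \mid y, x) &= \tau + \lambda y
  \quad \text{(nondifferential: no dependence on } x\text{)} \\
P(y^{*} = 1 \mid x) &= \sum_{y \in \{0, 1\}} P(y^{*} = 1 \mid y)\, P(y \mid x)
\end{aligned}
```

Under this assumed form, the specificity is $1 - \operatorname{expit}(\tau)$ and the sensitivity is $\operatorname{expit}(\tau + \lambda)$. Differential error would correspond to letting the second line depend on $x$ as well, so that sensitivity and specificity differ between vaccinated and unvaccinated subjects.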