mccarthy-m-g / alda Goto Github PK

An R data package for the book "Applied longitudinal data analysis: Modeling change and event occurrence" by Singer and Willett (2003).

Home Page: https://mccarthy-m-g.github.io/alda/

License: Creative Commons Zero v1.0 Universal

R 100.00%

data r longitudinal-data survival-analysis mixed-models r-package time-to-event structural-equation-modeling growth-curves nonlinear-mixed-models

alda's Introduction

alda

This package contains 31 data sets provided by Singer and Willett (2003) with their book, Applied longitudinal data analysis: Modeling change and event occurrence, that are suitable for longitudinal mixed effects modelling, longitudinal structural equation modelling, and survival analysis. All the data sets in this package are real data from real studies; however, most were modified by Singer and Willett (2003) for the illustration of statistical methods, so they may not match the results of the original studies.

There are eleven data sets for longitudinal mixed effects modelling:

?deviant_tolerance: Adolescent tolerance of deviant behaviour (Chapter 2)
?early_intervention: Early educational interventions and cognitive performance (Chapter 3)
?alcohol_use_1: Adolescent and peer alcohol use (Chapters 4 and 6)
?reading_scores: Peabody Individual Achievement Test reading scores (Chapter 5)
?dropout_wages: High school dropout labour market experiences (Chapters 5 and 6)
?depression_unemployment: Unemployment and depression (Chapter 5)
?antidepressants: Antidepressant medication and positive mood (Chapter 5)
?berkeley: Berkeley Growth Study (Chapter 6)
?externalizing_behaviour: Externalizing behaviour in children (Chapter 6)
?cognitive_growth: Cognitive growth in children (Chapter 6)
?opposites_naming: Opposites naming task (Chapter 7)

There is one data set for longitudinal structural equation modelling:

?alcohol_use_2: Adolescent alcohol consumption and peer pressure (Chapter 8)

There are twenty data sets for survival analysis:

?teachers: Years to special education teacher turnover (Chapters 9 and 10)
?cocaine_relapse_1: Weeks to cocaine relapse after treatment (Chapter 10)
?first_sex: Age of first sexual intercourse (Chapters 10 and 11)
?suicide_ideation: Age of first suicide ideation (Chapter 10)
?congresswomen: House of Representatives tenure (Chapter 10)
?tenure: Years to academic tenure (Chapter 12)
?first_depression_1: Age of first depression (Chapter 12)
?first_arrest: Age of first juvenile arrest (Chapter 12)
?math_dropout: Math course history (Chapter 12)
?honking: Time to horn honking (Chapter 13)
?alcohol_relapse: Weeks to alcohol relapse after treatment (Chapter 13)
?judges: Supreme Court justice tenure (Chapters 13 and 15)
?first_depression_2: Age of first depression (Chapter 13)
?health_workers: Length of health worker employment (Chapter 13)
?rearrest: Days to inmate recidivism (Chapters 14 and 15)
?first_cocaine: Age of first cocaine use (Chapter 15)
?cocaine_relapse_2: Days to cocaine relapse after abstinence (Chapter 15)
?psychiatric_discharge: Days to psychiatric hospital discharge (Chapter 15)
?physicians: Physician career history (Chapter 15)
?monkeys: Piagetian monkeys (Chapter 15)

Vignettes and articles

There is one vignette with tips and tricks for working with longitudinal data:

vignette("longitudinal-data-organization")

There are fourteen articles on the package documentation website demonstrating how to recreate examples from the textbook in R:

Chapter 2: Exploring longitudinal data on change
Chapter 3: Introducing the multilevel model for change
Chapter 4: Doing data analysis with the multilevel model for change
Chapter 5: Treating time more flexibly
Chapter 6: Modeling discontinuous and nonlinear change
Chapter 7: Examining the multilevel model’s error covariance structure
Chapter 8: Modeling change using covariance structure analysis
Chapter 9: A framework for investigating event occurrence
Chapter 10: Describing discrete-time event occurrence data
Chapter 11: Fitting basic discrete-time hazard models
Chapter 12: Extending the discrete-time hazard model
Chapter 13: Describing continuous-time event occurrence data
Chapter 14: Fitting the Cox regression model
Chapter 15: Extending the Cox regression model

Documentation

See at https://mccarthy-m-g.github.io/alda/ and also in the installed package: help(package = "alda").

References

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA. https://doi.org/10.1093/acprof:oso/9780195152968.001.0001

alda's People

Contributors

Stargazers

Watchers

Forkers

gaborcsardi

alda's Issues

Fix GHA math output in gt tables (Chapter 7)

For some reason the math output in gt tables for chapter 7 isn't rendering properly with the GHAs build, even though it renders correctly locally. Specifically, the heterogeneous autoregressive and Toeplitz hypothesized matrices aren't rendering in matrix form, and have a couple random characters above them.

Chapter 13: Clean up code examples and add context with prose

Simulate data for chapter 3

Consider simulating data for chapter 3, instead of just leaving it out entirely.

Chapter 6: Clean up code examples and add context with prose

Chapter 10: Clean up code examples and add context with prose

Code for specifying nonlinear mixed models in Chapter 6

Problem

I need help specifying the mixed logistic growth models for the alda::cognitive_growth experiment covered in Chapter 6, section 6.4, with nlme::nlme() and/or lme4::nlmer(). I'm not sure if I'm getting tripped up on the syntax, the math, or both.

Singer and Willet describe Model A and Model B as follows.

Model A

We adopt the following logistic level-1 function as the hypothesized individual change trajectory for the alda::cognitive_growth experiment:

$$Y_{ij} = 1 + \frac{19}{1 + \Pi_{0i} e^{-(\Pi_{1i} t_{ij})}} + \epsilon_{ij}, \tag{6.8}$$

where $Y_{ij}$ represents the number of moves child $i$ makes (nmoves) prior to a fatal error in game $j$ (game); $\Pi_{0i}$ and $\Pi_{1i}$ are individual growth parameters related to the "intercept" and "slope", respectively; and $\epsilon_{ij} \sim N(0, \sigma^2_\epsilon)$. This model invokes two constraints: (1) all children have identical lower and upper asymptotes; and (2) these asymptotes are set to specific values, 1 and 20, which are the minimum and maximum values of nmoves.

And the following linear level-2 model for variation in the individual growth parameters across children:

$$\begin{align} \Pi_{0i} = \gamma_{00} + \zeta_{0i} \\\ \Pi_{1i} = \gamma_{10} + \zeta_{1i} \end{align} \tag{6.9a}$$

where

$$\begin{bmatrix} \zeta_{0i} \\\ \zeta_{1i} \end{bmatrix} \sim N\begin{pmatrix} \begin{bmatrix} 0\\\ 0 \end{bmatrix}, \begin{bmatrix} \sigma^2_0 & \sigma_{10} \\\ \sigma_{10} & \sigma^2_1 \end{bmatrix} \end{pmatrix}. \tag{6.9b}$$

This model stipulates that the level-1 logistic individual growth parameters, $\Pi_{0i}$ and $\Pi_{1i}$, differ across children around unknown population average values, $\gamma_{00}$ and $\gamma_{10}$, with unknown population residual variances $\zeta_{0i}$ and $\zeta_{1i}$, and covariance $\sigma_{10}$. Note that the logistic function chosen here is typically referred to as a generalized logistic function or Richard's curve.

The table below presents the results of fitting this model to the alda::cognitive_growth data that are reported in the textbook (Table 6.6).

$\gamma_{00}$	$\gamma_{10}$	$\sigma^2_0$	$\sigma^2_1$	$\sigma^2_\epsilon$	$\sigma_{01}$
12.9551	0.1227	0.6761	0.0072	13.4005	−0.0586

In Figure 6.10 they present a prototypical trajectory for the average child using predictions from this model.

Model B

We next ask whether we can predict variation in the individual growth parameters. We illustrate this process by asking whether the level-1 individual growth parameters, $\Pi_{0i}$ and $\Pi_{1i}$, differ by the children’s reading_score on a standardized reading test.

Model B postulates the following level-2 submodel:

$$\begin{align} \Pi_{0i} &= \gamma_{00} + \gamma_{01}(\mathrm{READ}_i - \overline{\mathrm{READ}}) + \zeta_{0i} \\\ \Pi_{1i} &= \gamma_{10} + \gamma_{11}(\mathrm{READ}_i - \overline{\mathrm{READ}}) + \zeta_{1i}, \end{align} \tag{6.10a}$$

where we: (1) center reading_score ($\mathrm{READ}$) on the sample mean to preserve the comparability of level-2 parameters across models; and (2) invoke the same assumptions about the level-2 residuals articulated in equation 6.9b.

The table below presents the results of fitting this model to the alda::cognitive_growth data that are reported in the textbook (Table 6.6).

$\gamma_{00}$	$\gamma_{01}$	$\gamma_{10}$	$\gamma_{11}$	$\sigma^2_0$	$\sigma^2_1$	$\sigma^2_\epsilon$	$\sigma_{01}$
12.8840	-0.3745	0.1223	0.0405	0.5610	0.0060	13.4165	-0.0469

In Figure 6.10 they present fitted trajectories for two prototypical children with reading scores two standard deviations above and below the sample mean using predictions from this model.

Additional context

There is an example website for the book showing how to fit the example models in various programs. For the nonlinear mixed models above, the person who wrote the examples notes that:

This model does not correspond to equation (6.8) in the book. Instead, it corresponds to the following equation:
$$Y_{ij} = 1 + \frac{19}{1 + \Pi_{0} * exp(- (\Pi_{1} + u_1)t - u_0)} + \epsilon_{ij}$$

The models in the book were fit in SAS, and for the SAS code examples the estimates on the website line up with what's reported in the book (search "Table 6.6 on page 231" on the page to jump to the relevant section).

However, this model isn't explained further on the website, and it isn't obvious to me where it came from, since it isn't in the book's errata either. I don't believe this is equivalent to the textbook equations.

Based on the SAS code for Model A, the "u" terms are the unknown population residual variances $\zeta_{0i}$ and $\zeta_{1i}$; the "s" terms are the random effects variances $\sigma^2_0$, $\sigma^2_1$, $\sigma^2_\epsilon$; and the "c" term is the covariance $\sigma_{10}$:

proc nlmixed data=foxngeese1 maxiter=1000;
  title2 'Model A: Unconditional logistic growth trajectory';
  parms G00=12 G10=.1 s2e=20 s2u0=.3 cu10=0 s2u1=.01;
  num = 19;
  den = 1 + G00*(exp(-(G10*GAME +u0 +u1*GAME)));
  model NMOVES ~ normal(1+num/den,s2e);
  random u0 u1 ~ normal([0,0],[s2u0,cu10,s2u1]) subject=id;
run;

The main thing I don't understand is how the u_0 term, which I believe corresponds to $\sigma^2_0$, ended up inside the exp() function instead of being added to $\Pi_{0}$.

For Model B they provide the following SAS code:

data foxngeese1;
  set 'c:\alda\foxngeese_pp';
  read=read-1.95625;
  readgame=read*game;
run;

proc nlmixed data=foxngeese1 maxiter=1000;
  title2 'Model B: Effect of READing scores on 'intercept' and 'slope'';
  parms G00=12 G10=.12 G01=-.4 G11=.04 s2e=14 s2u0=.5 cu10=0 s2u1=.01;
  num = 19;
  den = 1 + G00*(exp(-(G01*READ + G10*GAME +G11*READGAME +u0 +u1*GAME)));
  model NMOVES ~ normal(1+num/den,s2e);
  random u0 u1 ~ normal([0,0],[s2u0,cu10,s2u1]) subject=id;
run;

This makes some sense if you apply substitution rules for $\Pi_{1}$ in the website equation (where the centred reading scores are $x$):

$$Y_{ij} = 1 + \frac{19}{1 + \gamma_{00} * e^{-((\gamma_{10} + \gamma_{11}x + u_1)t - \gamma_{01}x - u_0)}} + \epsilon_{ij},$$

but it still isn't clear to me why the u_0 term is where it is, or why the $\gamma_{01}x$ term is there in the same place now too. My intuition is that these should be added to $\gamma_{00}$ in the equation after applying substitution, so they shouldn't end up inside the exp() function.

Chapter 7: Clean up code examples and add context with prose

Chapter 14: Clean up code examples and add context with prose

Chapter 15: Clean up code examples and add context with prose

Add preface, references, and recommended reading sections to chapter example articles

Chapter 12: Clean up code examples and add context with prose

Chapter 4: Clean up code examples and add context with prose

Chapter 8: Clean up code examples and add context with prose

Chapter 9: Clean up code examples and add context with prose

Chapter 5: Clean up code examples and add context with prose

Vignettes for chapter examples

Consider making vignettes (or pkgdown articles, depending on the size of the package) for the examples demonstrated in each chapter. Each chapter would get its own vignette.

Consider adding a "Case studies" section of articles

The motivation behind this idea is the depression_unemployment data. Although it was only used by Singer and Willett in chapter 5 to demonstrate the multilevel model for change, the data contains information such that it could also be used for survival analysis---indeed,in Ginexi and colleagues' study, they actually did fit use both multilevel models and survival analysis. It would be educational to have an article demonstrating this.

This might also be sufficient as a tips and tricks article.