jonesor / rage Goto Github PK
View Code? Open in Web Editor NEWR tools for performing ageing focussed analyses using matrix population models.
Home Page: https://jonesor.github.io/Rage/
R tools for performing ageing focussed analyses using matrix population models.
Home Page: https://jonesor.github.io/Rage/
The longevity
function returns two components:
These two components seem sufficiently different to me that it might be worthwhile to split them into separate functions, or perhaps add an argument to the existing function to allow the user to select which component(s) to return.
Also, maximum longevity takes ~10x longer to calculate (on my computer: 5.5s vs 0.5s for the whole COMPADRE db), so it would be nice to exclude this extra computation time if the user only wants life expectancy.
I think there's an issue with using startLife > 1
in qsdConverge
. Because the vast majority of matU are reducible (i.e. b/c there are no repro transitions to 'complete the cycle'), convergence to the quasi-stable distribution may depend on the initial population vector, which is a function of startLife
. E.g. in the matrix below, because the stable distribution of the 1st stage is >> 0, and there is no retrogression to stage 1, we would never get convergence with any startLife > 1
.
matU <- rbind(c(0.5, 0.0, 0.0, 0.0),
c(0.1, 0.3, 0.1, 0.0),
c(0.0, 0.1, 0.2, 0.1),
c(0.0, 0.0, 0.1, 0.3))
popbio::stable.stage(matU)
## 0.5000 0.3125 0.1250 0.0625
popdemo::isErgodic(matU)
## FALSE
I'm not sure how to deal with this. Checking isErgodic(matU)
could help, but there are non-ergodic matU that still do converge to the QSD (for some startLife), and a few ergodic but imprimitive matU that don't converge.
standardizedVitalRates
calls four other Rage functions in sequence: rearrangeMatrix
, reprodStages
, collapseMatrix
, then vitalRates
. The first 3 convert an MPM to a standardized format, and the 4th extracts averaged vital rates.
I see no reason to group these 4 components into a single function:
The main reason is that each component function has it's own set of optional arguments (e.g. reprodStages
has different options for dealing with propagule and post-reproductive stages, vitalRates
has different options for how to split and weight vital rates). standardizedVitalRates
doesn't currently support any of these — it simply uses the defaults of the component functions. Adding this support would make standardizedVitalRates
overly complicated, in my opinion.
Second, the four component functions may each fail for different reasons. Combining them in a single function makes debugging more difficult.
Third, a user that wants to extract vital rates from a standardized matrix is likely to want to calculate other things from that standardized matrix as well. But the standardized matrix is not returned by standardizedVitalRates
.
Therefore, I suggest we remove standardizedVitalRates
and encourage users to use the four component functions separately (e.g. with an example on the website or vignette). We could alternatively bundle the first 3 component functions into a new function (e.g. standardizeMatrix
), which I think would be more useful to users, and not quite as complicated.
I still have trouble with this distinction (apparently I'm in good company).
I think the Rage functions that involve vital rates of fecundity (vr_
fns, perturb_vr
, and possibly mpm_to_mx
; using devel branch names here) effectively assume a post-breeding census design. I'm not sure whether it's possible to add pre-breeding census functionality for these, since in a pre-breeding model the newborn survival rate only appears in the fecundity transition — so the birth rate and newborn survival rate are not independently estimable.
I can't decide whether this problem also applies to mpm_to_mx
(and therefore makeLifeTable
), or whether mx trajectories from pre- and post-breeding MPMs are comparable.
Not sure how to resolve this. For now I guess we can just add some documentation to the relevant functions.
dentropy -> dEntropy
kentropy -> kEntropy
It's grown quite diverse. Many of the fns concern age-related analyses (the stated purpose), but others are not really related to ageing (e.g. perturb_
fns, vr_
fns, plotLifeCycle
).
Should we move some of the non-ageing fns to a separate package? I don't have a strong opinion yet, but think it's worth exploring.
As a related side note, plotLifeCycle
has an extensive dependency chain (due to DiagrammR), that has caused some minor CI issues.
Since we're already importing popbio
, I suggest we consistently use popbio::lambda
and popbio::stable.stage
at relevant points within our functions, primarily for ease of code readability, and also for consistency. By my count, the relevant functions include:
collapseMatrix
qsdConverge
matrixElementPerturbation
vitalRatePerturbation
vitalRates
Related to #14 , there's a fair amount of code duplication across a few Rage functions:
makeLifeTable
, dEntropy
, and kEntropy
all use the same loop to calculate survivorship. I believe the loop within longevity
is mathematically equivalent (it's finding the age where lx = 0.01), though coded a bit differently.
makeLifeTable
and dEntropy
use the same loops to calculate age-specific fertility (mx) and clonality (cx).
To avoid some of this duplication and make for easier code maintainability, we could perhaps pull some of the duplicated sections into separate internal functions that get called by the user-facing functions when required.
Alternatively, we could keep the lx/mx/cx functions within makeLifeTable
, and then re-work dEntropy
and kEntropy
so that they take a life table or life table components as arguments, instead of transition matrices.
Hi all
I'm currently working on my fork adding Pace & Shape measurements to Rage
that Annette Baudisch and I have been developing; if we want, these can eventually be incorporated into the package itself (alongside other more standard ones we'll either have or would make sense, e.g. life expectancy, Keyfitz Delta, Gini).
For the new measures (which [humble brag] if I say so myself are more generalisable and comparable than many other shape measures), I need to calculate area under a curve. The only function I know of (MESS::auc
) has a lot of dependencies at 10 or more. I wanna either ask whether people are OK with this, or whether they know of a lighter option (or want to press me to find one), if indeed these measures would be useful in the package eventually.
The maximum longevity component of longevity
finds the age when survivorship (lx) falls to some critical proportion (call this lx_crit
), which is 0.01 with the default setting of initPop = 100
. Specifically, the population starts with initPop
individuals in stage class startLife
, and is projected through time until there is only 1 individual remaining (call this n_crit
, which is fixed at 1).
I think it's potentially confusing that the user effectively adjusts lx_crit
by changing initPop
(i.e. lx_crit = n_crit / initPop = 1 / initPop
).
As far as I can tell, the result depends only on the ratio, not the absolute values of initPop
and n_crit
. So I think it would be more intuitive if initPop
was fixed at 1 and removed as an argument, and the user instead specified a new argument lx_crit
(constrained to the interval [0,1], default = 0.01).
Alternatively, we could keep the function as is, and just improve the documentation.
(Also, related to #15 , I believe the loop within longevity
is mathematically equivalent to the survivorship loop within makeLifeTable
, dEntropy
, and kEntropy
, so some of this duplicated code could be pulled out into a common function.)
I propose using a consistent, human-readable format when creating matrices in the @example
section of Rage functions, e.g.
Format 1
matU <- rbind(c( 0, 0, 0, 0),
c(0.1, 0.1, 0.1, 0),
c( 0, 0.3, 0.3, 0.1),
c( 0, 0, 0.2, 0.1))
Format 2
matU <- rbind(c(0.0, 0.0, 0.0, 0.0),
c(0.1, 0.1, 0.1, 0.0),
c(0.0, 0.3, 0.3, 0.1),
c(0.0, 0.0, 0.2, 0.1))
I think Format 1 is slightly easier to read in the R documentation browser. Thoughts?
The qsdConverge
fn checks for convergence by assessing whether min(dist) >= conv
(where conv
is the convergence threshold, e.g. 5%). If convergence is not reached within the allotted time, the fn returns NA
and a warning.
The fn also returns NA
and the same warning if sum(dist) == 0
, which I guess indicates that the population starts out exactly at the stable distribution. This latter scenario doesn't seem like a problem to me... the fn would otherwise just return 1
if the population starts out at the stable distribution, which seems reasonable. Suggest removing this check.
In life cycles lacking any connection between stage 'start' and the reproductive stage(s), or life cycles lacking reproductive stages altogether, the mature_age
function in the devel branch (taken from lifeTimeRepEvents
) incorrectly returns a value of 1. E.g.
# life cycle with no connection from stage 1 to the repro stage (3)
matU <- rbind(c(0.0, 0.0, 0.0),
c(0.1, 0.1, 0.0),
c(0.0, 0.0, 0.0))
matF <- rbind(c(0.0, 0.0, 4.2),
c(0.0, 0.0, 0.0),
c(0.0, 0.0, 0.0))
Rage::mature_age(matU, matF, start = 1)
>> [1] 1
Suggest updating to return NA
in these cases.
The function mature_distrib
has a similar issue, though only when there is a single reproductive stage class (with no connection from stage 'start').
The Rcompadre repo is the first hit if you google "Rcompadre".
The Rage repo doesn't make page 1 with any of the following (of course search results are personalized, so may vary):
We'll eventually grow in popularity, but I don't think we'd ever be a top hit for just "Rage". I don't have any better ideas at the moment. Just throwing it out there.
makeLifeTable
currently returns a list object with numerous vectors.
Most of these are the constituent columns of a life table, with the addition of R0Fec
and something called TcFec
.
I think that the object returned should be a list object with a data.frame (the life table) and the other two statistics (R0...).
In addition, the description (Rd file) should be updated - it is not correct as it stands.
Please could you check the example can be run without error.
Currently produces -> Error: Expecting matrices with equal dimensions
I think the maxRep
element returned by rearrangeMatrix
is unnecessary, as it's a simple function of another returned element, reproStages
. Suggest removing.
In the initial assignment of the demogstat
variable in matrixElementPerturbation
and vitalRatePerturbation
, I think the if (demogstat == "lambda")
statement should come before the if (is.character(demogstat))
statement. Otherwise the function will fail if demogstat == 'lambda'
.
A few Rage functions (primarily vitalRates
, matrixElementPerturbation
, and vitalRatePerturbation
) involve a lot of intermediate steps, and have complex outputs. I suggest breaking these down into simpler functions to allow users more flexibility (following #20).
Relatedly, I think we should consider transitioning to a naming scheme where groups of similar functions have the same prefix, so that they're easier for users to locate. E.g. Instead of matrixElementPerturbation
and vitalRatePerturbation
, we could use perturb_transitions
and perturb_vitals
. The vital rate functions I propose below could all begin with vitals_
or vr_
, and @iainmstott's shape functions with shape_
.
Break apart vitalRates
I suggest three layers of functions:
(1.) matrix-returning functions (element-specific vital rates of growth, shrinkage, stasis, and reproduction, all conditional on survival)
vr_mat_U
and vr_mat_R
(2.) vector-returning functions (stage-specific vital rate sums)
vr_vec_survival
, vr_vec_growth
, vr_vec_shrinkage
, vr_vec_stasis
, vr_vec_reproduction
, vr_vec_dormancy
(3.) scalar-returning functions (average vital rates across all stages, possibly weighted)
vr_survival
, vr_growth
, vr_shrinkage
, vr_stasis
, vr_reproduction
, vr_dormancy
Each of these should allow the user to specify the set of possible transitions so that we can distinguish between structural zeros and sampled zeros, and to specify stages that should be excluded from calculation (e.g. dormant stages).
Break apart perturbation functions
I haven't thought as much about these, but we could start with pulling the nested loop that's common to both functions into a separate function (i.e. given matA
return a sensitivity matrix based either on lambda or a user-supplied demogstat
argument).
I think we should also avoid returning both sensitivities and elasticities in a single functional call. The type of output could be specified with an additional argument type
, or perhaps we could use separate sensitivity_
and elasticity_
functions.
The mpm_to_
fns (and makeLifeTable
) use arguments N
/nSteps
to specify the number of age classes to calculate over. If a user wants to apply these to many MPMs (e.g. with COMPADRE), they'll probably just set N
to a large constant (e.g. 1e5
). This approach can be computationally time-consuming, and is unnecessary for the many species whose lx trajectories will approach 0 long before N
.
Therefore, suggest adding an argument lxCrit
to these fns (default 1e-4, i.e. 1 in 10,000), and editing such that lx values are calculated until the age class reaches N OR lx drops below lxCrit — whichever occurs first.
On the 150-row Compadre subset, this change reduces the computational time on my computer from 45s to 1s (given N = 1e5
).
If the user wants the fn to ALWAYS calculate to age class N
, they can simply set lxCrit = 0
.
Functions that currently require matU
, matF
, ...
arguments should also accept objects of class CompadreM
, automatically parsing the matrix components or failing (with a helpful error) when a necessary component is missing.
Maybe best to write function generics with methods for the manual entry vs. com(p)adre database objects?
The splitStages
argument can be either "all", "ontogeny", or a vector of standardized matrix classes — e.g. c("prop", "active", "active", "dormant").
I suggest we restructure so that this third option is achieved by setting splitStages = "matrixStages"
and specifying the vector of matrixStages in a separate argument matrixStages
.
I think this would make splitStage
somewhat easier to document and validate, and better matches our use of matrixStages
arguments in other functions.
The weighted
argument can be either FALSE
, "SSD"
, or a vector of stage-specific weights.
I suggest we change "weighted" to "weights", which corresponds slightly better with the available options, and use a default value of NULL
instead of FALSE
for consistency with most other Rage functions (i.e. NULL
corresponds to equal weighting of all stage classes).
Finally, suggest moving the documentation of splitStages
and weighted
from the @return
section to the respective @param
sections.
I think it would be useful, both for us and for users, to have a document summarizing how each Rage function deals with a number of 'special cases' of MPMs (e.g. NAs, all zeros, SurvivalIssue, matU singular).
Not sure what form this should ultimately take / where it should live, but I've created a Google sheet to get us started. It should be editable by anyone with the link, so feel free to add/update/edit.
We need a vignette that guides users through a workflow of using the package.
This can draw on materials used for COMPADRE workshops that are in our compadreDB repository.
More than one Vignette is OK, if necessary.
Develop functions to include various life history traits from Uli's PNAS paper @jonesor
Descriptions of matrices should be made consistent across all functions.
Arguments 'N' (qsd_converge
), 'nSteps' (entropy_
fns), 'maxAge' (longevity
), and 'xmax' (shape_
fns), all refer to the maximum age class to which some quantity should be calculated. Suggest standardizing all to 'xmax'.
Argument 'startLife' is a bit long. Suggest shortening to 'start'.
I think the 'surv' and 'rep' arguments of the shape_
functions could be changed to 'lx' and 'mx', respectively. Arg 'lx' is already used in the lx_to_
functions, and could also be used in the entropy_
functions per #58.
The remaining arguments include a mix of snake_case (e.g. 'exclude_col'), camelCase (e.g. 'reproStages'), and smushedform (e.g. 'demogstat'). Suggest standardizing, though I'm not sure how. I lean slightly toward snake_case (except for matU/matF/matC), which rOpenSci recommends for both function and argument names. But I don't feel too strongly.
Selection of current argument names:
vr_
fns: 'exclude_col', 'weights_col', 'surv_only_na', 'dorm_stages'mpm_
fns: 'reproStages', 'matrixStages'perturb_
fns: 'demogstat'In the perturbation functions, passing the name of a function I've created as a character string to the demogstat
argument doesn't work (except 'lambda'), I think because methods::getFunction
doesn't search the parent environment by default. I suggest switching to base::match.fun
, which does the same thing as getFunction
but searches the parent environment. Switching has the added benefit that we no longer need to import methods
.
(1) The splitStages == "all"
block takes a sum
of weighted vital rates whereas the "ontogeny"
and "matrixStages"
blocks take a mean
. Because the stage-specific vital rates have already been weighted by this point (with weights that sum to 1), I think all blocks should use sum
to obtain a weighted average.
(2) Somewhat separately, I think the weighted sums need a proper divisor (this is related to the issue of structural zeros vs sampled zeros).
The current method sum(weights * vital_rates) / sum(weights)
works if the vector of vital_rates is nonzero for all stage classes, but otherwise we should divide by the sum of the weights for the relevant (i.e. nonzero) stage classes rather than all stage classes.
E.g.
matF <- rbind(c(0, 0, 5),
c(0, 0, 0),
c(0, 0, 0))
weights <- c(1, 1, 1)
weights <- weights / sum(weights)
fecund <- colSums(matF)
fecund_weighted <- weights * fecund
# divide by sum of all weights (current method)
# divisor of sum(weights) is implicit
sum(fecund_weighted) # 1.67
# divide by sum of relevant weights
i <- fecund > 0
sum(fecund_weighted[i]) / sum(weights[i]) # 5.00
In dEntropy
, R0
, and lifeTimeRepEvents
(and possibly makeLifeTable
), I think the conditional structure of the output is overly complicated.
The user may pass:
matF
only, ormatC
only, ormatF
and matC
and the number of items returned depends on which arguments get passed, as do the names of the returned list elements.
In my opinion this conditional behaviour is a pain to document, potentially difficult for new users to understand, and makes the functions harder to program with.
One option for simplification is to only have a single argument for the reproductive matrix (e.g. matR
), to which the user can pass whatever they desire... i.e. one of matF
, matC
, or something like matFC
(= matF
+ matC
).
In this case, I think we would also need to make it easier for the user to derive matFC
as a separate slot/colum within db
(I think @levisc8's proposal in jonesor/Rcompadre#27 to merge the mat
and metadata
slots into a single data
slot goes a long way here).
The function mature_life_expect
(taken from lifeTimeRepEvents
) calculates the difference between life expectancy from stage 'start', and mean age at reproductive maturity (also from stage 'start').
It's described as 'remaining life expectancy at maturity', but I think this description is likely to be confused with 'life expectancy at reproductive maturity' (i.e. life expectancy conditional on achieving reproductive maturity). It's the latter trait that I think most users would be interested in. What mature_life_expect
currently calculates would more accurately be described as 'life expectancy from birth that's remaining at reproductive maturity'.
Suggest deleting mature_life_expect
from Rage, and adding 'from first reproduction' functionality to life_expect
, as in #53.
Outputs from some Rage functions maintain row/column names from the input MPM (matU
/matF
/matC
), where applicable, but others do not. E.g. If matU
has colnames, the output of life_expect
will retain the colname for stage startLife
, whereas the output from longevity
will not.
Suggest standardizing such that:
mpm_collapse
)Each function should begin by checking whether the matrices that are used pass required tests (e.g. isErgodic
, isPrimitive
etc. from popdemo
).
The error/warning messages produced should be consistent accross all functions.
Perhaps also suggesting that the matrices be cleaned using Rcompadre::cleandb
MASS::ginv, would be better than just ginv.
(collapseMatrix
is scheduled to be moved from Rcompadre to Rage, as per jonesor/Rcompadre#28)
The documentation for collapseMatrix
contains the note:
Note: this function is only valid for models without clonality.
I think collapseMatrix
is perfectly valid for models with clonality. The fn was originally written without a matC
option but has since been updated, so I think the note can be removed (unless anyone objects?).
I think we should, however, add a note to the documentation to clarify that collapseMatrix
is only gauranteed to preserve λ and the relative stable distribution, but is not expected to preserve relative reproductive values, sensitivities, net reproductive rates, life expectancy, etc.
Replace matFmu argument in functions standardizevitalrates and reprodStages with more generic argument to identify reproductive stages, because identifying reproductive stages using the mean fecundity matrix is just one possible approach to identifying reproductive stages.
A recent paper in MEE by Torbjørn Ergon et al argues for a greater focus on hazard rates rather than survival probabilities in population analyses.
I think it would be straightforward to add hazard rate outputs to Rage fns vitalRates
, vitalRatePerturbations
, and makeLifeTable
.
If S
is a stage-specific survival probability, the corresponding time-averaged hazard is
h = -log(S)
and the sensitivity of S
with respect to h
is
dS_dh = -exp(-h)
In Matrix Population Models, Caswell discusses methods for calculating R0 in age- vs. stage-classified models (s. 5.3.4). The continuous age-classified version (integral over lx * mx) gives the per-generation population growth rate, which corresponds qualitatively with lambda in terms of being < 1, = 1, or > 1.
To calculate R0 with the same properties from a stage-classified model, Caswell (p. 128) recommends the calculation:
R <- matF %*% N # where N is the fundamental matrix
R0 <- popbio::lambda(R) # i.e. dominant eigenvalue of R
whereas our current calculation is:
R <- matF %*% N
R0 <- R[startLife, startLife]
The two are the same if offspring only arise in stage startLife
, but otherwise differ (only the former is affected by the production of offspring in any stage other than startLife
).
I suggest we update R0 to use the eigenvalue method.
vitalRatePerturbation
doesn't work the way I'd expect. If you pass it an age-structured model (i.e. matU
is a Leslie matrix with survival probabilities along the subdiagonal), it will return non-zero sensitivities for growth, e.g.
matU <- rbind(c(0.0, 0.0, 0.0),
c(0.3, 0.0, 0.0),
c(0.0, 0.5, 0.0))
matF <- rbind(c(0.0, 0.0, 0.8),
c(0.0, 0.0, 0.0),
c(0.0, 0.0, 0.0))
vitalRatePerturbation(matU, matF)
> # SSurvival SGrowth SShrinkage SReproduction SClonality ...
> # 0.8760486 0.8760486 0 0.2054321 0 ...
I would've expected the above model to be decomposed into something with no vital rates for growth, like
matA <- rbind(c( 0, 0, f), # edited... wrote this as a 4x4 initially
c(s1, 0, 0),
c( 0, s2, 0))
I've looked at some stage-structured models too, and compared results to popbio::vitalsens. In general the sensitivities to survival produced by vitalRatePerturbation
correspond to what I'd expect, but sensitivities to the other vital rates do not.
Is there a ref for this method @robcito? I've looked through Caswell 2001, but haven't been able to figure out a method for what I think we want to do here (at least not a general one).
I've copied this over from jonesor/compadreDB#21 (comment) where @RobSalGo first posted this issue.
I think that the stages in the function plotLifeCycle get automatically re-ordered to alphabetical, unless they are given a number before them (which is not very sensible). This makes the reading of the lifecycle from left-to-right a bit less straight forward. It'd be great if we could let the authors choose the order in which the stages appear
Functions longevity
(life expectancy component) and R0
will fail if matU
is singular (i.e. non-invertable). This usually indicates infinite life expectancy due to a 100% stasis loop in the final stage(s).
In jonesor/Rcompadre#30 I suggest adding a column-flag to cleanDB
corresponding to singular matU
, which makes it easier for a user to remove problem rows prior to an analysis.
I propose we also modify longevity
and R0
to return NA
if matU
is singular (and update the documentation accordingly). This would allow the user to retain rows in the db
subset they're working with for which life expectancy or R0 cannot be calculated, which may be convenient if the user is also conducting other analyses with other sets of variables (i.e. the user doesn't have to create a bunch of different subsets of db
for different analyses).
The entropy functions perform calculations on lx/mx trajectories, which the functions first derive from the relevant MPM using mpm_to_lx()
/mpm_to_mx()
.
Suggest making entropy fns act directly on lx/mx trajectories instead of matU/matR, to correspond to the shape_
fns (which act directly on lx/mx trajectories). I think the 2-step workflow is more transparent (i.e. the age-from-stage methods used to derive lx/mx are conceptually independent from the entropy calculation), and allows for alternative age-from-stage methods for deriving lx/mx from an MPM (e.g. #53).
For analyses related to senescence, I generally want to estimate age-related traits starting from the age of first reproduction. Current Rage
functions work great when there is a single obvious stage of first reproduction, but need modification when a life cycle is such that there is more than one possibility.
About 131/422 MPMs in my COMPADRE 'working subset' have multiple possibilities for the stage of first repro. E.g. In the following life cycle (assuming a startLife
of either 1 or 2), individuals might first reproduce either in stage 3 or 4 — stage 4 is twice as likely to be the stage of first reproduction.
matU <- rbind(c(0.0, 0.0, 0.0, 0.0),
c(0.1, 0.1, 0.0, 0.0),
c(0.0, 0.1, 0.6, 0.2),
c(0.0, 0.2, 0.1, 0.1))
matF <- rbind(c(0.0, 0.0, 4.2, 5.1),
c(0.0, 0.0, 0.0, 0.0),
c(0.0, 0.0, 0.0, 0.0),
c(0.0, 0.0, 0.0, 0.0))
Because survivorship from first reproduction varies depending on which stage the first repro occurs in, we'd want to integrate over both possibilities relative to the probability that an individual first reproduces in the given stage.
Suggest working toward this functionality (possibly with new, separate functions) for calculating lx/hx/px, life expectancy, longevity, and time to qsd — all from first reproduction.
collapseMatrix
is currently written in a way that prevents a user from collapsing non-adjacent stages. Specifically, the collapse
argument is a character vector whose elements must reflect single stages or sequences of adjacent stages, e.g.
collapse <- c('1', '2-4', '5')
I suggest we change collapse
to a list
, to allow the user to collapse non-adjacent stages, e.g.
collapse <- list(1, 2:4, 5) # same as above
collapse <- list(1, c(2, 4), c(3, 5)) # collapse non-adjacent stages (e.g. stage 3 inter-reprod.)
I think this would make rearrangeMatrix
unnecessary, because it's only function is to move any inter-reproductive stages to the right side of a matrix, to segregate reproductive and post/non-reproductive stages ahead of collapsing.
If we remove rearrangeMatrix
, reprodStages
can be updated to identify inter-reproductive stages (it already takes the required arguments, i.e. reproStages
).
entropy_k
(formerly kEntropy
) doesn't work as I'd expect with partial survivorship trajectories. For a given hazard function (declining, constant, or increasing) the value of entropy returned depends on how many age classes are used.
For example, the constant-hazard lx trajectory 0.9^(0:10)
yields an entropy of 0.434 (indicating an increasing mortality hazard) whereas the lx trajectory 0.9^(0:100)
yields an entropy of 0.998 (indicating a constant hazard). (Recall entropy < 1 indicates senescence, and entropy > 1 indicates negative senescence.)
x1 <- 0:10
x2 <- 0:100
# with 10 age classes, suggests positive senescence
lx1 <- 0.9^x1 # true lx trajectory has constant hazard
entropy_k(lx1, trapeze = TRUE)
> [1] 0.4341153
# with 100 age classes, suggests negligible senescence
lx2 <- 0.9^x2 # true lx trajectory has constant hazard
entropy_k(lx2, trapeze = TRUE)
> [1] 0.9978724
There are even cases where, for a given rate of change in the mortality hazard, using a different number of age classes qualitatively changes the interpretation regarding positive vs. negative senescence.
# with 10 age classes, suggests positive senescence
hx3 <- exp(-1 - 0.2 * x1) # true hazard function is declining
lx3 <- hx_to_lx(hx3) # survivorship
entropy_k(lx3, trapeze = TRUE)
> [1] 0.8645491
# with 100 age classes, suggests negative senescence
hx4 <- exp(-1 - 0.2 * x2) # true hazard function is declining
lx4 <- hx_to_lx(hx4) # survivorship
entropy_k(lx4)
> [1] 1.692954
At the least, I think we should include a strong warning in the documentation about this behaviour (and direct users to shape_surv
, which does handle partial lx trajectories). On the other hand, users working with MPMs will usually be working with partial survivorship trajectories (e.g. truncated at the quasi-stable distribution), so perhaps we should just remove entropy_k
from Rage altogether?
The function lifeTimeRepEvents
currently produces a list object with up to 8 elements.
I think this should be simplified by breaking apart the function into its constituent functions.
i.e. there can still be a function called lifeTimeRepEvents
that produces such a list, but it would be useful to have separate functions which are called by lifeTimeRepEvents
.
This will solve some problems including (1) users only wanting a single measure, but being forced to calculate 8 (which could be time consuming with big datasets); (2) the function will fail if one of the 8 calculations produces an error; (3) it will become easier to debug where errors are produced with separate functions.
If one or more standardized stages (e.g. pre-reproductive) is not present in the MPM passed to standardizeMatrix
, the collapse
list returned by reprodStages
will contain one or more NA
. In this case, collapseMatrix
will return rows/columns of 0
for the missing stage(s), and then the code at the end of standardizeMatrix
will convert some of these 0
s to NA
.
But it only converts the stasis elements to NA
, rather than the entire row/column of the missing stage.
Suggest moving this NA
conversion to collapseMatrix
to ensure consistent behaviour regardless of whether collapseMatrix
is called directly or indirectly within standardizeMatrix
, and expanding the NA
to the entire missing row/column.
I think the post
argument in reprodStages
— an integer indicator for post-reproductive stages — is unnecessary.
There is a separate argument reproStages
(a logical vector) from which we could just as easily identify post-reproductive stages.
(The use of reprodStages
within standardizeVitalRates
would have to be updated accordingly, but the functionality would remain the same).
I've noticed a few potential bugs in makeLifeTable
, some of which also apply to dEntropy
, and kEntropy
.
makeLifeTable
, dEntropy
, and kEntropy
1. I think the loop to calculate age-specific survivorship
matUtemp = matU
# ...
for (o in 1:nSteps) {
survivorship[o,] = colSums(matUtemp %*% matU)
matUtemp = matUtemp %*% matU
}
should instead be
matUtemp = matU
# ...
for (o in 1:nSteps) {
survivorship[o,] = colSums(matUtemp)
matUtemp = matUtemp %*% matU
}
As written, the code effectively skips over the first age class, for which survivorship should be colSums(matU)
.
Caswell's Matlab code for lx is (Caswell 2001, pg 120):
for x = 1:150
surv(x,:) = sum(T^x);
I think the current R code is effectively doing this:
for x = 1:150
surv(x,:) = sum(T^(x+1));
makeLifeTable
and dEntropy
2. In the loops to calculate age-specific fertility (fertMatrix
) and clonality (clonMatrix
), there are a few potential issues (these only seem to cause problems when startLife > 1
):
diag(t(e) %*% matUtemp2)
is supposed to create a square matrix with vector t(e) %*% matUtemp2
along the diagonal, but because t(e) %*% matUtemp2
returns a 1-row matrix, diag(...)
instead pulls out the single 'diagonal' element of that matrix, [1,1]
. This could be fixed using as.numeric()
, or by using colSums(matUtemp2)
instead of the matrix multiplication.
The single instance of scalar multiplication should be changed to matrix multiplication (Caswell 2001 pg. 120).
The matrix inverse fails anytime a column within matU
contains all zeros (even if it's only the last column). That whole operation is really just dividing columns of matUtemp2
by their sum, which can be coded without the matrix inverse (Caswell 2001 pg. 123).
makeLifeTable
3. In the calculation of Lx
, I think the substract operation should instead be addition, to get the average survivorship between age x and x+1. L[x] = (l[x] + l[x+1]) / 2
4. I think the calculation for Tx
should be sum(L[x]:L[maxAge])
, instead of sum(L[1]:L[x])
(cumsum
is doing the latter).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.