Giter VIP home page Giter VIP logo

spduration's Introduction

spduration

R-CMD-check CRAN version Codecov test coverage

spduration implements a split-population duration model for duration data with time-varying covariates where a significant subset of the population or spells will not experience failure.

library("spduration")
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
# Prepare data
data(coups)
dur.coups <- add_duration(coups, "succ.coup", unitID="gwcode", tID="year",
                          freq="year")

# Estimate model
model.coups <- spdur(duration ~ polity2, atrisk ~ polity2, data = dur.coups,
                     silent = TRUE)
summary(model.coups)
## Call:
## spdur(duration = duration ~ polity2, atrisk = atrisk ~ polity2, 
##     data = dur.coups, silent = TRUE)
## 
## Duration equation: 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.00150    0.23762  16.840  < 2e-16 ***
## polity2      0.20588    0.03037   6.779 1.21e-11 ***
## 
## Risk equation: 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   6.5278     3.2556   2.005   0.0449 *
## polity2       0.8966     0.4084   2.196   0.0281 *
## 
##            Estimate Std. Error t value Pr(>|t|)
## log(alpha) -0.03204    0.11899  -0.269    0.788
## ---
## Signif. codes: *** = 0.001, ** = 0.01, * = 0.05, . = 0.1
plot(model.coups, type = "hazard")

Install

  • the latest released version from CRAN:
install.packages("spduration")
  • the latest development version:
library(devtools)
install_github("andybega/spduration")

Contact

spduration's People

Contributors

andybega avatar cassyld avatar dainachiba avatar nilswmetternich avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

spduration's Issues

missing value error in `plot_hazard2()`

plot(m108_grid1_mdl, type="hazard2"):

 Error in quantile.default(pr, probs = 0.05) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE 
6 stop("missing values and NaN's not allowed if 'na.rm' is FALSE") 
5 quantile.default(pr, probs = 0.05) 
4 quantile(pr, probs = 0.05) at plot_hazard.r#58
3 plot_hazard2(x, ...) at plot.spdur.R#29
2 plot.spdur(m108_grid1_mdl, type = "hazard2") 
1 plot(m108_grid1_mdl, type = "hazard2") 

Check spdurCrisp methods

Check that methods for spdur work with spdurCrisp. There is/was an issue with unevaluated arguments passed to spdur() within the spdurCrisp() wrapper that would break compatibility with other methods.

Rewrote spdurCrisp() to replaced the spdur() function call with the actual spdurCrisp() function call. Hopefully this fixes it since the arguments now should evaluated in the parent frame that is used by the methods like predict.

xtable won't work with duplicate parameter names

> xtable(model.coups)
% latex table generated in R 3.0.2 by xtable 1.7-4 package
% Tue Oct 21 16:31:52 2014
 \begin{table}[ht]
\centering
\begin{tabular}{rrrr}
  \hline
 & Estimate & StdErr & p \\ 
  \hline
1 & 4.00 & 0.24 & 0.00 \\ 
  2 & 0.21 & 0.03 & 0.00 \\ 
  3 & -0.03 & 0.12 & 0.79 \\ 
  4 & 6.53 & 3.26 & 0.04 \\ 
  5 & 0.90 & 0.41 & 0.03 \\ 
   \hline
\end{tabular}
\end{table}
Warning message:
In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 5 --> row.names NOT used

Implement "weights" option

Like:

library(CBPS)
library(data.table)
library(spduration)
data("bscoup")
dt <- data.table(bscoup)
dt <- dt[complete.cases(dt)]

# get weights by matching on treatment : recentcoups
fit <- CBPS(recentcoups ~ milreg + instab,
            data = dt,method = "exact")

dt[,coup := as.numeric(coup)-1]

# estimate hazard with with weights
model<- glm(coup ~ instab + recentcoups + wealth,
            weights = fitted(fit),data=dt,family=quasibinomial(link="cloglog"))

summary.spdur for models without risk intercept

Currently summary.spdur uses the location of the risk intercept to split the data frame of coefficients into separate pieces for the duration and risk equation:

from summary.spdur.R:

start_split <- which(names(object$coefficients)=='(Risk Intercept)')
end_duration <- start_split - 1

So if there is no risk intercept, i.e. a model with atrisk ~ -1 + ..., these variables will be missing and will lead to an error.

Maybe ID split by attr(model$mf.dur, "terms") %>% attr(., "intercept") in combination with number of terms.

Add example for how to run spdur with mice

How to run spdur with MI data using the mice package?

Something like this should work, provided there is a tidy.spdur implementation:

library(spduration)
library(mice)

# imp.data = some data imputed using mice
mice.data <- as.mids(as.data.frame(imp.data))

# The mice::with.mids function basically exposes the analysis content in expr
# to each imputed version of the dataset. This works for functions like lm()
# and survreg() because they can just directly access the variables and don't
# need the data argument. But with spdur, we are first augmenting the data
# with add_duration(). So basically I'm below re-assembling a data frame that
# I can use with add_duration()
mice.fit <- with(
  data = mice.data,
  expr = {
    # at this point, the variables in imp.data are directly accessible, e.g. as
    # ARREST, without needing to write imp.data$ARREST
    # use that to re-create a data frame. This will need to include all terms
    # that are used by add_duration() and also in spdur() below
    df <- data.frame(ARREST = ARREST, PUBID_1997 = PUBID_1997,
                     CAL_SURVEY_DATE = CAL_SURVEY_DATE,
                     AGE_FARREST = AGE_FARREST)
    df <- add_duration(df, "ARREST", unitID = "PUBID_1997", tID = "CAL_SURVEY_DATE", freq = "month", ongoing = TRUE)
    spdur(duration ~ AGE_FARREST, atrisk ~ AGE_FARREST, distr = "weibull", data = df, silent = TRUE)
  }
)

# mice::pool needs a broom::tidy method for class "spdur" to work;
# add this is a quick fix; I'll add this to the package at some point
tidy.spdur <- function(x, ...) {
  xx <- as.data.frame(x, row.names = FALSE)
  colnames(xx) <- c("term", "estimate", "std.error", "statistic", "p.value")
  xx$group <- NA
  xx$group[grepl("Dur_", xx$term)] <- "duration"
  xx$group[grepl("Risk_", xx$term)] <- "risk"
  xx$group[grepl("alpha", xx$term)] <- "scale"
  xx$term <- gsub("Dur_|Risk_", "", xx$term)
  xx
}

mice.res <- summary(pool(mice.fit), method = "rubin")
mice.res

Submit to CRAN with R 3.4.2

A new R version, 3.4.2, was released last Thursday, but the package for OS X is not up yet, and Travis CI also still uses 3.4.1. Win-builder already had 3.4.2, but need to re-install R when it is ready and check local and Travis again, then submit to CRAN...

Add extract.spdur for texreg::extract?

This does not work but could be starting point:

extract.spdur<-function(model, beside = FALSE, include.duration = TRUE, include.risk = TRUE, 
    include.aic = TRUE, include.loglik = TRUE, include.nobs = TRUE, 
    include.bic=TRUE, include.dist = TRUE,...) 
{
    s <- summary(model, ...)
    dist <- model$distr
    gof <- numeric()
    gof.names <- character()
    gof.decimal <- logical()
    if (include.nobs == TRUE) {
        n <- model$obs
        gof <- c(gof, n)
        gof.names <- c(gof.names, "Num. obs.")
        gof.decimal <- c(gof.decimal, FALSE)
    }
    if (include.loglik == TRUE) {
        lik <- logLik(model)[1]
        gof <- c(gof, lik)
        gof.names <- c(gof.names, "Log Likelihood")
        gof.decimal <- c(gof.decimal, TRUE)
    }
    if (include.aic == TRUE) {
        aic <- AIC(model)
        gof <- c(gof, aic)
        gof.names <- c(gof.names, "AIC")
        gof.decimal <- c(gof.decimal, TRUE)
    }
    if (include.bic == TRUE) {
        bic <- BIC(model)
        gof <- c(gof, bic)
        gof.names <- c(gof.names, "BIC")
        gof.decimal <- c(gof.decimal, TRUE)
    }
    duration <- s$duration
    alpha <- s$alpha
    risk <- s$split
    if (beside == FALSE) {
        if (include.duration == TRUE && include.risk == TRUE) {
            names(duration) <- paste("Duration model:", rownames(duration))
            names(risk) <- paste("Risk model:", rownames(risk))
            coef.block <- rbind(duration, alpha,risk)
        }
        else if (include.duration == TRUE) {
            coef.block <- rbind(duration, alpha)
        }
        else if (include.risk == TRUE) {
            coef.block <- risk
        }
        else {
            stop(paste("Either the include.duration or the include.risk argument", 
                "must be TRUE."))
        }
        names <- rownames(coef.block)
        co <- coef.block[, 1]
        se <- coef.block[, 2]
        pval <- coef.block[, 4]
        tr <- createTexreg(coef.names = names, coef = co, se = se, 
            pvalues = pval, gof.names = gof.names, gof = gof, 
            gof.decimal = gof.decimal)
        return(tr)
    }
    else {
        trList <- list()
        d.names <- c(names(duration), "log(alpha)")
        d.co <- c(duration[, 1], alpha[, 1])
        d.se <- c(duration[, 2], alpha[, 2])
        d.pval <- c(duration[, 4],alpha[, 4])
        r.names <- rownames(risk)
        r.co <- risk[, 1]
        r.se <- risk[, 2]
        r.pval <- risk[, 4]
        if (include.duration == TRUE) {
            tr <- createTexreg(coef.names = d.names, coef = d.co, 
                se = d.se, pvalues = d.pval, gof.names = gof.names, 
                gof = gof, gof.decimal = gof.decimal, 
                model.name = paste("Duration model (", dist,")", sep=""))
            trList[[length(trList) + 1]] <- tr
        }
        if (include.risk == TRUE) {
            tr <- createTexreg(coef.names = r.names, coef = r.co, 
                se = r.se, pvalues = r.pval, gof.names = gof.names, 
                gof = gof, gof.decimal = gof.decimal, model.name = "Risk model")
            trList[[length(trList) + 1]] <- tr
        }
        if (length(trList) == 0) {
            stop(paste("Either the include.duration or the include.risk argument", 
                "must be TRUE."))
        }
        return(trList)
    }
}

setMethod("extract", signature = className("spdur", "spduration"),
    definition = extract.spdur)

predict weibull >1

Happens around lines 76-78:

if (stat == "conditional hazard")
res <- atrisk.t * ft/pmax(1e-10, (cure.t + atrisk.t * st))

from spduration-paper example debugging:

Browse[2]> s0[2014]
2562
0.007962334
Browse[2]> st[2014]
2562
0.001401938
Browse[2]> ft[2014]/st[2014]
2562
1.752906

spdur() and missing value options

spdur() needs an option for how to deal with missing values, similar to glm's na.action option maybe. Right now any missing values in the design matrix will stop the function.

Forecast failure prob x months ahead

Instead of individual monthly forecast probabilities, an option to calculate the cummulative probability of failure in the next x time periods.

Could be as simple as 1 - prod( 1 - h(t) ) over 1 through desired time periods in the future.

clarify how `forecast` works

Documentation should be more explicit about the assumptions, i.e. carry forward extrapolation of x and updating under assumption of no event for duration.

More fundamentally, the way forecasting should be done to best take advantage of existing information depends on the lag structure of the model. The best way to forecast, e.g., if the DV y is lead and covariates are not lagged versus a model in which x is lagged and the DV y is not lead is different. There is no easy way to incorporate knowledge of these modeling assumptions without adding attributes to the data/model frames, so maybe better to change forecast so that failure information (y) and extrapolated covariates (newx?) can be passed explicitly.

Change `spdur` object to record coefs in separate lists

Currently coefficients for duration and risk equations, along with estimate for alpha, are returned by optim in one data frame, and saved in the spdur object in one data frame. Later in summary.spdur they are split based on location of (Risk Intercept). This doesn't work for models with risk ~ -1..., and finding a workaround that also works when the risk equation include factor variables is tricky, because for factor variables a term could expand to level-1 terms.

See how pscl:zeroinfl handles this, i.e. coefficients = list(count = coefc, zero = coefz).

Will need to change summary.spdur and methods like coef.

Release spduration 0.17.2

There was on issue on CRAN with notes for r-devel:

Documented arguments not in \usage in Rd file 'hazard.Rd':
‘x’

Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

spdur loglog version

library(spduration)

data(insurgency)
duration.ins <- buildDuration(insurgency, 'insurgency', unitID='ccode', tID='date')

model_loglog <- spdur(
duration ~ low_intensity + high_neighbors + exclpop.l1,
atrisk ~ excl_groups_count.l1 + high_neighborhood + high_intensity + exclpop.l1 + lgdppc.l1,
last='end.spell', data=duration.ins, distr="loglog", max.iter=300)

Fitting base loglog...
Error in optim(base.inits, loglog.lik, method = "BFGS", control = list(maxit = max.iter), :
initial value in 'vmmin' is not finite

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.