psolymos / resourceselection

Resource Selection (Probability) Functions for Use-Availability Data in R

Home Page: https://peter.solymos.org/ResourceSelection/

R 100.00%

cran ecology estimation lele r rsf rspf solymos weighted-distributions

resourceselection's Introduction

Hi there 👋

My name is Péter Sólymos. I am bridging the gap between data and decision making using data analytics, visualization, and infrastructure automation.

R packages by me: https://psolymos.r-universe.dev/
Writing about Shiny hosting: https://hosting.analythium.io/
Edmonton R User Group organizer: https://yegrug.github.io/
Personal website with publications: https://peter.solymos.org/

resourceselection's People

Forkers

aurielfournier dlizcano kathygcy hugh-allan

resourceselection's Issues

[kdepairs]: Overlay Density onto Diagonal Histogram

Dear Mr Solymos,
Thank you so much for this great package! For almost every data set I have worked on, I try kdepairs as the very first step.
However, I have recently run into a bit of trouble.
I was trying to overlay the univariate density onto the histograms in the diagonal panels.

I've tried to change the source code but I'm a bit stuck. HELP!

P.S. Please reply either on GitHub or to my email.

#########################################################################
## Scatter Plot Matrix with 2d Contour
kdepairs.default <- function(x, n=25, density=TRUE, contour=TRUE, ...) {
  y <- data.frame(x)
  fun.lower <- function(x1, x2, ...) {
    if (is.factor(x1)) {
      x1 <- as.integer(x1)
    }
    if (is.factor(x2)) {
      x2 <- as.integer(x2)
    }
    OK <- length(unique(x1))>2 && length(unique(x2))>2
    if (!density && !contour)
      n <- 0
    if (n>0 && OK) {
      if (density || contour)
        d <- MASS::kde2d(x1, x2, n=n)
      if (density)
        image(d, col=terrain.colors(50), add=TRUE)
      if (contour)
        graphics::contour(d,add=TRUE)
    } else points(x1, x2)
  }
  fun.upper <- function(x1, x2, ...) {
    if (is.factor(x1)) {
      x1 <- as.integer(x1)
    }
    if (is.factor(x2)) {
      x2 <- as.integer(x2)
    }
    points(x1,x2, col="lightgrey")
    lines(lowess(x1,x2), col="#81D8D0", lty=1, lwd = 2)
    abline(lm(x2~x1), col = "pink3",lty = 1, lwd = 2)
    COR <- cor(x1, x2)
    text(mean(range(x1,na.rm=TRUE)), mean(range(x2,na.rm=TRUE)),
         round(COR, 3), cex=1+abs(COR))
  }
  panel.hist <- function(x, ...) {
    usr <- par("usr"); on.exit(par(usr = usr))
    par(usr = c(usr[1:2], 0, 1.5) )
    h <- hist(x, plot = FALSE)
      #myden = density(x)
      #lines(myden)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)
    rect(breaks[-nB], 0, breaks[-1], y, col="#81D8D0", ...)
    
    box()
  }

  pairs(y, lower.panel=fun.lower, upper.panel=fun.upper, diag.panel=panel.hist)
  invisible(NULL)
}
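
One possible way to get the overlay, offered as a sketch rather than the package's own approach: the histogram bars above are rescaled so the tallest bar is 1, so the density curve has to be rescaled the same way before it is drawn in the same panel. The names `scaled_density` and `panel.hist.density` are hypothetical, and matching the two peaks is an assumption about how the overlay should look.

```r
# Rescale a kernel density estimate so its peak is 1
# (hypothetical helper, matching the bar scaling y / max(y)).
scaled_density <- function(x) {
  d <- density(x)
  list(x = d$x, y = d$y / max(d$y))
}

# Diagonal panel: histogram bars plus the rescaled density curve.
panel.hist.density <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))             # restore the panel coordinates on exit
  par(usr = c(usr[1:2], 0, 1.5))      # leave headroom above the bars
  x <- x[is.finite(x)]                # density() cannot handle NA or Inf
  h <- hist(x, plot = FALSE)
  nB <- length(h$breaks)
  rect(h$breaks[-nB], 0, h$breaks[-1], h$counts / max(h$counts),
       col = "#81D8D0")
  d <- scaled_density(x)
  lines(d$x, d$y, lwd = 2)            # overlay drawn in the same coordinates
  box()
}

# Usage with a built-in data set
pairs(iris[, 1:4], diag.panel = panel.hist.density)
```

Passing `panel.hist.density` as `diag.panel` in the function above should give the same effect there.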

Add family method to rsf/rspf objects

This is needed by visreg if scale="response".

Add nonparametric (kde) selection plot

Use density for used and avail data to estimate density, calculate ratio (selection), set mean (1) at N/(N+M).

Function to check that range of used is within range of avail (to avoid div by 0).

This should give nonparametric mep. Overlay the 2.
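
A minimal sketch of the idea above, using base R only. The function names `range_within` and `kde_selection` are hypothetical, and the rescaling step is one reading of "set mean (1) at N/(N+M)": a density ratio of 1 is mapped to the probability N/(N+M).

```r
# TRUE if the used values lie inside the range of the available values,
# so the available density is not evaluated where it has no support.
range_within <- function(used, avail) {
  ru <- range(used, na.rm = TRUE)
  ra <- range(avail, na.rm = TRUE)
  ru[1] >= ra[1] && ru[2] <= ra[2]
}

# Ratio of used to available kernel densities on a common grid.
kde_selection <- function(used, avail, n = 512) {
  if (!range_within(used, avail))
    stop("range of used is not within range of avail")
  lim <- range(avail, na.rm = TRUE)
  fu <- density(used, from = lim[1], to = lim[2], n = n)
  fa <- density(avail, from = lim[1], to = lim[2], n = n)
  ratio <- fu$y / fa$y                 # selection ratio
  N <- length(used)
  M <- length(avail)
  # Assumed rescaling: ratio == 1 gives N / (N + M)
  prob <- ratio * N / (ratio * N + M)
  data.frame(x = fu$x, ratio = ratio, prob = prob)
}

# Simulated example: used points drawn from avail, favouring large values
set.seed(1)
avail <- runif(1000)
used <- sample(avail, 200, prob = avail)
sel <- kde_selection(used, avail)
plot(sel$x, sel$ratio, type = "l", xlab = "x", ylab = "used / avail density")
abline(h = 1, lty = 2)
```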

using non-binary observations

Hello, we would like to use your package to calculate hoslem values. Currently it only accepts binary observations as input, but our data are continuous rather than binary. Is there a way to use it?

dependency 'pbapply' is not available for package 'ResourceSelection'

I tried to install the ResourceSelection and I get following error:
ERROR: dependency 'pbapply' is not available for package 'ResourceSelection'

I tried several repositories and this is always the same. Trying to install pbapply also failed.
I'm using R version 3.0.2 on an Ubuntu server.

Hosmer-Lemeshow test in GLMM (binomial family)

Hi, I ran a GLMM (binomial family) using the lme4 package, and wanted to know if it is appropriate to apply the Hosmer-Lemeshow test to evaluate the model fit. Thank you!

library(lme4)

#GLMM
m1 <- glmer(Canto ~ T + H + Patm + V + Pp + CU + (1|Evento), data = Datos, family = binomial)

Allow changing jitter in mep

Expose the jitter amount so the user can control it (e.g. set it to 0).

rsf, rspf error messages swapped

Implement wrapper for K-fold cross validation

Need error measure and wrapper function.
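
A generic sketch of what such a wrapper might look like. It is not tied to rsf() or rspf(): `kfold_cv`, `fit_glm`, and `brier` are hypothetical names, a logistic glm stands in for the model, and the Brier score is only an assumed error measure.

```r
# K-fold cross validation with user-supplied fitting and error functions.
kfold_cv <- function(data, k, fit_fun, error_fun) {
  # Randomly assign each row to one of k folds of near-equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  sapply(seq_len(k), function(i) {
    fit <- fit_fun(data[folds != i, , drop = FALSE])   # train on k - 1 folds
    error_fun(fit, data[folds == i, , drop = FALSE])   # score the held-out fold
  })
}

# Stand-in model and error measure (assumptions, see above)
fit_glm <- function(d) glm(y ~ x, data = d, family = binomial)
brier <- function(fit, d) {
  p <- predict(fit, newdata = d, type = "response")
  mean((d$y - p)^2)
}

set.seed(42)
dat <- data.frame(x = rnorm(200))
dat$y <- rbinom(200, 1, plogis(0.5 + dat$x))
cv_err <- kfold_cv(dat, k = 5, fit_fun = fit_glm, error_fun = brier)
mean(cv_err)
```

Keeping the fit and error functions as arguments would let the same wrapper be reused once an error measure for use-availability models is settled.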

kdepairs improvements

Improve kdepairs function by recognizing factors (incl. ordinal) and low-unique-valued variables.

continuous-continuous: scatterplot
continuous-discrete: box/violin etc (see mep without quantiles) with proportional width
discrete-discrete: contingency table like plot.

Also improve colour scales for the function, e.g. based on theme.

Add option to use glm in rsf()

Option to use 'PL' (Lele 2009) or 'ML' (logistic glm). glm might result in a speed-up, and could be integrated with visreg package.

Add support for raster objects

Add a function which helps in pulling values (used/avail) from raster and raster stack objects.

Allow newdata to be defined as raster or raster stack.

Need to think about I() and interaction kind of terms (look around how that is done). Polynomials and interactions are really important.

Use parallelOptim

Parallel optim can lead to an 80% reduction in computation time:
https://journal.r-project.org/archive/2019/RJ-2019-030/RJ-2019-030.pdf

Document methods

Several methods are not documented, only defined in namespace. Make the help pages more inclusive.

Check NAs when m is constant and throw meaningful error message

Cases dropped due to NAs will lead to mismatches between the used/avail and ID vectors, because the objective function can't arrange the available points into rectangular data.

Catch this case and throw a meaningful message (i.e. matching is OK before cases are dropped, but not after): "Mismatched IDs due to missingness: check NAs in data"
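
A rough sketch of such a check, assuming the data arrive as a data frame with a separate ID vector. `check_matching` is a hypothetical helper, not part of the package, and the package's internal data layout may differ.

```r
# Stop if every ID has the same number of rows before NA removal
# but not after it.
check_matching <- function(data, id) {
  stopifnot(length(id) == nrow(data))
  before <- as.vector(table(id))
  keep <- complete.cases(data)
  # Keep all original ID levels so fully dropped IDs are counted as 0
  after <- as.vector(table(factor(id[keep], levels = sort(unique(id)))))
  if (length(unique(before)) == 1 && length(unique(after)) > 1)
    stop("Mismatched IDs due to missingness: check NAs in data")
  invisible(keep)
}

# Example: two IDs with three rows each, one row with a missing value
d <- data.frame(x = c(1, 2, NA, 4, 5, 6))
id <- rep(1:2, each = 3)
msg <- tryCatch(check_matching(d, id), error = function(e) conditionMessage(e))
msg
```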

vcov.rsf should not drop non-existent intercept

vcov.rsf <-
function (object, type, ...)
{
    boot <- object$bootstrap
    if (missing(type)) {
        type <- if (is.null(boot))
        "mle" else "boot"
    }
    type <- match.arg(type, c("mle", "boot"))
    if (type == "boot" && is.null(boot))
        stop("no bootstrap results found")
    np <- object$np
    if (type == "boot") {
        rval <- cov(t(boot))
    } else {
        rval <- matrix(NA, np, np)
#        h <- if (object$link == "log") {
#            data.matrix(object$results$hessian[-1,-1,drop=FALSE])
#        } else {
#            object$results$hessian
#        }
#        rval[1:np, 1:np] <- solve(h)
        rval[1:np, 1:np] <- solve(object$results$hessian)
    }
    rval <- data.matrix(rval)
    cf <- coef(object)
    colnames(rval) <- rownames(rval) <- names(cf)
    return(rval)
}

Is the implementation of the Hosmer Lemeshow test correct?

Thank you very much for your very useful package.

By the way, for hoslem.test(), which is the Hosmer Lemeshow test, a
goodness of fit test for multiple logistic regression analysis, I get
the following results when I run it as follows;

data("Titanic")
df <- data.frame(Titanic)
df <- data.frame(Class = rep(df$Class, df$Freq),
                 Sex = rep(df$Sex, df$Freq),
                 Age = rep(df$Age, df$Freq),
                 Survived = rep(df$Survived, df$Freq))
model <- glm(Survived ~ . ,data = df, family = binomial())

library(ResourceSelection)

## ResourceSelection 0.3-5   2019-07-22

HL <- hoslem.test(model$y, model$fitted.values, g = 10)
HL

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  model$y, model$fitted.values
## X-squared = 16.733, df = 8, p-value = 0.03301

However, when I check the subgroups, I see the following;

HL$observed

##                
## cutyhat           y0   y1
##   [0.104,0.225] 1211  281
##   (0.225,0.407]  153   70
##   (0.407,0.566]   89   87
##   (0.566,0.736]   13   85
##   (0.736,0.957]   24  188

HL$expected

##                
## cutyhat              yhat0      yhat1
##   [0.104,0.225] 1216.20514  275.79486
##   (0.225,0.407]  139.71270   83.28730
##   (0.407,0.566]   77.99548   98.00452
##   (0.566,0.736]   26.21904   71.78096
##   (0.736,0.957]   29.86764  182.13236

This means that the number of subgroups is 5 instead of the intended 10.
However, the p-value is calculated with 8 degrees of freedom, i.e., the
intended number of subgroups - 2.

I think this is due to the following where hoslem.test() determines
the range of predictions for each subgroup;

qq <- unique(quantile(yhat, probs=seq(0, 1, 1/g)))

Because of the unique() here, the number of subgroups may be less than
intended, but the degree of freedom to compute p-values remains the
same. There is no warning to the user, and the result appears to be
correct because the calculation completes without error.

Is this implementation of the Hosmer-Lemeshow Test correct?

I’m not sure about the stats, so sorry if I’m pointing out the wrong
thing.
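
The effect described above can be reproduced without the package. This is a sketch with made-up fitted values (`yhat` is hypothetical data with only five distinct values), comparing the requested number of groups with the number of bins that survive `unique()`, and the degrees of freedom that each would imply under the usual groups minus 2 rule.

```r
# Hypothetical fitted probabilities with few distinct values,
# as happens when all predictors are categorical.
yhat <- rep(c(0.1, 0.2, 0.4, 0.6, 0.9), times = c(50, 10, 10, 10, 20))
g <- 10  # requested number of groups

# Same cut-point construction as quoted from hoslem.test()
qq <- unique(quantile(yhat, probs = seq(0, 1, 1 / g)))
n_groups <- length(qq) - 1   # bins actually formed

df_requested <- g - 2        # df based on the requested g
df_actual <- n_groups - 2    # df based on the bins that exist

c(groups_requested = g, groups_formed = n_groups,
  df_requested = df_requested, df_actual = df_actual)
```

Comparing `n_groups` with `g` before reading the p-value is one way a user could detect this situation.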